In 2017 a questionnaire was distributed to Information Professionals (IPs) in libraries, archives and museums in order to explore the benefits and challenges that they experienced when using Linked Data (LD). The results of the survey, discussed here, indicated that one of the many challenges experienced by IPs was the task of LD interlinking. In response to this, we have developed an interlinking framework and accompanying tool specifically for the library domain.
What is Linked Data interlinking?
LD is classified according to a 5 Star[i] rating scheme and, in order to be considered 5 Star, an LD dataset must contain external interlinks to related data. LD interlinking is the task of determining whether a URI that identifies an entity in one dataset can be linked to a URI in another dataset, either to state that both URIs describe the same Thing or to indicate that the two entities are related in some capacity. Such links have the potential to transform the Web into a globally interlinked and searchable database rather than a disparate collection of documents, allowing for easier data querying and for the development of novel applications built on top of the Web.
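To make this concrete, an interlink can be written as a single RDF triple. The following is a minimal Python sketch and is not part of NAISC-L: both entity URIs are hypothetical placeholders, and owl:sameAs is used only as one common way of stating that two URIs identify the same Thing.

```python
# Real OWL property commonly used to state that two URIs identify the same Thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def make_interlink(subject_uri: str, predicate_uri: str, object_uri: str) -> str:
    """Serialise one interlink as a single N-Triples line."""
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


# Hypothetical URIs: one entity in an internal library dataset,
# one entity in an external dataset.
internal = "http://example.org/library/person/123"
external = "http://example.com/dataset/entity/abc"

triple = make_interlink(internal, OWL_SAME_AS, external)
print(triple)
```

A weaker relationship ("related in some capacity") would use the same structure with a different predicate in the middle position.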
Interlinking in the Library Domain
Upon reviewing the data on the Linked Open Data Cloud[ii] for some of the leading library LD projects, such as those of the Swedish[iii] (LIBRIS), French[iv] (BnF), Spanish[v] (BnE), British[vi] (BNB) and German[vii] (DNB) National Libraries, it was found that the majority of interlinks are to authority files and controlled vocabularies. Although these types of interlinks are extremely useful, there is a notable lack of interlinks created for purposes outside of authority control. For instance, interlinking could also be used to enrich data by linking to external resources which provide additional information and context for a particular entity. Examples of such interlinks in the projects mentioned above include links to encyclopaedic data-hubs such as MusicBrainz, DBpedia and Wikidata. However, further enrichment could be gained by linking to the data of other knowledge institutions, such as other libraries, archives and museums.
Our research
The focus of our research was to develop an interlinking framework that would encourage the creation of different kinds of LD interlinks, and that was designed with the needs and work process of the library domain in mind. In order to remove some of the challenges experienced by librarians when working with LD, we also developed an accompanying graphical user-interface which was designed to be used by metadata/domain experts as opposed to technological LD experts.
What is NAISC-L?
NAISC-L stands for Novel Authoritative Interlinking for Semantic Web Cataloguing in Libraries. The word NAISC (pronounced noshk) is also the Irish word for links. The NAISC-L approach encompasses an LD interlinking framework, a provenance model and a graphical user-interface.
The NAISC-L interlinking framework is a cyclical, four-step method for creating an interlink (as outlined in Figure 1).
- Step 1 requires the user to select the entities in an internal dataset from which they would like to create interlinks. The user then searches for and selects the entities in external datasets to which they would like to link.
- Step 2 guides the user through the process of selecting a property/predicate that accurately describes the relationship between an internal and external entity, thus creating an interlink. This process first requires the user to determine the type of relationship between the two entities using a natural language term e.g. ‘is identical to’, ‘is similar to’, ‘is associated with’. Following this, the user is then presented with a list of properties/predicates which represent the selected relationship type. Using the provided property definitions and examples, the user is then guided to select the property most suitable for interlinking the entities.
- Step 3 involves the generation of provenance data, using the NAISC-L provenance model, that describes who, where, when, why and how an interlink was created.
- Step 4 involves the generation of the interlink and provenance RDF data.
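Step 2 can be pictured as a lookup from a natural language relationship type to a short list of candidate properties, from which the user picks one. The sketch below is an illustration only. The properties are real OWL, SKOS and RDFS terms, but pairing them with these relationship types is an assumption made for the example and is not the list used by NAISC-L. The function names and entity URIs are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed mapping from relationship type to candidate properties (illustrative only).
CANDIDATE_PROPERTIES = {
    "is identical to": [OWL + "sameAs", SKOS + "exactMatch"],
    "is similar to": [SKOS + "closeMatch"],
    "is associated with": [SKOS + "relatedMatch", RDFS + "seeAlso"],
}


def candidates_for(relationship: str) -> list[str]:
    """Return the properties offered to the user for a relationship type."""
    return CANDIDATE_PROPERTIES.get(relationship, [])


def create_interlink(internal: str, relationship: str, chosen: str, external: str) -> str:
    """Build an N-Triples interlink once the user has chosen a property."""
    if chosen not in candidates_for(relationship):
        raise ValueError(f"{chosen!r} is not a candidate for {relationship!r}")
    return f"<{internal}> <{chosen}> <{external}> ."


# Hypothetical internal and external entities selected in Step 1.
link = create_interlink(
    "http://example.org/library/work/1",
    "is similar to",
    SKOS + "closeMatch",
    "http://example.com/museum/object/9",
)
print(link)
```

Restricting the choice to the candidates for the selected relationship type is what keeps the property consistent with the relationship the user described in plain language.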
The NAISC-L provenance model uses PROV-O[viii] as its foundation because it is the W3C-recommended standard for describing provenance data and because it can be easily extended for domain-specific purposes. We used PROV-O to describe who created an interlink, and where and when it was created. We then extended PROV-O to include interlink-specific sub-classes and properties. This extension, called NaiscProv, is used to describe how and why interlinks were created.
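As a rough illustration of what interlink provenance can look like in RDF, the sketch below describes a single interlink as a reified rdf:Statement and attaches two real PROV-O properties to it (prov:wasAttributedTo for who, prov:generatedAtTime for when). This is not the NAISC-L provenance model: the use of reification is an assumption, the NaiscProv terms are not reproduced here, and the `justification` property lives in a made-up example.org namespace purely as a stand-in for the "why". All URIs and values are hypothetical.

```python
from datetime import datetime, timezone

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespace; a placeholder, NOT the real NaiscProv vocabulary.
EX = "http://example.org/naiscprov-sketch#"


def uri(value: str) -> str:
    return f"<{value}>"


def literal(value: str, datatype: str = "") -> str:
    """Format an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def describe_interlink(link_uri, subject, predicate, obj, agent_uri, created, justification):
    """Return N-Triples lines giving the who, when and why of one interlink."""
    rows = [
        (RDF + "type", uri(RDF + "Statement")),
        (RDF + "type", uri(PROV + "Entity")),
        (RDF + "subject", uri(subject)),
        (RDF + "predicate", uri(predicate)),
        (RDF + "object", uri(obj)),
        (PROV + "wasAttributedTo", uri(agent_uri)),  # who
        (PROV + "generatedAtTime", literal(created.isoformat(), XSD + "dateTime")),  # when
        (EX + "justification", literal(justification)),  # why (hypothetical property)
    ]
    return [f"{uri(link_uri)} {uri(pred)} {value} ." for pred, value in rows]


lines = describe_interlink(
    "http://example.org/library/interlink/1",
    "http://example.org/library/person/123",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://example.com/dataset/entity/abc",
    "http://example.org/library/staff/cataloguer-7",
    datetime(2019, 1, 15, 10, 30, tzinfo=timezone.utc),
    "Matching name and dates of birth and death",
)
print("\n".join(lines))
```

Keeping the provenance statements attached to a URI for the interlink itself, rather than to either entity, means each link can carry its own record of authorship and justification.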
The above framework and provenance model are accessible to the user via the NAISC-L graphical user-interface (GUI). The purpose of the GUI is to guide users through each of the steps outlined in the framework. An iterative, user-centred design approach was followed in the creation of the GUI, meaning that IPs were involved in a series of cyclical tool design and testing phases.
Future directions
NAISC-L is currently undergoing a final user-evaluation phase. We invite Information Professionals with an interest in Linked Data to complete this questionnaire, which will give you the opportunity to carry out a set of interlinking tasks on NAISC-L and to provide us with feedback on your experience.
We are also looking for Information Professionals who would like to trial NAISC-L to create interlinks from their organisation’s LD dataset. If you would be interested in trialling NAISC-L, please feel free to contact lucy.mckenna@adaptcentre.ie.
A more detailed version of this article has now been published in eLucidate, the OA journal of UKeiG.
More information on NAISC-L can be found @ https://www.scss.tcd.ie/~mckennl3/naisc/
Questionnaire link @ https://scsstcd.qualtrics.com/jfe/form/SV_cJ9VBQ2BuNbvbcF
______________________________________________________________________
Lucy McKenna is in the final year of her PhD in the ADAPT Centre, Trinity College Dublin. Funded by Science Foundation Ireland, ADAPT is a multi-institutional dynamic research centre focused on developing next generation digital technologies. Lucy's research is in the area of Linked Data for libraries, archives and museums. Lucy obtained a Masters in Library and Information Studies from University College Dublin in 2015.
[i] https://5stardata.info/en/
[ii] https://lod-cloud.net
[iii] http://libris.kb.se
[iv] http://data.bnf.fr
[v] http://datos.bne.es/inicio.html
[vi] http://bnb.data.bl.uk
[vii] https://portal.dnb.de
[viii] https://www.w3.org/TR/prov-o/