As part of our continued work to improve SHARE and its platform of tools and services, developers at the Center for Open Science are finalizing efforts to link intellectual works together in SHARE’s recent modeling upgrade. Extending the current capability to filter works by organizations, authors, awards, funding agencies, and publishers, we will now be able to link related works to one another. For example, we will be able to link articles to related data sets, articles, and reviews.
Instead of inventing our own set of relationship types, SHARE is using a standard lexicon of publishing reference types, the SPAR Citation Typing Ontology (CiTO), to enable a rich set of relationship descriptions among a wide variety of intellectual work types. CiTO includes support for tracking related software or data with terms like “usesDataFrom” and “usesMethodIn.” In addition, the ontology supports describing relationships to commentary and review documents (works about other works) with terms such as “isReviewedBy,” “parodies,” and “isDisputedBy.” For tracking negative findings, there is also support for relationships like “plagiarizes” and “isRetractedBy.” Many thanks to Karen Hanson from the Data Conservancy for pointing us to this ontology.
Selecting a metadata standard like SPAR’s Citation Typing Ontology (CiTO) involves a few trade-offs. First, in the advantages column, a standard supplies metadata values that mean the same thing in every system that adopts it. This is critical, as an important aspect of SHARE’s value hinges upon its interoperability and the portability of its data. Second, a common challenge in exchanging information between systems arises when plain text (i.e., sentences) is used to describe a particular attribute. For example, the relationship linking an article to a data set could be described in a variety of ways in plain text while reflecting the same meaning: “supplemental data set attached,” “used data set,” “is data for.” As humans, we can see that these descriptions are roughly equivalent. However, it is very difficult for a machine to determine automatically that these statements are equivalent, because each uses a different set of words. Since CiTO uses uniform resource identifiers (URIs) as values instead of plain text, a machine can retrieve the meaning of a value by following its URI to a descriptive web page (i.e., the meaning is machine-addressable and dereferenceable). In addition, this makes it possible to automate the translation of metadata values from one scheme to another, preventing loss of information when records are exchanged across systems.
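To make the plain-text problem concrete, here is a minimal Python sketch of normalizing free-text relationship descriptions to a single CiTO URI. The lookup table and the function name normalize_relation are hypothetical illustrations, not part of SHARE; only the CiTO namespace and the term usesDataFrom come from the ontology itself.

```python
# Namespace of the SPAR Citation Typing Ontology (CiTO).
CITO = "http://purl.org/spar/cito/"

# Hypothetical lookup table: free-text phrases a provider might supply,
# mapped to the CiTO term that carries the same meaning.
PHRASE_TO_CITO = {
    "supplemental data set attached": CITO + "usesDataFrom",
    "used data set": CITO + "usesDataFrom",
}


def normalize_relation(text):
    """Return the CiTO URI for a known phrase, or None if unrecognized."""
    return PHRASE_TO_CITO.get(text.strip().lower())


first = normalize_relation("Supplemental data set attached")
second = normalize_relation("used data set")

# The raw strings differ, but the normalized URIs are identical,
# so a machine can treat the two descriptions as equivalent.
print(first == second)
```

A real system would still need a person or a more careful process to build such a table; the point of the sketch is only that once values are URIs, equivalence becomes a simple comparison.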
Conversely, in the drawbacks column, selecting a single standard can limit the flexibility of SHARE, as no one standard will ever support all possible use cases. To maintain what flexibility we can, we are building the solution in a way that allows support for other metadata ontology standards to be added in the future. We needed to start somewhere, however, and SPAR’s Citation Typing Ontology covers most of our anticipated use cases.
Linking Our Records Together
With over 16 million research events now indexed by SHARE, there is a lot of data to sort through in order to retroactively apply relationships and link records together. We have planned a multi-faceted strategy that applies links through automated processing, curator review, and collaboration with data providers to enhance the records supplied to SHARE.
For automated processing, much information can be inferred from works that supply digital object identifiers (DOIs) for other, related records through the “dcterms:relation” or “dcterms:identifier” fields. Further information can be gleaned by subsequently harvesting metadata values stored within the DOI registry, like “resourceType.” In turn, SHARE could apply a relationship of “uses_data_from” to an article that has a “dc:relation” pointing to a data set, or “uses_method_in” could be applied to software that is present in the “dc:relation” field for a document. However, any assertion SHARE makes automatically carries some degree of uncertainty. In the previous example, can we confidently say that a method from the related software was used? Probably not with 100% certainty. Because of such cases, review by a human curator is vital to the precision and accuracy of this data.
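The inference step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not SHARE’s implementation: the TentativeLink class, the infer_link function, the rule table, the example DOI, and the confidence numbers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TentativeLink:
    source_id: str          # record that supplied the relation field
    target_doi: str         # DOI found in that relation field
    relation: str           # relationship SHARE would tentatively assert
    confidence: float       # placeholder score, not a measured value
    needs_review: bool = True  # automated links wait for a curator


# Hypothetical rules: harvested resourceType -> (relation, confidence).
RULES = {
    "dataset": ("uses_data_from", 0.8),
    "software": ("uses_method_in", 0.5),
}


def infer_link(source_id: str, target_doi: str,
               resource_type: str) -> Optional[TentativeLink]:
    """Propose a link from the target's resourceType, or None if no rule fits."""
    rule = RULES.get(resource_type.strip().lower())
    if rule is None:
        return None
    relation, confidence = rule
    return TentativeLink(source_id, target_doi, relation, confidence)


link = infer_link("article-1", "10.1234/example-dataset", "Dataset")
print(link)
```

Keeping the confidence and the review flag on each tentative link is one way to leave the final decision to a human curator rather than treating the automated guess as fact.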
Further investing in the strategy of combined expert and automated metadata enrichment described in “Rick’s MetaTips: Assessing and Improving SHARE’s Metadata Structure,” we are planning to enable curators to use a curation interface within SHARE to review and confirm or reject tentative metadata identified through automation. Curators will also be able to create new links between records and to add or update other metadata fields. These curators will include our curation associates as well as other local or authoritative representatives for data coming into SHARE. We are currently working through use cases and requirements for a release of the curation interface in 2017.
Working with Our Data Providers
Finally, metadata to link records together is ideally supplied when the records are first entered into the SHARE database. In addition to working directly with data providers to update this process, our curation associates at the University of Notre Dame, the University of Oklahoma, and the University of Alabama are currently working on a metadata guide for SHARE data providers that will recommend a set of Dublin Core fields and values, along with examples, for asserting these kinds of relationships. We will be kicking off these efforts with our curation associates and data providers soon.