In this and future installments of Rick’s MetaTips I will be sharing advice on how to make your research (whether it’s your own, your institution’s, or your repository’s) more visible through SHARE by improving your metadata…
This month’s tip applies primarily to current or prospective providers of metadata to SHARE Notify, but probably includes some useful information for anyone hosting research content online. Because SHARE does not host any content itself, its data set is only as good as the links to content provided by its sources. Many links, even within the most stable organizations, can degrade over time: site URLs may change, underlying architectures can shift, and in the worst case content could disappear altogether. Therefore, persistent links are preferred for any linked resources in SHARE.
What makes persistent links different from other URLs or uniform resource identifiers (URIs)? The creator and maintainer of a persistent link commits to maintaining the link forever no matter what happens to the content the link references. Two common implementation schemes for persistent links are persistent URLs (PURLs) and digital object identifiers (DOIs).
PURLs
The first level of link durability comes from generating persistent links at the host (organizational) level. Organizations or content hosts often do this with PURLs based on a reference scheme that is independent of the content location. An example of this is a citation link for the “Reproducibility Project: Cancer Biology”1 within the Open Science Framework maintained by the Center for Open Science (COS):
http://osf.io/e81xl
In this case, the persistent link has no dependence on the content metadata, structure, or application architecture.
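The indirection behind a PURL can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Open Science Framework or any particular PURL service is implemented: the PurlTable class, the identifier "abc12", and the example.org addresses are all hypothetical.

```python
class PurlTable:
    """Hypothetical in-memory PURL table: opaque identifier -> current location."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_id, target_url):
        # The identifier carries no information about where the content lives.
        self._targets[purl_id] = target_url

    def move(self, purl_id, new_target_url):
        # Content moved: only the table changes, the published PURL does not.
        if purl_id not in self._targets:
            raise KeyError(purl_id)
        self._targets[purl_id] = new_target_url

    def resolve(self, purl_id):
        return self._targets[purl_id]


table = PurlTable()
table.register("abc12", "https://files.example.org/v1/projects/rpcb")
table.move("abc12", "https://storage.example.org/v2/rpcb")  # hypothetical migration
print(table.resolve("abc12"))  # prints https://storage.example.org/v2/rpcb
```

Anyone who cited the identifier before the move still reaches the content afterward, which is the property the host commits to maintaining.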
DOIs
Link durability can be taken to the next level by using a third-party organization, such as DataCite or CrossRef, to generate DOIs that point to host-generated PURLs. Organizations can work with a DataCite member like EZID to generate DOIs. EZID is a hosted service with a fairly straightforward application programming interface (API), and it also provides a user interface for manually tweaking DOIs.
An example from the SHARE data set that employs a DOI is a 2014 PLOS Biology article on “Ebola Cases and Health System Demand in Liberia.”2 The linked resource as it appears in the SHARE data set via figshare is:
http://dx.doi.org/10.1371/journal.pbio.1002056.g007
This DOI then resolves to PLOS’s URL for this article:
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002056#pbio-1002056-g007
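The resolution step above can be reproduced programmatically. The following is a minimal sketch using only Python's standard library; the function names doi_to_url and resolve_doi are made up for this illustration, and the resolver prefix is simply the one that appears in the example link.

```python
import urllib.request
from urllib.parse import quote

# Resolver prefix taken from the example link in the text.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi):
    """Build a resolver URL from a DOI, with or without a leading 'doi:'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Keep the slash between prefix and suffix; escape anything unusual.
    return DOI_RESOLVER + quote(doi, safe="/")


def resolve_doi(doi, opener=urllib.request.urlopen, timeout=10):
    """Follow the resolver's redirects and return the final landing URL.

    Performs a live HTTP request unless a substitute opener is passed in.
    """
    with opener(doi_to_url(doi), timeout=timeout) as response:
        return response.geturl()


print(doi_to_url("10.1371/journal.pbio.1002056.g007"))
# prints http://dx.doi.org/10.1371/journal.pbio.1002056.g007
```

Calling resolve_doi("10.1371/journal.pbio.1002056.g007") would make a network request and return whatever address the DOI currently points to, so its result can change if the publisher updates the target.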
PURLs + DOIs
Some organizations choose to employ only a third-party DOI without a PURL, but combining a DOI and a PURL provides dual durability: the same link keeps working whether the content is migrated from one host to another or the host undergoes architectural changes in the future.
One caveat is that a DOI’s durability still depends on the durability of the underlying PURL (or other persistent link) it references. So, it is still up to the host organization to ensure the stability of the PURL as well as the DOI by updating the PURL to reflect any internal changes and subsequently updating DOIs as necessary.
This task can be aided by leveraging the PURL system as an index of all linked content shared. In turn, that index can be monitored programmatically to check the stability of links and flag any issues, as changes will inevitably appear over time. There are several tools available that can help in this process, such as LinkChecker or the FitNesse testing framework, and this could also be done using something as simple as cURL, a common command-line tool, to check for errors on output.
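As a rough illustration of that kind of monitoring, here is a small sketch that uses Python's standard library rather than LinkChecker, FitNesse, or cURL. It assumes the PURL index is available as a plain list of URLs; the function names check_link and check_index are hypothetical. It also assumes the host answers HEAD requests, which not every server does, so a real check might need to fall back to GET.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, detail) for one URL; ok is True for a 2xx final response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
            return 200 <= status < 300, f"HTTP {status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # DNS failures, refused connections, and timeouts land here.
        return False, f"unreachable: {err}"


def check_index(purl_index, check=check_link):
    """Check every PURL in the index and return only the ones that failed."""
    broken = {}
    for purl in purl_index:
        ok, detail = check(purl)
        if not ok:
            broken[purl] = detail
    return broken
```

Running check_index over the full list on a schedule and reporting any non-empty result is one simple way to flag degraded links before readers find them.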
One other powerful mechanism of the DOI contract that really provides the gold standard in durability is the expectation that the DOI maintainer will ensure that the DOI will always resolve to something, even if that target is a message stating the content has been removed and why.
If your organization is still new to persistent links, PURLs are a great place to start. Taking it a step further with third-party identifiers such as DOIs will put you in terrific shape to drive traffic to your content through SHARE now and long into the future.
Endnotes
1 Timothy M. Errington, Fraser E. Tan, Joelle Lomax, Nicole Perfito, Elizabeth Iorns, William Gunn, Brian A. Nosek, et al., “Reproducibility Project: Cancer Biology,” October 20, 2015, http://osf.io/e81xl.
2 John M. Drake, RajReni B. Kaul, Laura W. Alexander, Suzanne M. O’Regan, Andrew M. Kramer, J. Tomlin Pulliam, et al., “Ebola Cases and Health System Demand in Liberia,” PLOS Biology 13, no. 1, doi:10.1371/journal.pbio.1002056.