SHARE News

Resources | 28 August 2017

Recommendations for Curating Digital Commons Metadata for SHARE

SHARE curation associates, January 2017, photo by Amy Eshgh

It has been almost a month since the news broke that international for-profit publisher Elsevier purchased open-access-publishing software company bepress. By now the immediate shock is waning and institutions are considering the implications of what that takeover means for their open access publishing support. No matter what decisions are made, interoperable and easily shared metadata describing scholarly works is essential for any platform to play a role in the scholarly communication landscape. Good and consistent metadata is crucial for sharing or migrating data.

At the beginning of our appointment, we 2016–2017 SHARE digital curation associates were asked to assist the SHARE development team in “gathering, cleaning, linking, and enhancing metadata.” Our first task was to look at the quality of our own metadata. Bonding over food and drink at a SHARE meeting, a few of us who currently use Digital Commons, a bepress product for managing and publishing scholarly works online, discussed the possibility of collaborating on our metadata curation efforts. For our initial exploration, we bepress institutional repository mavens began by looking at our own harvested data, and quickly discovered that we had similar problem areas and deficiencies.

[Figure: various metadata prefixes]

First, simple Dublin Core (oai_dc) is the default prefix for the bepress OAI-PMH endpoint. The chief issue with harvesting simple Dublin Core metadata is that it lacks nuance, granularity, and sometimes even important pieces of data. For example, the digital object identifier (DOI) is not exposed in oai_dc. When this unique identifier is unavailable to the SHARE harvester, it may be difficult to detect possible duplicate records. Moreover, SHARE and other harvesters cannot take advantage of the ability to link information using uniform resource identifiers like DOIs, for example by connecting an author to their institution, or a grant to the final research output. (See Rick Johnson’s post, “SHARE Metadata Is Stitching Together the Research Life Cycle.”)
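To illustrate why a missing DOI matters for duplicate detection, here is a minimal Python sketch that looks for a DOI among the dc:identifier values of an oai_dc record. The sample records, the repository URL, and the function name `find_doi` are hypothetical, and the regular expression is only a rough approximation of DOI syntax; this is not how the SHARE harvester itself works.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Rough approximation of DOI syntax, good enough for illustration only.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def find_doi(record_xml):
    """Return the first DOI found in any dc:identifier element, or None."""
    root = ET.fromstring(record_xml.strip())
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        match = DOI_PATTERN.search(element.text or "")
        if match:
            return match.group(0)
    return None


# Hypothetical oai_dc record: only the repository landing page is exposed.
RECORD_WITHOUT_DOI = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Article</dc:title>
  <dc:identifier>https://repository.example.edu/articles/42</dc:identifier>
</oai_dc:dc>
"""

# The same hypothetical record as it might look if the DOI were exposed.
RECORD_WITH_DOI = RECORD_WITHOUT_DOI.replace(
    "</oai_dc:dc>",
    "  <dc:identifier>https://doi.org/10.1234/example.5678</dc:identifier>\n"
    "</oai_dc:dc>",
)

if __name__ == "__main__":
    print(find_doi(RECORD_WITHOUT_DOI))
    print(find_doi(RECORD_WITH_DOI))
```

With no DOI in the record, a harvester comparing two copies of the same work has only titles and local URLs to go on, which is much weaker evidence of a duplicate.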

We also found that the Dublin Core field labelled “type” needs context because it is used both for a Dublin Core Metadata Initiative (DCMI) Type Vocabulary term and for something similar to genre. The situation is a bit muddled for bepress customers because oai_dc uses “text” as the default for everything, so, for example, image files are automatically assigned the type “text” until someone changes the type to “image.” We noted, too, that Digital Commons has a required “document_type” field that could be mapped to “dc:type,” and this same field is used in journals for sections in the table of contents. These variant uses mean that the facets cannot be limited to a controlled set of terms.
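One way to picture the two uses of “type” is a small Python sketch that separates values belonging to the DCMI Type Vocabulary from everything else, which it treats as genre. The twelve vocabulary terms are the published DCMI ones; the function name `classify_type` and the “dcmitype”/“genre” labels are assumptions made for this example, not part of Digital Commons or SHARE.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Lookup table keyed on a lowercase, space-free form of each term.
_BY_KEY = {term.lower(): term for term in DCMI_TYPES}


def classify_type(value):
    """Label a dc:type value as a DCMI Type term or as a free-text genre.

    Returns a (kind, value) pair where kind is "dcmitype" or "genre".
    Both labels are hypothetical names chosen for this sketch.
    """
    cleaned = value.strip()
    key = cleaned.lower().replace(" ", "")
    if key in _BY_KEY:
        return ("dcmitype", _BY_KEY[key])
    return ("genre", cleaned)


if __name__ == "__main__":
    for raw in ["text", "Image", "Dissertation", "still image"]:
        print(raw, "->", classify_type(raw))
```

A harvester that receives both kinds of value in one field has to make this kind of guess for every record, which is why separate, clearly labelled fields are preferable.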

Similarly, context is needed for metadata fields that support the OpenURL format, such as volume number, issue number, and first and last pages. Isolated from the rest of the record, these values appear as orphaned digits and can be difficult to interpret once they are harvested into another system.
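As a sketch of what that context looks like, the following Python example packs the same digits into an OpenURL key/encoded-value string, where each number carries a label such as rft.volume or rft.spage. The keys follow the OpenURL (Z39.88-2004) journal format; the function name `journal_context` and the sample values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def journal_context(journal_title, volume, issue, first_page, last_page):
    """Build an OpenURL KEV query string that labels each citation value."""
    fields = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.jtitle": journal_title,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": first_page,  # first (start) page
        "rft.epage": last_page,   # last (end) page
    }
    # Leave out any value the record does not have.
    return urlencode({key: value for key, value in fields.items() if value})


if __name__ == "__main__":
    query = journal_context("Example Journal", "12", "3", "45", "67")
    print(query)
    # Round trip: every number is still attached to its meaning.
    print(parse_qs(query)["rft.volume"])
```

Outside such a structure, “12”, “3”, “45”, and “67” are just four numbers with nothing to say which is the volume and which is a page.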

Also in Digital Commons oai_dc, the “publisher” field defaults to the name of the repository. It is not apparent how one might include both an institution name and a repository name (or if this is desirable) in the metadata.

A final example is the flat “author” field of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which does not expose affiliations, identifiers, or roles. This data would be invaluable in helping SHARE to disambiguate author names and connect related events in the research cycle, such as matching a data set to a research paper and its authors.
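The difference between a flat author string and a structured author can be sketched in Python as follows. The `Contributor` class, its field names, the use of an ORCID-style identifier, and the sample values are all assumptions made for illustration; they do not reflect the SHARE schema or any bepress export.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contributor:
    """A hypothetical structured author record."""
    name: str
    affiliation: Optional[str] = None
    identifier: Optional[str] = None  # e.g. an ORCID iD (assumed here)
    role: Optional[str] = None


def same_person(a, b):
    """Return True or False when identifiers settle it, else None."""
    if a.identifier and b.identifier:
        return a.identifier == b.identifier
    return None  # only names to compare: ambiguous


if __name__ == "__main__":
    # What a flat author field gives a harvester: two matching strings.
    flat_a = Contributor(name="Smith, J.")
    flat_b = Contributor(name="Smith, J.")
    print(same_person(flat_a, flat_b))

    # With identifiers (made-up values), the two authors can be told apart.
    rich_a = Contributor("Smith, J.", "Example University",
                         "0000-0000-0000-0001", "author")
    rich_b = Contributor("Smith, J.", "Another University",
                         "0000-0000-0000-0002", "author")
    print(same_person(rich_a, rich_b))
```

Returning None rather than guessing from the name reflects the point above: without affiliations or identifiers, a matching name is not enough to link a data set to a paper's authors.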

As we crunched through different examples and varying formats, we drew up some initial ideas to share with other repository minders, our “Recommendations for DC Repository Managers,” first presented in a poster session at ACRL 2017:

[Figure: recommendations for Digital Commons repository managers]

We are currently in varying stages of progress in curating our own data. For example, at Western University, we have begun to polish our electronic theses and dissertations (ETD) metadata by adding language fields and supervisor name fields, and by ensuring that dates, such as embargo dates, are correctly encoded, included, and exposed to harvesting.

By the way, we certainly don’t want to give the impression to bepress users or any potential metadata provider that having your metadata harvested by SHARE is a difficult task. Nothing could be further from the truth. The SHARE team provides a simple form to fill out, and one only needs to provide information about the OAI feed for simple Dublin Core. Or one can send a note to share-support@osf.io and a team member will contact you to go through the necessary steps for SHARE to start harvesting your feed. SHARE ingests and normalizes the data, transforming it into the SHARE schema. However, if we, the repository managers, are able to provide well-defined, standardized data, it minimizes the effort of mapping and matching and is beneficial for any migration, sharing, or re-use.

[Figure: SHARE registration form]

In our second phase of exploration we looked for a standardized way to express our data. We cast a wider net and mapped the various bepress flavors of OAI to the evolving SHARE schema and investigated a variety of other well-defined outputs, including DataCite. We looked at documentation for other repositories and reviewed the wider repository environment to ensure our ideas went beyond our own repositories.

[Figure: SHARE-DataCite-Dublin Core-bepress metadata mapping spreadsheet]

As the project was nearing its finish, and our document was almost complete, we discovered that SHARE had posted its own general recommendations informed in part by discussions and meetings with all the curation associates:

  • Every OAI source supports oai_dc, but they usually also support at least one other format that has richer, more structured data, like oai_datacite or mods.
  • Choose the format that seems to have the most useful data for SHARE, especially if a transformer for that format already exists.
  • Choose oai_dc only as a last resort.
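The selection logic in these recommendations can be sketched in Python. The example reads the metadataPrefix values from an OAI-PMH ListMetadataFormats response and falls back to oai_dc only when nothing richer is offered. The sample response is abbreviated and hypothetical, and the preference order in `PREFERRED` is an assumption based on the two formats named above, not a statement of what any particular Digital Commons endpoint advertises or which transformers SHARE has.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# Assumed preference order, richest first; adjust to the formats on offer.
PREFERRED = ("oai_datacite", "mods")


def available_prefixes(response_xml):
    """List the metadataPrefix values in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml.strip())
    return [
        element.text.strip()
        for element in root.iter(f"{{{OAI_NS}}}metadataPrefix")
        if element.text
    ]


def choose_prefix(available, preferred=PREFERRED):
    """Pick the first preferred format offered; oai_dc is the last resort."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    return "oai_dc"


# Abbreviated, hypothetical response (schema and namespace elements omitted).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>mods</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

if __name__ == "__main__":
    offered = available_prefixes(SAMPLE_RESPONSE)
    print(offered, "->", choose_prefix(offered))
```

The list of formats an endpoint supports can be requested with the ListMetadataFormats verb of OAI-PMH, so a check like this can be run before registering a feed.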

The result of our examination, beyond the general recommendations, features guidelines with detailed suggestions for specific fields. For example, we note that the “date” field is problematic because there are so many possibilities (created, published, issued, modified, etc.) and no means of distinguishing among the various date traits. Another example we wish to improve is the handling of “format.” This is a derived value in bepress metadata and there is no way to indicate controlled terms. And while qualified Dublin Core will expose a DOI, other unique identifiers remain hidden. Please see the full report of our findings, “Best Practices for Mapping Digital Commons Metadata for Harvesting by SHARE.”

We believe that employing these recommendations will improve bepress metadata for sharing and for harvesting. We hope that other metadata specialists, including the broader Digital Commons community, will give feedback on our recommendations, as we feel that if a majority can agree on the same practices, it will be easier for bepress to implement our suggestions. Please review our recommendations for alignment between Digital Commons metadata schema and SHARE. We welcome comments or feedback on the document, or you may contact any of us directly with your thoughts and ideas!

Thank you!

By Lisa Palmer, University of Massachusetts Medical School; Joanne Paterson, Western University; Wendy Robertson, University of Iowa; Emily Stenberg, Washington University in St. Louis

lisa.palmer@umassmed.edu, jpater22@uwo.ca, wendy-robertson@uiowa.edu, emily.stenberg@wustl.edu
Tags: Repositories, metadata, Curation Associates
All content is © copyright SHARE and available under a CC-BY 4.0 license.
