
At the July 2016 SHARE Community Meeting in Charlottesville, Virginia, we announced more details about our new metadata schema and related utilities for SHARE, which were released in early September. To recap my previous posts on improvements to the SHARE metadata schema, our priorities heading into the redesign were:
- Increase links among related materials, events, and researchers
- Harness partial and additional metadata retrieved from multiple data providers and link across those data providers (i.e., single metadata records in SHARE using data from multiple data providers)
- Enable better grouping and filtering of research events
- Improve the interoperability of SHARE’s metadata so that it is more consumable by other systems

SHARE 2.0 metadata schema on white board. Image courtesy of the Center for Open Science.
These priorities have resulted in the following improvements:
- Master creative work records supporting multiple data providers
- Enhanced independent objects in the schema for research contributors, affiliated institutions, awards, funders, venues (e.g., event, publication)
- Better versioning and provenance tracking across data sources
- More support for linking elements across entities:
  - People (ORCID IDs)
  - Venues, such as journals, conference presentations, and locations (GeoNames, uniform resource identifiers [URIs])
  - Subjects (URIs, subject terms)
  - Institutions (International Standard Name Identifiers [ISNIs], Ringgold IDs)
  - Funders (Funder IDs)
  - Awards (Award IDs)
- Re-architected pipeline for ingesting and normalizing harvested metadata
- Redesigned discovery interface for SHARE with enhanced filtering
Working towards this new SHARE release, the team at the Center for Open Science has completed:
- Implementation of new metadata structure and processing pipeline
- Ingest and processing of data from our 125+ existing data providers, now migrated to the new structure
- Linking of related entities, such as contributor to institution or contributor to work
Within our redesign plan, improvements in progress include:
- Mapping updated SHARE data elements to common metadata terms (a data dictionary being created by the SHARE curation associates)
- Updating the Push API (automated push of records to SHARE) and its documentation to exploit improvements to the schema
- Linking related creative works (such as items by the same author, or supplementary materials)
- Linking to other versions or derivatives of creative works
- Developing tools for curation experts (especially SHARE curation associates) to review and enrich the SHARE metadata
While metadata from existing SHARE data providers have all been updated to feed into the new schema, critical steps need to be taken by our data providers to realize our vision for SHARE:
- Use more persistent identifiers of objects and related objects (e.g., digital object identifiers, ORCID IDs, ISNIs, Funder IDs, Award IDs)
- Use common subject terms (e.g., controlled vocabularies)
- Replace text values with URI values (a.k.a. “strings to things”)
- Populate more metadata fields or identify other sources for missing metadata
Persistent identifiers and common subject terms (the first two items above) are especially important because, even if not all of the desired metadata is available from a single data provider, these elements enable harvesting information from other sources and cross-referencing research entities in SHARE. Our curation associates have begun investigating this process at their institutions, and we are currently working through strategies to bring more data providers on board in enhancing their metadata this way.
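To make the value of persistent identifiers concrete, here is a minimal Python sketch of how partial records from two data providers could be combined into one record when they share a DOI. This is an illustration only, not SHARE's actual pipeline: the function names (`normalize_doi`, `merge_by_identifier`), the field names, the provider names, and the sample values are all hypothetical.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_by_identifier(records):
    """Combine partial records that share a DOI into one record per work."""
    merged = {}
    for record in records:
        key = normalize_doi(record["doi"])
        target = merged.setdefault(key, {"doi": key, "sources": []})
        # Track which providers contributed to the combined record.
        target["sources"].append(record["source"])
        for field, value in record.items():
            if field in ("doi", "source"):
                continue
            # Keep the first non-empty value seen for each field.
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical sample data: each provider holds only part of the picture.
records = [
    {
        "source": "provider_a",
        "doi": "10.1234/example.1",
        "title": "An Example Work",
        "funder_id": "",
    },
    {
        "source": "provider_b",
        "doi": "https://doi.org/10.1234/EXAMPLE.1",
        "title": "",
        "funder_id": "hypothetical-funder-42",
        "contributor_orcid": "0000-0000-0000-0000",  # placeholder, not a real ORCID iD
    },
]

combined = merge_by_identifier(records)
```

Without the shared DOI, the two records above could only be matched by comparing free-text titles, which is far less reliable. The "first non-empty value wins" rule is just one simple choice; a real system would also need to handle conflicting values and provenance.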
We have also heard from data providers that, even though flexibility is good for harvested data, a recommended standard for metadata mappings would be useful, especially via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). We are currently considering what community standards would be the best fit for SHARE.
For Further Reference
- Related presentations on the updated schema and data model for SHARE
- Notes from the July 2016 SHARE Community Meeting
- For those interested, a more technical entity-relationship diagram showing how links are asserted between people, things, and other things
- Additional technical documentation on SHARE models