Sixty individuals—from universities, corporations, nonprofit organizations, and government agencies—gathered in Washington, DC, Monday–Tuesday, June 22–23, 2015, for the second SHARE Community Meeting. Participants included SHARE Working Group members, SHARE Notify beta participants, technical partners, and other stakeholders. Most of the meeting was devoted to a small number of breakout task groups to explore specific issues—helping to define SHARE’s opportunities, scope out limitations and boundaries, and identify what successful execution will look like and who it will benefit. The invitation-only meeting was made possible with generous support from the Institute of Museum and Library Services (IMLS) and the Alfred P. Sloan Foundation.
Keynote on IRIS
On Monday afternoon, SHARE director Tyler Walters opened the meeting and introduced Julia Lane. Lane is a professor of practice at the Center for Urban Science and Progress, a professor of public service at the New York University (NYU) Wagner Graduate School of Public Service, a provostial fellow for innovation analytics and senior fellow at NYU’s GovLab, and co-PI of the Institute for Research on Innovation and Science (IRIS) at the University of Michigan. Lane presented a compelling keynote talk about IRIS, a data platform designed to document and improve the value of public and private investments in discovery, innovation, and education. She summed up the challenge that IRIS is addressing by stating that universities spend $250 on research for every US citizen, but too little is known about what these investments produce. She likened this situation to a New Yorker cartoon that depicts two scientists at a blackboard on which is written a scientific equation, “Then a miracle occurs,” followed by another equation. IRIS aims to shed light on the “miracle” in the middle of the research process by documenting the economic, scientific, and educational outcomes of research. The Community Meeting included a breakout group that discussed how SHARE and IRIS might collaborate further (see “SHARE-IRIS Collaboration” under “Breakout Sessions” below).
Overview of SHARE Phases I and II
Following the keynote, Tyler Walters, Judy Ruttenberg of the Association of Research Libraries (ARL), and Jeff Spies of the Center for Open Science (COS) presented an update on SHARE and answered questions from meeting participants about next steps. Accomplishments since the first SHARE Community Meeting in October 2014 include the DuraSpace webinar series, launch of the www.share-research.org website and the SHARE Notify beta, and surpassing the milestone of one million research release events included in SHARE Notify. Currently in the works are partnerships with IRIS and ORCID and a pending grant proposal to fund Phase II of SHARE.
Walters, Ruttenberg, and Spies explained that Phase II will involve improvements in policy and workflow as well as technology to enhance the metadata in the SHARE Notify database and produce reports from the data. SHARE is working with Ohio State University (OSU); University of California, San Diego (UCSD); and Virginia Tech to better understand what administrative and other data are available and how SHARE Notify can integrate with their existing systems and workflows to add value. One application in Phase II should be a tangible way—e.g., an interactive website or dashboard—to demonstrate how OSU, UCSD, and Virginia Tech are using SHARE’s enhanced metadata to produce valuable reports. Another potential application of SHARE in Phase II is enabling analytics to help institutions compare themselves to one another, through an open data set that incorporates research activities of many types (articles, data, software, etc.).
Tuesday morning kicked off with an open discussion facilitated by Greg Tananbaum of how SHARE is being received at participants’ institutions, how they see SHARE serving their institutions, and what SHARE could do to help develop more support for the project on campus. In addition to discussion of the need for better metadata and more compelling use cases that Phase II will tackle, Tananbaum’s questions led to a productive conversation about the need to make it clear that bibliographic metadata can be freely shared without a license—it is factual data and therefore not subject to copyright. Many metadata providers are hesitant to register with SHARE Notify because they are unsure of their right to distribute their metadata, despite having an open harvesting protocol in place. Following the Community Meeting, SHARE changed the Notify registration process so that providers are simply required to check a box that acknowledges that metadata passed to SHARE will be part of an open data set, rather than the prior process of asking providers to affirm that they have the right to redistribute metadata.
Breakout Sessions
Most of Monday afternoon was devoted to the work of four sub-groups: Manual Curation, Reports, Research Information Systems, and SHARE-IRIS Collaboration. First, each of the four group leaders gave a five- to ten-minute talk about the charge of the group, followed by questions from all meeting participants. Then the groups split into breakout sessions to define the opportunities for their groups and scope out limitations and boundaries. The groups met in breakout sessions again on Tuesday morning to discuss challenges that might prevent them from addressing the opportunities identified on Monday afternoon, how they can overcome those challenges, and what successful execution will look like. On Tuesday afternoon, each group presented a summary of its work to the full conference, answered questions, and took suggestions.
The four groups are focused on the following topics:
- The Manual Curation Task Group will address the challenges of the SHARE Notify data set. The group will tackle such questions as how to clean up the data so that it is more useful, how to connect related objects, how to enhance sparse metadata, when and how to use technology or crowdsourcing to increase the data quality, and how to help the data providers improve the data at the source. The group will explore a way for providers to benchmark their metadata contributions against other providers, possibly giving providers a CSV file of records missing key data elements. Crowdsourcing could help under-resourced institutional repositories (IRs), but crowdsourced contributions may require review before they are accepted. Automated comparison of SHARE data to other sources could help assess the trustworthiness of the data. People who might be interested in curating this data include open access advocates, librarians, domain experts, citizen scientists, and teachers using the metadata to address learning objectives.
- The Reports Task Group will produce a set of prototype reports that can be given to research officers to demonstrate the value of SHARE’s focus on openness and inclusion of all research outputs (not only publications) across disciplines. This group plans to develop reports that speak to four use cases: (1) emerging research areas, such as data science, digital humanities, and computational social sciences, where researchers are looking for leads to new research and potential contacts or collaborators; (2) the value of identifiers in revealing linkages, exemplifying researcher output, and benchmarking openness by institution; (3) incentives for content providers, who can get enhanced metadata from SHARE after submission; and (4) public use of SHARE, perhaps to research a health topic, which could showcase reuse.
- The Research Information Systems Task Group is looking at research output data from information management systems like VIVO and Symplectic Elements. These systems are sometimes based in libraries and sometimes affiliated with university administration. Harvesting data from these systems would give SHARE an opportunity to obtain information about the research workflow early in its life cycle. The task group will begin with a survey of the metadata landscape in these systems and develop criteria for a few levels of quality that can be implemented incrementally. Challenges include open standards, privacy (HIPAA, FERPA), security, policy (openness of data), and interoperability. SHARE needs to be able to move data between systems and even across institutions, tagging items by sensitivity. There is an opportunity to assess the quality of metadata as it moves through the process, but the question of who has the right to view or change the data is tricky.
- The SHARE-IRIS Collaboration will explore how the two projects can enhance each other’s value. IRIS is developing rigorous, permanent data to describe the impact of research funding in US higher education. IRIS works with universities to extract data from multiple systems and departments about the grant funding their researchers receive and how that funding is used. IRIS wants to build a reliable trend indicator across disciplines and content types. The data will be as open as possible while protecting privacy and confidentiality. SHARE can help demonstrate the value of this vision to the academy, and help flesh out what this vision might entail at scale. SHARE and IRIS plan to start by working with a small number of campuses, and focusing on a few disciplines in order to dig narrowly but deeply.
SHARE Research Data Hackathon
Also on Tuesday, Erin Braswell and Fabian von Feilitzsch of the Center for Open Science (COS) presented an overview of the SHARE Research Data Hackathon that COS hosted on Saturday and Sunday, June 20–21, before the Community Meeting. More than 25 people participated in the hackathon, analyzing SHARE Notify data, improving the efficiency of the data pipeline, and planning for the future with prototypes of a curation interface. More information and demos of work completed are available via the “SHARE Hackathon and Barn Raising” page on the Open Science Framework.
Looking to the Future
Tyler Walters wrapped up the meeting Tuesday afternoon with a view toward SHARE’s future. First, he reminded meeting participants that funders, universities, researchers, librarians, repository managers, and the public all want to know what research is underway, where it is occurring, what outputs are being produced, and where to get the outputs. SHARE is creating an open data set about research activities to meet those needs and to facilitate innovation. The challenges SHARE needs to address are the inconsistent application of metadata and providers’ uncertainty about their right to share the metadata. Solutions will involve improving workflow, policy, and technical infrastructure at the community and international level.
Walters concluded by quoting Judy Ruttenberg, who recently wrote, “SHARE will be the kind of core infrastructure that helps mitigate the constant proliferation of boutique technical solutions and keeps urging people back to at-scale community solutions.” Walters said, “This is the takeaway: boutique one-offs impede openness and transparency, and raise costs over time.”
There was much enthusiasm in the room for a future meeting to include many emerging initiatives that were represented at the SHARE meeting—such as IRIS, CASRAI, CHORUS, and ORCID—to (a) establish common policies, procedures, and technologies for which these groups can collectively advocate and (b) develop a narrative that explains to stakeholders how these initiatives fit together. SHARE will continue to work with the research community toward finding at-scale collaborative solutions for making research outputs widely accessible, discoverable, and reusable.