Making the connection: How organizations are institutionalizing the use of persistent identifiers to link data, publications, people, and institutions

Abstract/Agenda: 

Persistent identifiers are powerful tools that allow us to unambiguously link objects and information. We now have a number of different systems of persistent identifiers for data, publications, people, and institutions. The question is how organizations are taking advantage of these persistent identifiers to create meaningful links. How are institutions encouraging or enforcing the use of persistent identifiers? For example, how are organizations institutionalizing the use of ORCIDs or ResearchIDs among their staff, and how are they ensuring that those identifiers are linked to an author’s publications and datasets? How are organizations ensuring that there are links between publications and datasets? What types of changes, both technological and cultural, need to happen in order to fully connect all of the parts and pieces of our science to tell a meaningful, accurate, and persistent story?

This session will feature a few successes and challenges from organizations that have made these connections and how they did it, as well as a discussion of how others can initiate the implementation of persistent identifiers.

 

Notes: 
  • Presentations associated with this session are attached.
  • Presentation #1: "ORNL DAAC experience with Persistent Identifiers" by Suresh Vannan of ORNL DAAC
    • Linking authors to their data is one of the key goals, so that the impact of the ORNL DAAC on science can be determined.
      • Citation metrics are used to help quantify the impact.
    • Citations can also help users access the data.
    • The key elements of a data product citation comprise six fields (please see the slides for further details).
    • Every year, ORNL DAAC generates statistics on citations and in-text referrals of ORNL DAAC datasets.
      • Placing the citation in the reference section of a publication/article makes these statistics easier to generate.
    • Other metrics include:
      • Names of the journals/publications in which the citations appear.
      • Topics of datasets that are cited.
    • Citation can also serve as a "one-click-link" to access the associated dataset.
    • ORNL DAAC Data Product Citation Policy is available at the following link: https://daac.ornl.gov/citation_policy.html
    • The next step is to include ORCID to help clarify the relationships between data and authors.
      • A key issue is that the community must continue to adopt ORCID for it to be effective.
        • More complete ORCID records would also improve ORCID's effectiveness.
    • Question: Is ORNL DAAC using any tools that could be used for data citation metrics?
      • Answer: Not at the moment, but ORNL DAAC would be happy to look into it.
    • Comment: Identifying an item, locating it, and establishing its context are three potentially different use cases/scenarios for identifiers. Considering these use cases separately could help optimize how the identifiers are used.
  • Presentation #2: "Persistent Identifiers Implementation in EOSDIS" by H. K. “Rama” Ramapriyan of Science Systems and Applications, Inc. & ESDIS Project, NASA Goddard Space Flight Center
    • From the perspective of preservation.
    • Ideally, everything related to a dataset should be traceable, including: inputs, software/algorithm, authors, instruments, funders, quality, usage, and publications.
      • The assumption is that the authors of the dataset would not be around forever to answer all the questions relating to these topics.
    • However, there are many challenges related to the preservation of datasets, and this information is difficult to capture, document, and maintain over the long term.
    • The implementation was originally based on Duerr et al. (2011) “On the utility of identification schemes for digital Earth science data: an assessment and recommendations” - DOI: 10.1007/s12145-011-0083-6
      • The implementation is a process of eight steps: four involve the DAACs, and the ESDIS DOI team is responsible for the others.
    • A website is available to track the progress of the identifier assignments.
    • The Earth Science Data System Working Group is working on several topics relating to identifiers, and currently, the main focus is on software.
    • Key data citation resources are provided within the corresponding presentation slides.
    • Key challenges associated with identifier implementation are presented, including:
      • Implementation takes time, non-dataset objects need to be considered, and provenance also needs to be added.
  • Presentation #3: "Motivation and Strategies Implementing Digital Object Identifiers at EOL" by Janine Aquino and Don Stott of Earth Observing Laboratory (EOL), National Center for Atmospheric Research (NCAR)
    • Reproducibility is the main motivation.
    • Many related efforts led by funding agencies, journals, and community initiatives help propagate the practices of assigning identifiers.
    • NCAR has formed a Data Stewardship Engineering Team to undertake the identifier implementation process.
    • For EOL, examples of items that are receiving identifiers include physical objects (field catalog, instruments, facilities) and datasets.
    • A Drupal module is being developed to make it easier to work with identifiers.
    • Currently, identifying and citing software and workflows are two key focuses.
    • Several concerns have been raised by NCAR labs, but the team will continue to work with the labs to improve the process of assigning identifiers.
    • Finally, a "How-To" document is being created to help the community better understand and use identifiers.
  • Presentation #4 by Bob Arko of LDEO, Columbia University
    • The following are LDEO's motivations for using identifiers:
      • Reproducibility (what journals want)
      • Reuse (what funders want)
      • Recognition (what researchers want)
    • The key implementation is for "instances", such as datasets, documents, and expeditions.
      • Some instances have regular IDs assigned to them, but the IDs are not persistent. Additional feedback and suggestions from the community are welcome.
    • Ultimately, an example of success would be an expedition that is linked to all related items, including papers, funds, datasets, etc.
    • Additional work, such as updating to DataCite 4.0 and improving identifier practices, is planned.
  • Presentation #5: "Getting to the PID – pitfalls along the way" by Denise Hills of Geological Survey of Alabama
    • Key goal is to improve data access.
    • The main items that need identifiers are physical objects.
    • Assignment of identifiers is increasing, so progress is being made.
    • "Metadata archaeology", or finding all the information relating to an object so that it can be documented completely, is needed before objects can be registered for identifiers.
      • This has been the key challenge for achieving 100% implementation of identifiers.
  • Presentation #6: "Expanding the Use of Digital Object Identifiers for Interdisciplinary Scientific Data" by Bob Downs of NASA Socioeconomic Data and Applications Center (SEDAC), Center for International Earth Science Information Network (CIESIN), The Earth Institute, Columbia University
    • A recommended citation format and a data landing page are available.
    • In addition to datasets, data documentation is also starting to receive identifiers. This allows bidirectional linkage between the data and the documentation. These links are also available on the data landing page.
      • Ultimately, data as well as the related documentation and publications can all be linked together.
        • A challenge is being able to link to the journals/publications consistently.
        • Relationships between the data, documentation, and publications are implemented through the use of DataCite relationship types.
    • Current work on the identifiers includes defining alternative metrics for usage through identifiers and options for assigning identifiers to software and services.
  • Question: How has ORCID been helpful, given that many ORCID records are not complete?
    • Answer: It provides a way to start identifying the plausible authors, but additional improvement is certainly welcome.
      • Follow-up question: Historical data could be especially challenging, as the authors of these data would not have had a chance to register in ORCID. Is there a solution/suggestion for this scenario?
        • Answer: This is an ongoing challenge; a similar problem is duplicate ORCID records for the same person.
  • Question (from remote attendee): I would like to hear more about how folks are assigning DOIs to publications derived from data, and how those links are discovered or communicated between data and publication providers.
    • Answer: If an article is published and indicates some data that are available, these data can then be linked to the published article. However, if data are available but the associated publications have not been pointed out, this linkage could be difficult to establish.
  • Question: Are there any additional examples of using persistent IDs for items that are not covered in the presentations?
    • Answer: Yes - data properties/measurement units, provenance trace.
  • Comment: An identifier is not the same as a DOI (Digital Object Identifier). A DOI is one type of identifier, and there are many other types of identifier.
  • Comment: There are additional opportunities to continue the discussion, including AGU Fall Meeting 2016 and CODATA Data Science Journal Special Issue (submissions are due August 31st).
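
Illustration (not from the session): two points in the notes above, a citation acting as a "one-click-link" and bidirectional data/documentation links expressed with DataCite relationship types, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any presenter's implementation. The function names and the two "10.1234/..." DOIs are hypothetical; the https://doi.org/ resolver prefix and the relation types "IsDocumentedBy" and "Documents" are taken from the DOI system and the DataCite schema respectively.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver prefix


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable 'one-click' link."""
    # Keep the '/' between prefix and suffix; percent-encode anything unsafe.
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


def related_identifier(doi: str, relation_type: str) -> dict:
    """Build one DataCite-style relatedIdentifier entry (illustrative shape)."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


def bidirectional_links(dataset_doi: str, documentation_doi: str) -> dict:
    """Return the pair of entries that link a dataset and its documentation
    in both directions, keyed by the DOI whose record would carry the entry."""
    return {
        dataset_doi: [related_identifier(documentation_doi, "IsDocumentedBy")],
        documentation_doi: [related_identifier(dataset_doi, "Documents")],
    }


if __name__ == "__main__":
    # DOI of the Duerr et al. (2011) paper cited in the notes.
    print(doi_to_url("10.1007/s12145-011-0083-6"))

    # Hypothetical DOIs, used only to show the shape of the links.
    links = bidirectional_links("10.1234/example-dataset", "10.1234/example-docs")
    for owner, entries in links.items():
        print(owner, entries)
```

Keeping the link on both records means a reader who lands on either the dataset or its documentation can reach the other, which is the bidirectional linkage described for the data landing pages.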
Citation:
Langseth, M.; Making the connection: How organizations are institutionalizing the use of persistent identifiers to link data, publications, people, and institutions; 2016 ESIP Summer Meeting. ESIP Commons, March 2016