2017 Summer Meeting: Data Stewardship Committee Business Meeting

Abstract/Agenda: 

Reference the 2017 strategic plan: http://wiki.esipfed.org/index.php/ESIP_Data_Stewardship_Strategic_Plan_Calendar_Year_2017

  • Matt: Takeaways and potential actions from session on Data Rescue
  • Peng: Use/service maturity matrix (MM-Serv) paper
  • Nancy H.: Data management training working group
  • Matt: Data and literature interlinking
Notes: 

 

  • Matt presented on status of data rescue document/current efforts
    • Current status: White paper submission still pending
    • None of the “doomsday” scenarios anticipated under the current administration have really come to fruition
    • Ideas for way forward/continued efforts:
      • Development of a “risk factor” matrix/analysis to evaluate levels of risk to different facilities and types of data
        • Rama noted that this was already done for the NASA DAACs (sometime in 2012)
          • Risk Assessment Code (RAC) matrices are used to indicate the level of risk of data loss versus the impact of that loss on users. For each DAAC, the number of datasets falling into each cell of a 5 by 5 matrix is recorded. Risk of loss is lower when datasets are robustly backed up at remote sites, and user impact is lower when post-loss recovery times are shorter.
          • The RAC matrices are periodically updated.
          • However, political change was not one of the risk types evaluated
        • Matt will spearhead development of an ESIP risk matrix/analysis going forward
      • Identification of what datasets are associated with what risk factors
        • A lot of the data rescue groups would like a catalog or inventory of some sort that might help them direct their efforts
          • Non-trivial to assemble such a list, but would be worth it since data rescue groups bring a lot of energy and enthusiasm and could use some direction toward the most “at risk” data
          • Sophie: Important to distinguish between “soft” loss and “hard” loss, i.e., does the “at risk” data just become harder to access or hidden from obvious view, or was it truly deleted, taken off the Internet, or lost?
      • Currently unresolved: Where to put “rescued” data?
        • Temporarily?
        • Permanently?
        • A directory/catalog could help with this
        • Question: Has the use of torrents been explored for storage of these datasets?
          • Referred questioner to the keynote talk by Matt Zumwalt
             
  • Ge Peng presented on the status of the maturity matrix
    • Service-level matrix is currently under development
      • Last phase in the Science → Product → Stewardship → Service continuum
    • Peng is looking for input on the Service maturity matrix
    • DS Committee members should get in touch with her
    • Question: What are “Service” activities?
      • Service = use by external community and user-level services
      • Examples: Data discovery, data access
    • Question: Is “Stewardship” appropriate as a name for the third phase?
       
  • Nancy H. presented on status of the DMT Working Group/DMT Clearinghouse
     
    • Officially became a working group about 2 months ago
    • Two primary projects:
      • First project: DMT Clearinghouse
        • Launched in October 2016
        • Currently being shopped around to various organizations and users
        • Beta testing of user interface
        • Seeking to better identify target audience(s), better identify what types of resources will end up in the clearinghouse catalog
          • Some trainers have expressed interest in MOOCs, syllabi, templates, and more video
        • Crowdsourcing alone doesn’t necessarily seem to be sufficient to gather a “critical mass” of resources for the clearinghouse
          • Will undertake an effort to seek out existing resources on the Internet and add them (with permission from content owners/creators)
        • Links/tie-ins to virtual educational environments?
          • Possible collaboration with IU
        • Some possible sources of funding/sustainment?
          • Science Gateways
          • Collaboration w/IU on IMLS proposal call (2 pages)
          • Both of these will require development of a strategic plan for the working group
          • Nancy is looking for DS Committee input
        • Need to think about governance
          • Advisory committee?
          • Belmont Forum
      • Second project: Working within/outside of ESIP to define core skills/competencies for data management professionals
        • Lots going on in this area already
        • Belmont Forum already active in this area
        • ESIP’s role in this area not entirely clear right now
        • Matt suggested (per Shelly Stall) more aggressive efforts to get mentions about/pointers to DMT Clearinghouse resources in AGU publications, press releases, etc.
        • Sophie encouraged everyone in the DS Committee to evangelize and advertise!
        • Ruth suggested collaboration with/reaching out to RD3 (RE3?) data folks
           
  • Matt presented on status of data and literature “interlinking” efforts
     
    • Number of existing efforts underway
      • SCHOLIX, http://www.scholix.org
        • A framework, not a technology
        • Sets some standards and best practices
        • Uses a “hub-spoke” model
          • Some existing possible “hubs”: CrossRef, DataCite, Open Aire DLI
    • Matt is interested in the ability of individual users and institutions both to contribute to the effort by generating content and data and to pull information from the various hubs
    • Matt’s question to attendees: Who is using the related URL/related DOI field when submitting or adding data or literature to a given repository?
      • Various answers:
        • Rama: Using related URL field
        • Reid: JHU is making use of it, but only if it’s present when submitted by the content generator
          • “If we have it, we’ll use it”
             
  • Karen Hanson, JHU, presented on RMap: https://test.rmap-hub.org
     
    • Production instance should be available later this year
    • Goal: How to capture “full scope” of a scholarly work made up of multiple components (literature, data, code, etc., etc.)
    • Based on the concept of a DiSCO: A given aggregation of scholarly resources
    • RMap doesn’t mind which type of identifier or metadata you use, but there can’t be any gaps in the network between aggregation elements
    • Users can submit anytime
    • Allows you to monitor/view changes to elements within the aggregation
    • Question: If there’s a node or identifier within the aggregation that becomes inactive/disappears, can RMap tell?
      • Answer: Not explicitly, no -- but easy to get RMap to download all of the nodes within the aggregation. You could then write some code to tell which items are missing/weren’t downloaded.
    • Future for RMap: Hoping to use it to fill in gaps in existing/new aggregations
    • All code on Github: https://github.com/rmap-project
    • Question: What is sustainment strategy?
      • Initial Sloan Foundation grant has expired
      • JHU has committed to funding it/hosting it for the time being
    • Karen reports that she had 7 people test the user interface this week during the summer meeting (with Sophie’s help)
       
  • Other business:
    • Sophie made a plug for an AGU/ESIP effort led by Shelly Stall to implement the FAIR principles with U.S.-based publishers and scholarly organizations
      • Contact Shelly Stall for info
    • Ruth reports that AGU will have a “data help desk” during the AGU Fall Meeting this year
    • Matt reports that the next monthly committee call will be in August
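The Risk Assessment Code (RAC) tally described under the data rescue item (for each DAAC, counting how many datasets fall into each cell of a 5 by 5 matrix of loss risk versus user impact) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NASA RAC methodology: the 1 to 5 integer scales, the dataset names, and the function `rac_matrix` are hypothetical.

```python
from collections import Counter

# Hypothetical scale: loss risk and user impact are each scored 1 (lowest) to 5 (highest).
SCALE = range(1, 6)

def rac_matrix(scores):
    """Count datasets per (risk, impact) cell of a 5 by 5 matrix.

    `scores` maps a dataset identifier to a (risk, impact) pair; how those
    scores are assigned (backup robustness, recovery time) is assumed to
    happen elsewhere.
    """
    for name, (risk, impact) in scores.items():
        if risk not in SCALE or impact not in SCALE:
            raise ValueError(f"{name}: scores must be integers from 1 to 5")
    counts = Counter(scores.values())
    # Rows are risk levels 1..5, columns are impact levels 1..5; empty cells count as 0.
    return [[counts[(risk, impact)] for impact in SCALE] for risk in SCALE]

# Illustrative, made-up scores for a single archive.
example = {
    "dataset_a": (1, 2),  # backed up at a remote site, quick recovery
    "dataset_b": (1, 2),
    "dataset_c": (4, 5),  # single copy, long recovery time
}
for row in rac_matrix(example):
    print(row)
```

Keeping the scoring separate from the tally means the scoring rules can be revised (for example, to add political change as a risk factor) without changing how the matrix is counted.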
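The workaround suggested in the RMap Q&A (download all of the nodes within an aggregation, then check which items were not retrieved) comes down to a simple comparison. The sketch below does not call the RMap API; it assumes you already have the list of identifiers in a DiSCO, and `fetch` is a hypothetical callable standing in for whatever download step you use. The identifiers are placeholders.

```python
from typing import Callable, Iterable, List

def find_missing(identifiers: Iterable[str],
                 fetch: Callable[[str], bool]) -> List[str]:
    """Return the identifiers that could not be retrieved, in their original order.

    `fetch` is a hypothetical callable that tries to download one node and
    returns True on success; in practice it might wrap an HTTP request to a
    DOI or URL resolver.
    """
    missing = []
    for ident in identifiers:
        try:
            ok = fetch(ident)
        except Exception:
            # Treat network or resolver errors as "not retrieved".
            ok = False
        if not ok:
            missing.append(ident)
    return missing

# Placeholder identifiers for the elements of one aggregation (DiSCO).
disco_nodes = ["doi:10.1234/article", "doi:10.1234/dataset", "https://example.org/code"]
# Stand-in for a real download step: pretend only two nodes resolved.
retrieved = {"doi:10.1234/article", "https://example.org/code"}

print(find_missing(disco_nodes, lambda ident: ident in retrieved))
```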
Citation:
Mayernik, M.; Hou, S.; 2017 Summer Meeting: Data Stewardship Committee Business Meeting; 2017 ESIP Summer Meeting. ESIP Commons, August 2017