Federal Agency Repository Review


This session will seek to understand the successes and challenges facing Federal agencies in providing repositories for science data.

Some questions to explore:

- What's been done?
- How is it working? How is this measured? 
- What level of data curation is offered? 
- How many full-time staff are dedicated?
- What kinds of costs were involved to develop, implement, and operate a Federal data repository?
- What skill sets are required/sought to support the repository? 
- How are agencies addressing the issues around inter-agency data for their organization's repository? 


  • Viv:
    • Provided the background and the key questions that we would like to explore in this session.
  • Ken Casey from NOAA:
    • Background on the National Centers for Environmental Information.
      • Focused on oceanic, atmospheric, and geophysical data.
    • Mission is to be the “Steward of the Nation’s Environmental Information”.
      • It is crucial to build and maintain NCEI's role as the trusted source for these data types.
    • The merger of the NOAA data centers helps provide a consistent data management capability for all of NOAA.
    • The Data Stewardship Division works closely with the Center for Weather & Climate and the Center for Coasts, Oceans & Geophysics so that domain-specific knowledge is managed appropriately for the benefit of the user community.
    • The data managed come from sources that are national as well as international.
      • The assessments provided are likewise national and international, as well as annual.
    • The volume of data is increasing drastically, especially because of the growth in satellite and model data.
    • Users increasingly expect expert interpretation to be provided.
      • The overall user profile consists of general/business/media/public (70%), researchers/business consultants (15%), and value-added providers (15%).
      • The users come from different economic sectors and regions, and are concerned with different societal challenges.
    • Six tiers of stewardship:
      • Maturity level increases with the tier number.
        1. Long term preservation and basic access
        2. Enhanced access and basic quality assurance
        3. Scientific improvements
        4. Derived products
        5. Authoritative records
        6. National services and international leadership
    • Several datasets that have achieved maturity level 6 have been collected and are showcased as “golden examples”.
  • Heather Brown from NOAA:
    • Stewardship is a shared responsibility.
      • Stewardship division, support services, and science centers all work together to provide and support stewardship.
    • It is important to get to know each other and learn what each other does, so that the different locations can build synergies.
      • Different techniques are being implemented:
        1. Tiger Teams
        2. Agile approach
        3. Implement small pieces
    • The Data Maturity Matrix (http://commons.esipfed.org/node/7956) is also being used.
    • Under NCEI, the former, separate NOAA data centers should all be able to support the data providers collectively.
  • Questions:
    • Is there a fee associated with each of the six maturity levels that a dataset can achieve?
      • There are multiple ways for the stewardship cost to be recovered. However, no, there is no cost directly assessed on or collected from the data providers.
    • Is there a correlation between a dataset’s maturity level and how likely it is to be ingested, or the priority of its ingest?
      • The dataset’s maturity can be improved once the dataset is ingested. However, the better prepared the dataset is prior to ingest, the more efficiently it will be able to complete the process.
  • Ranjeet Devarakonda (for Suresh Vannan) from Oak Ridge National Lab (ORNL):
    • There are 9 key responsibilities that the repositories perform, ranging from acquisition to providing user working groups.
    • There is a wide variety of data types within ORNL, so being able to provide proper stewardship to all the data types is a key challenge.
      • Rapid increase in volume is another crucial issue.
    • Common issues include inconsistencies in data files, metadata, documentation, and data citation, as well as diverse user needs, a constantly changing IT landscape, and too many tools.
  • Questions:
    • How was the expertise needed to provide all the required functions within the data center developed?
      • The development took time because the understanding of the roles and responsibilities evolved as new challenges arose. As a result, it took commitment from the personnel, management, and community to discover and identify the skills and knowledge that would allow the user needs to be supported.
    • How are multiple DOIs for the same dataset managed?
      • Cross-reference of DOIs is very important.
    • Is the philosophy that the researchers should provide the metadata, or the data curators/managers?
      • It should be a shared responsibility.
      • Adjusting/scaling the size and details of the metadata could be another approach.
Federal Agency Repository Review; Summer Meeting 2015. ESIP Commons, March 2015