Council for Data Facilities (CDF)

Notes: 

 
Notes for CDF part 1
 
COPDESS: Coalition for Publishing Data in the Earth and Space Sciences
Directory built with the help of the Center for Open Science and the AGU.
Working with re3data to make this a joint effort.
 
There is a demo of the COPDESS site on the recording.

  • Initial registration needs to happen
  • As a new user, you need to create a repository
  • After this, you are set up with a repository, which can then be managed
  • There is a guide to a lot of this that has already been put together ([email protected])
  • Endorsed by: journals can (and will) endorse the repositories that they use. This allows for a better connection between journals and data management groups.
  • All CDF members should register their member organizations with COPDESS and sign the statement of commitment.

In short, a few priorities have been suggested for development:

  • Semantics
  • Brokering
  • Shared infrastructure
  • Making use of existing capabilities
  • Influencing future funding solicitations 

 
There is an EarthCube report on the website that outlines much of this discussion on the EarthCube Architecture.
All Hands meeting 2 weeks ago in Denver: CDF must get involved to make EarthCube a success.
Goal: Discussion about the key architecture aspects for the “Knowledge work pitch”.
How does a person connect into the EarthCube architecture?
 
How should the Council for Data Facilities start interacting with the EarthCube architecture?
Thoughts from the audience:

  • THEME: Don’t reinvent the wheel!
  • The data facilities are really the backbone of much of this interaction.
  • EarthCube seems to be the dominant group here. If EarthCube is “doing everything” how do the other facilities fit in?
  • Additionally, how do we allow facilities to leverage and enhance what EarthCube does?
  • From a data facilities perspective, I want to know: how many people came to my facility because of EarthCube?
  • How can we leverage existing technology? Why can’t registries work with DataONE? What about iRODS?
  • Keep it simple. However, there are issues associated with funding (I don’t always have money to make everything interdisciplinary!).
  • How is the administration’s policy on public access to research results coming into play? All funded projects need a “data management plan.” There is an important distinction between making data available and making it usable. In the first phase, only publications and manuscripts are required. However, this policy is still evolving as we speak.
  • CDF, in conjunction with COPDESS, can help users work out where and how to store their data (for example, through a “matchmaker service”).
  • It’s great to work with existing infrastructure, but there are already many of these infrastructures. How do you deal with all of them? It’s not good to reinvent the wheel.
  • Data facilities have the power of an “operational reality check”. Implementing technologies takes time, resources, and energy. You rarely get what you want; you get what the developers know how to do. Check first whether there is an existing capability to latch on to before developing your own. Fund the deltas (the extra money needed to complete projects).

Are there components of the architecture that you think are good candidates for early adoption?

  • We need to deal with the differences in semantics and vocabularies
  • Consolidation of the storage facility resources and registration
  • Develop a broker with the technology that we need
  • Requirements for maintaining provenance
  • Make them consistent with the soon-to-be-announced awards from EarthCube, and better understand the award guidelines. Co-opetition happens, and it’s important to keep the funding structure in mind.

Other thoughts/questions:

  • What is the role of CDF in the future? Meet at ESIP and discuss issues? Or are we going to act soon? There is likely a need to be more coordinated and to work on things like shared infrastructure. It seems that we have a way to go before taking action, and we should fix that.
  • Potentially establish more structure and key working groups.
  • There are politics associated with funding that we need to consider. It’s hard to achieve success and reach the end game given these issues.
  • There is a need for more frequent dialogue 

 
IRIS runs infrastructure in several places.
Is the solution cloud computing?

  • This is hard to do at the petabyte scale
  • Financially this is hard to do
  • Doing some testing in the cloud and making comparisons
  • Setting up workflows can be challenging and non-intuitive
  • Looking at other agencies to see if this work can be done collaboratively 

How does CDF work within what’s already been done?

  • Maybe we should work on ONE thing that we can get every facility involved in.
  • How do we set that single priority?
  • Build something for a small group and then grow it from there.
  • Where are the gaps and problems? Can we put forward a joint proposal to address them?

This architecture provides a start.
 
The declaration that Matt and Bill referred to, shared with CDF:
http://datacommunity.icpsr.umich.edu/sustaining-domain-repositories-digi...
 
Establish a goal for CDF for the year. It would demonstrate the CDF value and build community.
Question to answer: What should that goal be?
 
A badge that says “EarthCube Compliant” would be a great challenge to work on.
 
 

Citation:
Council for Data Facilities (CDF); 2016 ESIP Summer Meeting. ESIP Commons, May 2016