ESIP Preservation and Stewardship Committee Telecon 2014-10-17

Abstract/Agenda: 

* Preparations for half-day workshop on citing dynamic data at ESIP winter meeting

* Picking a direction for the PCCS work - to generalize or restrict to certain categories of data

* How to move forward with NOAA’s Data Stewardship Maturity Matrix work

* Next round of comment on Citation guidelines for publishers, editors and reviewers

* Paper status (if time)

Notes: 

Ruth started discussing the dynamic data workshop proposal for the winter ESIP meeting.  A previous workshop was held in England over the summer, and information about it was sent around to the ESIP list.  The dynamic data work is coming out of the Dynamic working group in RDA.  With these results from Europe, it would be good to hold a similar workshop in the US to see if what is coming out of RDA will work here.  We are inviting Andres W. to hold a workshop on Thursday afternoon, the last day of the ESIP winter meeting.  The way it works is that we identify 3-4 data sets, each of which will have an expert attending the meeting, and we work through a list of the processes for dynamic data to see what it would actually take for that repository to implement them for that data set, and what problems and challenges they might encounter.

We need 3-4 data sets; there were several volunteers on the email list.  Question - are the people volunteering planning on being there, and can we finalize that here today?

Rama asked for clarification on the type of data set.  Ruth suggested a MODIS set.  Rama suggested one from NASA, one from NOAA and a third from another group, but was also concerned with the number of data sets we can accommodate.  Ruth said it depends on how many people will attend, as we want small groups of 7 for each data set and they would be running in parallel.  If there were 30-40 people, the numbers would be different.  Vicky said we would want a set that could change depending on how many people attend.

Ruth reviewed the list of suggestions so far. Anne suggested they could provide a set as well.

Suggestions so far:

  1. NCAR Research Data Archive

  2. IEDA

  3. BCO-DMO

  4. Polar Data Catalogue

  5. USGS National Water Information System (NWIS)

  6. NASA Last Satellite (space related - Anne)

  7. Set from Natalie (mentioned on call, see below for details).

Mark outlined that we need a person knowledgeable about the particular data set to attend the meeting.  Ruth clarified this a bit more.  Mark said - you only need to know the types of data sets used and what the users want, not the science.  Rama mentioned the emphasis is on the dynamic data set - how often changes occur and the types of changes.

Anne mentioned that they have a server with dynamic things.  

Mark said this is similar to our guidelines and to citing subsets: data coming in from different sources, and dynamic.  The idea is to put a query id on a query you can reproduce.  Going through the use cases, can we capture it and codify it in a way that can be reproduced?
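The query-id idea can be sketched as follows.  This is a minimal illustration under stated assumptions, not the RDA working group's specification or any repository's implementation: the names `QueryStore`, `StoredQuery`, `normalize` and `fingerprint` are hypothetical, the identifier is just a truncated hash rather than a real persistent identifier, and the store lives in memory.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredQuery:
    query_id: str     # hypothetical id; a real repository would mint a persistent identifier
    query_text: str   # whitespace-normalized query
    executed_at: str  # ISO 8601 timestamp of when the query was run
    result_hash: str  # fingerprint of the result set at that time


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially different queries compare equal."""
    return " ".join(query.split())


def fingerprint(rows) -> str:
    """Order-independent hash of a result set (rows are lists of JSON-friendly values)."""
    payload = json.dumps(sorted(rows), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryStore:
    """In-memory stand-in for a repository's store of citable queries (assumption)."""

    def __init__(self):
        self._by_id = {}

    def register(self, query, rows, executed_at=None):
        """Record a query and its result fingerprint, returning the stored record."""
        executed_at = executed_at or datetime.now(timezone.utc).isoformat()
        text = normalize(query)
        digest = hashlib.sha256(f"{text}|{executed_at}".encode("utf-8")).hexdigest()
        record = StoredQuery(f"q-{digest[:12]}", text, executed_at, fingerprint(rows))
        self._by_id[record.query_id] = record
        return record

    def verify(self, query_id, rows):
        """Check whether re-running the query returned the same result set."""
        return self._by_id[query_id].result_hash == fingerprint(rows)
```

A citation would then carry the query id and timestamp, and `verify` would let a later reader confirm that re-executing the stored query against the data as of that timestamp returns the same subset.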

Ruth suggested data in other categories, like data coming in from multiple sites, or drill holes or something, which she thinks the BCO-DMO group mentioned.  Natalie said she might have some data that might work: computer simulation data, where they can use and share other people's input files.  Sweeps, arms, visualizations, inport and outport.  It is for malaria, but they can probably look at other subsets.  Natalie will make an effort to be there; if not, she will send a postdoc in her place.  She explained the data a bit.

Rama asked how citable these should be; they need to be permanent and kept for a long time.

Natalie said they do have some which have been cited or shared, and they also have some that can be kept private.

Ruth said this clarifies when some decisions need to be made.  It was agreed that this was a good data set for the workshop.

Some asked about permanence… Mark repeated what Rama said highlighting that it needs to be citable.  Not about recreatable or not, but citability.  Rama said something should be there to recreate, Mark said this should be able to happen on the fly.

Ruth suggested Rama or Curt also suggest a NASA data set.

Rama suggested we come up with a list of things the attendees need to know in advance in order to be prepared.  MODIS - Ruth said she can bring a snow set, perhaps level 2 snow set as it has many granules.

Ruth said that a call will happen later with Andres to discuss the logistics once we have data sets decided.  We will also update the description on the website.

Curt said he can speak to MODIS data production.

Mark suggested another criterion - the person there should have some authority to make changes to their system.

Ruth will talk to Kerstin, IEDA to ensure she has someone there, and Mark suggested the same for the BCO-DMO as well.

Ruth also mentioned that we should canvass members to get a forecast of how many people may attend.  Rama asked about the size of the groups: why 7?  Rama suggested 4-5 so we can have more data sets.  He is interested in seeing a NOAA data set.

Ruth also mentioned that the USGS was interested in joining with their water data, which might be a candidate.  Mark pointed out that it would be similar to the one used in the UK workshop.  If so we will have 6.

PCCS - moving forward.

Denise found that the PCCS worked for rock samples in some ways but not in others.  Do we wish to move forward with a general document with specifics for specific communities, or do we start with community specific documents?  Rama said he has not done a lot since that meeting as he has been in transition.  Rama expressed a preference to go with the PCCS, which has been built on existing NASA requirements; he has been working with the ISO standard groups for geographic systems but has not followed up on this question and is open to change.  He felt that making a standard broadly applicable will make it more dilute, and would like to do it for more focused groups.

Ruth said there could be a PCCS for remote sensing data (the current version), and Denise might want to develop one for sample data.  Mark asked which types of information we want to be able to interoperate.  Everyone could develop their own provenance standards, but don’t we want to share across data sets?  So some sort of common core might be useful.  Ruth mentioned the idea from the summer meeting: a high level document and then more detailed ones for each area.  That is, a high level content standard and a series of lower level ones, which would tend to have people use the same terminology for the same types of information even if they have specialized it more.  Rama mentioned OAIS, which might provide what we are looking for at the high level.  Ruth pointed out the earth sciences had specialized that a bit more based on the global change requirement document (which Denise said worked for physical samples).  Mark said we need a use case where we might use the two together.  Ruth had some examples (ground truthing for a map).  So Mark suggested a few use cases, looking for uniqueness and commonality.

Ruth asked the newer folks on the call if they would be interested in working on this activity going forward, and what kinds of data they have to contribute.  Peng said she would participate, but it would depend on what needs to be done.  She does use satellite data, modeling data, etc.

Mark is going to Australia and is attending a workshop on provenance.  He wants to present the PCCS, and they are collecting use cases at this workshop.  This would be a great way to introduce the PCCS into a community which wants to address some issues.  Related, an appeal - if Rama or Curt has any good slides, he would love to borrow them.

Mark said they have a tool for collecting use cases, and Ruth asked that he send it to the group.  Bob suggested other communities identify areas where they think the current PCCS does not fit their data, and what kind of change might be needed.  If those differences could be identified, they could be compared to the PCCS to find ways in which it could be improved.  Rama felt we had a ready-made community - the dynamic data set groups are not all remote sensing, so we could ask them to review this as well.  Ruth said that is true, and asked Anne and Natalie if they could do that: read through the PCCS spreadsheet with their data set in mind and think about whether it would work and what would be different.  Anne said she could do that.

AGU new data policy is out for review and comment.  AGU members should take a look at that and comment.

Actions: 

Ruth will send the criteria to Karl, Kerstin, and the BCO-DMO and, if they are interested, will add them to the list.

Between now and the winter meeting - data set owners (for the workshop) should look at the PCCS and see if it would work for their data and make notes about what would not.

Citation:
Duerr, R.; ESIP Preservation and Stewardship Committee Telecon 2014-10-17; Telecon Minutes. ESIP Commons, November 2014