Data Stewardship Planning


This breakout will summarize the current activities of the Data Stewardship Committee (Data Stewardship Principles, Identifiers, DOI Landing Pages, Data Citations, Provenance and Context Content Standard and Preservation Ontology) and discuss the future roadmap.

The action items created through this meeting have been attached as a PDF to this post.

Notes will be taken in the following google doc during the session:




Meeting notes:

Curt started the meeting by asking whether we should change the way we meet and organize - what goes on the wiki, what goes on the Commons, what goes in Google documents, etc.

Curt also would like to talk about what we have been doing and also emphasise the value of the deliverables we have been creating.  Through sharing and outreach - these papers have been tremendously useful and have a strong life in the community.

Do we want to start writing peer-reviewed journal articles or other documents to share with the community that have the leverage of the ESIP branding?

What milestones and deliverables would we like to form for the future?

Brent was introduced from the Information Quality cluster.  There is overlap between these groups and he would like to work with us to create use cases and work out how information quality is assessed.

Ruth had a question - she was recently at an NSF workshop on data quality.  A report came out of that as a dozen white papers defining data quality in different aspects.  She thinks that having an ESIP perspective is good but we should draw on some of this existing work.

Brent - how do we contribute to the area without reproducing this effort?  By creating a framework that can be used for decision makers - folding multiple different frameworks into one way of looking at information quality.  A good data record for one person is a “bad” set to someone else.  So how do you document things well enough to define these differences?  Brent thought that harmonization between these groups has not yet been done.

Brent believes ESIP can carve out a small set of this and come up with how information quality should be constructed or defined.  Pick some use cases and build a framework - but would like this to be a grassroots effort and build upon that instead of a closed viewpoint.

Curt started a topic on Data management training and turned it over to Ruth and Nancy

Ruth - ESIP got some money from NOAA to do work within ESIP to create training.  They created training materials for the science community.  There are about 50 small 5-10 minute modules that are single topics - “what do I put in a data management plan”; “What do I need to think about for …” they are little powerpoint presentations and associated videos. Very short that can be picked up when someone has a question.   They have been peer reviewed and are moved to the commons as they are completed.

We have all of these, and they are starting to be used in different venues.  Some of these will have to be maintained - for example, “what agencies require” is a changing target.  Also they had a longer list of what they wanted to develop - they could use more authors if anyone is interested.

Curt talked about the DOI paper that Mark created, and also the discussion since then about what we do to publicize these things and what other steps we can take as this body to further citations.  We talked about the PCCS.  NASA has moved forward with their version of the PCCS.  What are our next steps on this?

Preservation Ontology - we have tweaked it a bit, part of the W3C PROV standards.  Hook Hua is working on a working group for PROV Earth science from the NASA perspective - is there input from this group that we can give?

What are the deliverables?  Who will be lead?  What will we get out of these things?  What are our aims?  Letters, support of the ESIP assembly.

Denise - would like to add another activity.  We had a session yesterday on physical materials and incorporating that into the digital domain.  We talked about where we would like to be by the summer meeting.  We would like to develop a collection of existing best practices and start to identify the users and stakeholders of these materials.  It is a very early step.  We would like to start out under the umbrella of stewardship.  Analog and physical objects.  We would like to identify what already exists and then meet at the summer meeting - how we can utilize what is already there or modify to fit ESIP’s needs.  Or recommend an already established best practice.  Perhaps even the creation of white papers.  So we need to do a needs assessment first.

Rama - PCCS - from the point of view of quality.  Also NASA produces its version of this as a specification - are other agencies doing something similar?  Also, do they want to move forward with this as an ISO standard?

Brent - is anyone saying that if you don't follow this ISO standard it is just numbers and not data?  The idea that if data is not reproducible, it is not data?

Curt - can we turn this into an article in an opinion venue?  That might be more acceptable as a formal publication that can address some of these issues?

Rama - like the citations document: a readable document, as a report, and less of a matrix for ESIP?

Curt - principles should be up in the clouds and this gets more concrete than that.

Mark - a more popular essay style document

Curt - you need both.

Rama - myself and John and Ruth have a publication like that.  

Curt - we should take things like that and claim them as ESIP outcomes as well.

Brent - Eos letter - I like this as an idea, something short that can be shared with editors.  Maybe it will actually get to the people using it.

Rama - Most of the agencies feel constrained against putting your name on a document that is strongly phrased.

Ruth - the paper they published before has helped move the field forward and maybe something similar to that would help.  Curt assigned this to Mark and Ruth as well.

Mark asked about the focus - Ruth said that it is on the PCCS.  Ruth suggested Brent be part of this discussion/paper as well.

Eos - AGU publication.

Mark - our citations guidelines are getting good mileage but it is not the only word on this topic.  In particular the digital curation standards that deal with some of these things - like versioning.  ESIP is better than most but still not best.  Mark has recently been asked to take part in a CODATA group on citations.  But they are a survey of the whole field and not guidelines.  This might be an opportunity to take it up to a higher level of recognition - internationally.

Curt asked what next steps should be?  Next steps before were guidelines for reviewers and editors.

Brent is interested in this as well.  He spoke to associate editors for AGU and said that he encountered the problem where the data citation does not also cite the paper.  And authors are a bit afraid of this process.  Mark asked: so you have two citations, one to the paper and one to the data?  Mark said - why not both?

Curt - data publication is still controversial.  Is that worth guidelines or a letter or more attention to this?

Mark - Peter and I have a paper on that in press.  Read it before going to next steps.  He is trying to do some Eos article or something like that - generally what is the data citation issue and now how do you do it? Willing to work with someone on that if anyone is interested.

Denise suggested Anne Wilson as lead on a project such as “Is it time for a Data Decadal Survey?” - she recapped some of the outcomes - to define what this encompasses, and whether it is digital or physical as well.  There is a need to scope out what has already been done.  We are looking at a 6 month time frame now because it is in the inception phase - we would like to have a well scoped synthesis of the studies that are out there, and what our study would do.  Anne will write up more about this.  The idea was to come up with an initial scope of the project to present at the summer meeting, and from that to create a white paper to shop around to agencies at the winter meeting.

Curt - preservation ontology - still trying to figure out what the deliverables might be.  Helen is not here currently.  But Curt would like to see a domain specific extension to the W3C PROV standard.  The W3C has a very specific way for how to demonstrate the provenance for a data set to a paper.  PROV has a general way of doing that.  We want to standardize the way these things are named and the relationships.  The PCCS is the first step and from there you do a modeling that results in an ontology.
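
To make the idea of standardized names and relationships a little more concrete, here is a minimal sketch that records provenance as subject-predicate-object triples. The predicates are terms from the W3C PROV-O vocabulary; the `ex:` identifiers and the `upstream` helper are hypothetical and only for illustration, not something this group has defined.

```python
# Provenance as (subject, predicate, object) triples.
# Predicates are W3C PROV-O terms; all "ex:" names are made-up examples.
triples = [
    ("ex:paper", "prov:wasDerivedFrom", "ex:dataset"),
    ("ex:dataset", "prov:wasGeneratedBy", "ex:processing_run"),
    ("ex:processing_run", "prov:used", "ex:raw_observations"),
    ("ex:dataset", "prov:wasAttributedTo", "ex:data_center"),
]

def upstream(entity, triples):
    """Follow derivation, generation, and usage links back from an entity."""
    links = {"prov:wasDerivedFrom", "prov:wasGeneratedBy", "prov:used"}
    found, stack = [], [entity]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p in links and o not in found:
                found.append(o)
                stack.append(o)
    return found

lineage = upstream("ex:paper", triples)
print(lineage)
```

A domain specific extension would, under this reading, mostly add agreed names for the Earth science things on either side of these relationships.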

Rama asked a question about NASA’s work - Curt - we need to figure out how these needs are different and to create a technical note to share with W3C that would be complementary.

Curt - If the target was a standard we would go to ISO.  We will point to that if necessary in the future.  If it is the same people in both groups from here and NASA then we need to make sure to talk across these groups.

Curt - consider the PCCS for the physical group - what might need to be the provenance context for physical objects?

Curt - Identifiers

Mark - USWG (?) wants to do something with that.

Curt asked if it is worth exploring a white paper out of this group.  Mark said it might be difficult to give authorship (?)  Curt - roles are different between different groups.  Principal investigator and how to define the roles.  Stuff we can feed into the community and get opinions on the ESIP way of giving credit to people.  Attribution.  Produce a report specifically on that.

Mark - data lifecycle from the functional and that is different than how we represent authorship.

Cyndy - it is fundamental.   It is very personal.  And can be one of the ontologies developed by ESIP.

Curt - we touch on that in citations.  In order to fit in the traditional citation, but on a broader landing page …

Mark - we started with the landing pages, and one of the more complicated ones is who all to give credit to?  But other things need to be on the landing page as well.  Along with having a machine readable landing page too.
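
As a rough illustration of the machine readable landing page Mark mentions, the sketch below serializes a landing-page record, including role-based credit, to JSON. The field names, role labels, and the identifier are hypothetical placeholders, not an ESIP or DataCite schema.

```python
import json

def build_landing_page(identifier, title, contributors):
    """Return a machine-readable landing-page record as a JSON string.

    Field names are illustrative assumptions, not a published schema.
    """
    record = {
        "identifier": identifier,
        "title": title,
        # Each contributor carries an explicit role so credit is not
        # limited to a single "author" slot.
        "contributors": [
            {"name": name, "role": role} for name, role in contributors
        ],
    }
    return json.dumps(record, indent=2, sort_keys=True)

page = build_landing_page(
    "doi:10.0000/example",  # placeholder, not a real DOI
    "Example Data Set",
    [("A. Researcher", "principal investigator"),
     ("B. Curator", "data curator")],
)
print(page)
```

The same record could sit behind the human-readable page, so the question of who gets credit is answered once and shown in both forms.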

Cyndy asked about a formal mechanism for some of these various groups.

Rama - these groups are short term and focused.  Gave an example of a group that is focused on DOIs and landing pages.

Cyndy - it feels like this is a next step of how this applies in a larger context which helps inform your work.  And improves the system you created.

Ben - It is pretty tractable - taking the next steps.  When working on the ISO standard several years ago, it almost sank the ship - so we need to focus deliverables into attainable chunks.

Curt - should we talk about milestones for these deliverables?  A lot of these discussions can continue on the mailing list as well.  Curt does not want to lose momentum in getting work done on these projects and we have done well with the mailing list in the past.

Mark asked about the use case activity?

Curt - the question of tying things like PCCS artifacts to why you want to keep them.  And the ontology is the formal structure.  And identifiers show what you call the things you keep.

Mark - it sounds like we did this backwards?

Curt - we did a bit of crosswalking across these products.  And we need more use cases.  Tie them into the motivations behind some of these things.  It is worth that activity.  It is valuable and helps us to focus our minds.  What are the use cases that drive the motivation of this as a domain?  Mark suggested Bruce B. as someone to discuss these points.

Mark - maybe we need to do more use cases to show what drove the work.

Sarah mentioned the value of these use cases in getting other people to apply these guidelines.

Cyndy is interested in this concept as well.  She thinks use cases are very important.  Curt thought it might be something to put on one of the telecons and maybe we can organize the wiki on how we want to present these things. Curt will discuss this with Sarah.  While it is valuable we might not be able to sustain it as well.  Instead of deep cases perhaps lots of shallow cases.

Curt suggested perhaps having separate telecons that can focus on these issues.  Cyndy suggested maybe a workshop this summer.  Sarah suggested creating a workshop particularly to capture some of these use cases during the workshop.  Given guidelines people will give us data and that will be put up online as a use case at the end of the workshop.

Rama spoke about data connecting to other things.  We talk about interoperability and how it is important within different data sets, but some use cases discussing some of these ideas would be useful as well.

Curt - we have a full plate of activities from this group, and it is good that we have a diverse set.

Nate asked about DOI landing pages again - how will these various groups work together?  

Curt - Who will make the white paper?  Maybe instead ESIP will just bless these things instead of doing the creation.  Jeff is working on something for NOAA - maybe ESIP makes it harmonizing across these different groups?

Curt - there are a lot of issues behind identifiers that go beyond DOIs.  Data identifiers are one step, but identifiers for people and other things are important as well.  Other artifacts in the PCCS etc.  Gave an example of earth observational data that is in a spreadsheet without proper DOIs for these things, where the way identifiers are used is not standardized.  Perhaps this group can take that and recommend common identifiers for those things and how to develop relationships between them.

Cyndy - once the roles are figured out it gets complicated and who is the authority for some of these things.

Ruth - we said we would try and work on a paper on that.  But the problem was the original identifiers paper went to a particular level of assessment that you would have to do again.  And that requires a significant number of people to take part.

Cyndy said that this issue has come up at a number of other meetings.

Mark - this is something that is not specific to Earth Science and it is something that can get battled out by other larger groups.

Cyndy - would like to see more discussion however on these points.

Curt - a call of what we should be using.  

Others voiced that this might be too soon.  But Curt said it is a current issue that needs to be addressed.  Mark said it is a critical mass thing - need people to be using and pushing these things.

Continued discussion on identifiers - Curt made notes of individuals who would work on creating a document on which instances we care about.

Mark - mentioned a lot of other groups are looking at this like RDA who has an identifiers working group.

Curt thanked everyone for sticking around til the end and would add everyone who signed up to the mailing list.

Curt also asked about administrative process?  Should we move the telecon?  We are going to keep the current time - we will not have a meeting this January.  But it is the second Monday of the month at 2pm EST.  Please keep looking at the meeting notes as well.  And feel free to send out messages to the mailing list.

Sarah and Curt will make wiki pages for each deliverable from today's meeting.  We can hold separate telecons or use part of these standing meetings to deal with some of these issues.

Mark asked about applying for money?

Curt - the time to develop ideas is now!  So that we have ideas ready for when the call comes out in the fall.  So now is the time to bounce ideas for funding for ideas related to ESIP and the committee.

Mark - this is something to keep in mind 5 or 10k to get a project going or on track.  

Curt - also we can get travel money for some experts to come to the summer meeting.  And we can request space on the ESIP testbed.  For prototyping ideas.  And we funded experimenting on DOIs in the past.

Attachment: DataStewardshipActivityPlanning.pdf (59.59 KB)
Tilmes, C.; Data Stewardship Planning; Winter Meeting 2013. ESIP Commons, November 2012