Preservation and Stewardship Committee Telecon 2014-09-12

Abstract/Agenda: 

* Data Stewardship Committee budget request and plans

* Sessions for the winter meeting (due 10/10)

the possibility of holding a “citing dynamic data” workshop at the winter ESIP meeting to compare US results to what the folks in the UK found

* News from RDA

Linking to Research Data MANTRA course?

Bibliometrics working group survey

Notes: 

Ruth and Denise discussed the meeting time.  With the semester starting, now may be a good time to send out another poll and reschedule the meetings.  Sarah will send out a Doodle poll after creating a short list of times.

Budget update

We need to turn in the budget by the end of the month.  One of the action items is funds for a student once Sarah leaves.  We are also re-upping our request for travel support, since we are occasionally asked to attend things to represent ESIP’s work.  Ruth provided a few examples of where members of this community have been asked to attend a meeting to represent ESIP and have not been provided funding.  Ruth said that the travel support request was for $2000, but she was not sure if this was enough.  Bob said $3000 for two people would be better.  Denise agreed, depending on where you go.

Had thought about including support for data carpentry - but that is not clear at this time, and we might have to forgo it this year.  Denise agreed that holding off is a good idea.

Publication charges - we have three publications in the works, and we should put in funds for those.  Bob said that the publications (when open access) can cost upwards of $3000.  Mark said EOS only costs about $50.  Matt said DLib was free.  So it is really more of an issue if we go another route.  But we need at least some money to cover the EOS fees.  Bob said if we have other papers going to other places, we might need more money.  But they are expensive, and unless pooled this might not be enough.  Ruth said we have been a bit slow with our publications, and we do not have one planned that costs so much, so we might not have to worry about that until next year's funding cycle.

Other budget requests? None were mentioned.  Ruth will work with Erin to get this completed by the end of the month.  They are currently working on the student fellow position.  It is not clear how quickly budget requests are turned around.

Expanding data stewardship guidelines

We had talked about this at the summer meeting.  Started with a set for authors, repositories etc.  But we had said we would go around and do guidelines for all the various groups in earth science.  Ruth created (based on a modified version of AGU’s) a version for editors and reviewers.  Last month she asked people to review them.  She thought it would be helpful to walk through them today.  http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations...

Ruth asked if this is something we want to move forward on.  Matt said he was just looking at it now, but it is something he is working on for another organization (AMS), and he is interested in working on ESIP's efforts.  He noticed some similarities and some differences (like on what data means - not giving an expansive definition).  Also the time period - this might be arbitrary or different for different communities.  Ruth stressed at least 5 years, and that 1 year might not be enough.

Mark asked for clarification as to the audience, and suggested that this might be a generic starting point (regardless of specifics like this timeline question): do as well as you can for as long as necessary.  Ruth said, thinking about it, the time is more open-ended and a stronger requirement.  But she feels ok with taking out the “the” part.

Matt had one other suggestion - defining data in such an expansive way, might be looked at closely and potentially changed.  Mark said you should cite your sources, but recognizing that we are changing publishing culture.  So start broad and let them narrow it down.  Matt - not defining them as data, but as resources associated with the paper.  This is going to be new to this group, and data might be confusing.

Ruth - was ok with changing some of the things related to sources, but not all, so that it would not get confusing as to why people were talking about sources instead of data.  Matt mentioned conversations they had in AMS.  Ruth and Mark discussed software as data etc., and this conversation will change over time.  Bob suggested including software in there (it is).  Mark said the principle should be - cite your sources.  That would be the same concept with software.  Ruth was not sure how to change the policy.  Mark asked about the broad definition - Ruth said the definition for data was from AGU.  Matt elaborated and mentioned that he feels this might not need to be changed, but it is an area that might be narrowed in the future.

Bob - asked for clarification on ‘XYZ reserves the right to refuse publication’.  Ruth said this is for publishers, and this is example language that someone can follow; that is the sort of intent.

Matt - last sentence in the journal policy.  There is a difference from what they are working on in AMS.  The definition is not a data citation the way they are talking about.  Ruth said this part is saying, I cited my data but you have to go to this repository etc. to request the data.  Mark said this should be reworked.  You shouldn't use acknowledgements for citing, you should use the reference section, but you can use the acknowledgements to provide special notes etc.  Ruth said, so this should have a follow-on clause.  Denise and Mark discussed this a bit more.  Ruth said there were two things to do, edits and to modify it.  She suggested the group look at the author guidelines and make a similar adjustment.

Mark thinks the AGU people might have been confused, and that we should remove the acknowledgement statement (?); Matt and Bob agreed.  Mark suggested methods or a footnote would be a better way to address that.  He said that using acknowledgements is bad practice, and we should not suggest it is a suitable substitute for citing data.  Bob suggested changing it to say data should be cited in the reference section of the manuscript.  Ruth is going to move this up front to the bolded section.

Ruth asked if we should include comments on restrictions on access to data.  Denise said that was important, as people might not bother mentioning or citing it otherwise.  ‘If in unusual circumstances, if there are access restrictions...’.  Mark disagreed.  The vast majority of literature has access restrictions, and we cite that just fine.  We don’t make a distinction for open access articles.  So we shouldn't treat data differently.  Denise clarified that people assume when they see data cited that they should be able to access it without difficulty.  Someone can find it for me, I can request it from the author.  But data has more restrictions to access.  And we cite so someone else can have access to look at it.  So maybe we should mention restrictions.  Vicky said she agrees with Mark, and the data provider should be on the hook for explaining how you get access to the data.  It also clutters the citation.  Mark said we could advocate for open data, but this is about citations.  Vicky thinks it adds too much detail to this, and access is a separate issue.  Denise said when using proprietary data her agency requires that she state that in publications.  Ruth said this is the difference between us making suggestions to publishers, and publishers having to have guidelines on their data.  But this isn’t the necessary place for this.

Ruth asked if the new version was better.  She is asking because the publishers roundtable is in three weeks and she needs this to take there.

Winter meeting sessions

Topic that came up - holding a citing dynamic data workshop at the winter meeting.  RDA did one in the UK and recently released a report.  Ruth thought of a workshop with a few use cases, pros and cons from community standpoints (like they did at the UK event), and asked if this was something we wanted to do in January as a session proposal.  Denise asked if workshops were typically held at the winter meeting.  Ruth said you can, and they have had day long ones.  But we have not.  Mark said the semantic web guys have, Ruth mentioned Drupal.  Mark thinks this could be very useful, and wondered how long it would take.  The UK workshop was two days, and they went through 4-5 case studies.  We could tighten it up, do some pre-work.  Mark said Matt is on the call because NCAR and UCAR are doing some work in this area - how two groups with similar data should be citing it etc.  Mark thought this could be a conversation starter, with a workshop at ESIP, and we could invite some of the organizers from the UK event.  Ruth thought that this was a good idea.  Both she and Bob thought the multiple repository issue is important.  Matt gave some examples.  He said they are interested in the general case, and would work with NOAA for specific test cases.  Mark said there are two things: 1) the duplicate data issue, which is a new topic of concern, and 2) the dynamic data question.

Mark is asking whether the RDA model would work for ESIP members, and whether they would be willing to adopt it.  Ruth thought that could be a presentation with a few examples, which would be less intense than the UK workshop.  Mark thought that there would be more value in getting a better understanding of the collections etc.  Ruth said at ESIP we could pick some collections people in the audience knew well and focus on how to do it for these collections.  Mark thinks that is what they did in England as well.  Ruth and Mark had different interpretations of the event.  But we can figure out how we want to do this ourselves.  The aim is to really test this model to see if we can implement it, and what the issues would be in implementing it.  Would users and publishers accept it?  Ruth has a big concern related to the last point: it is based on basically the users using what they downloaded in their work and being willing to cite it even if they threw away 90% of the work.  So is there a way for a repository to know what the user actually needs to cite?

Ruth thought this sounded like a topic; should it have a session to itself or be a topic in a discussion session?  Mark said we have had discussions in the past about this, and named a few examples.  This is really getting down to brass tacks.

Ruth thinks we could get half a day, and maybe split into smaller groups with a data specialist who can focus on what we need to do with that particular data set, otherwise we might not have the right amount of time.  Mark agreed.  Have an intro session.  Have invitations to a few key data providers who might want to use this and will do work in advance.  Have an introduction on the RDA model.  Then give a quick overview of the 3-4 collections we are talking about, and break into groups led by the experts.  They walk through it, discuss what they did.  We analyze and critique.  And the group goes back together to discuss the feasibility of implementing this model.  Ruth agreed, but would there be enough communities interested?  Mark listed Matt (who said someone from UCAR would likely be there), NSIDC, Jeff DLB or one of the NOAA data centers, and maybe Curt and the climate assessment group as an example; Ruth added MODIS or other NASA data.

Ruth asked if people would be interested in attending such a session.  Denise said yes, as a session.  She would be more likely to attend that than a workshop, and a half day rather than a full day.  Ruth listed our normal planning session and a session on citing dynamic data; last summer we also had a results session for this committee - did we think that was successful and should we do it again?  Vicky thought it was overwhelmingly successful.  Denise thought we were able to get more done in the planning session.  Vicky and Denise recapped the process for people like Mark who were not able to attend.  They heard a lot of positive feedback on the session.

Mark suggested that he and Ruth meet at RDA in Amsterdam and corner the organizers of the UK version for their advice.  (Sarah would like to join.)

Actions: 

Sarah will send out a new scheduling poll

Ruth will update the travel budget

Sarah will put up the three sessions in the commons:  Denise listed as lead for the recording, Ruth for the planning, and, for the dynamic data session, Mark listed to help us find a lead.

Citation:
Duerr, R.; Preservation and Stewardship Committee Telecon 2014-09-12; Telecon Minutes. ESIP Commons, September 2014