Data Stewardship Committee Activities - Reporting

Abstract/Agenda: 

The Data Stewardship Committee has been active over the last year in a number of different areas - Provenance and Context Content Standard, Physical Object Stewardship, Use Cases, Collection Structures, Identifiers, Citations, Data Management Training, Preservation Ontology, Data Stewardship Principles. Subgroups of the committee have been focused on these different topics. This session is an opportunity for the full committee to obtain reports on progress on each of these activities. The reports should include implementation within agencies (e.g., NASA, NOAA, and USGS) and organizations that have resulted from the deliberations in the ESIP Data Stewardship Committee.

 

Notes: 

Data Stewardship Activity reporting

Rama introduced the session for those who are new to the group.  Presentations will be 10 minutes in length and we have a packed schedule.

 

Denise Hills (slides below: ESIP-S14-DSC-Reporting-PhysObj.pptx)

Denise presented on her work with the PCCS (Provenance and Context Content Standard).  She had Rama introduce the PCCS - a standard to help future users understand what data sets are, what they mean, how they were generated, the context in which they were used, etc.  It is a matrix.

As this moves toward becoming a standard, Denise is testing what other kinds of data it can be used with.  She presented at AGU on how some of the categories map to a physical object collection, including maps, cores, cuttings and other physical items.

She used their thin sections as a first test case.  She showed internal documentation that was hard to read and understand.

They are currently putting their data into schemas and looking at what information they are capturing, what they need to capture, and how that maps to the PCCS.  She is finding that for physical objects (she will share some of these images later).  She reviewed some of the mapping and the variations she made in the categories to express the same idea for a different data type.  There were many item numbers with limited overlap (particularly in the item 3 group).  She asked if anyone working with physical items could share items they think would map (photos, other physical objects, etc.).

In summary, the high-level categories work well, but the lower-level points do not map one to one.  Refining those lower-level categories would help the PCCS apply more broadly.  More investigation is needed.

Ruth Duerr

Ruth reported on two activities.  First on physical materials, then on citations.

Physical objects - there is a draft proposal to CODATA to establish a task group on the management of physical objects in the digital era.  This was submitted and came back with comments (expand the international participation, etc.).  Chris L. said that they had submitted the revisions and expect to hear back in the fall, but are not certain that it will be approved.

OSTP memos related to data and publications - there is one on scientific collections (aka physical objects), which asked for management and access policies in September.  It covers collections owned, directly managed, or financially supported by federal agencies.

Data citations - Ruth asked if anyone in the room had not heard about the Joint Declaration of Data Citation Principles.

ESIP created a list of principles a few years ago, and this topic is gathering steam in other areas.  Paul Uhlir of BRDI asked: instead of everyone doing something different, shouldn't we all do the same thing?  30-40 organizations working in this area came up with this Joint Declaration.  Representing ESIP were Ruth, Mark Parsons and a few others.  The principles are now complete and are being endorsed.  When they came out, we discussed whether ESIP should endorse them, and we agreed to propose this to the rest of ESIP.  The executive committee asked Ruth to draft an endorsement statement.  Ruth read the statement aloud; it will be voted on by the ESIP assembly.

The original joint group is setting up an implementors group to bring things down to a more detailed level of specification.  There are three teams running right now: identifiers and metadata, publishing workflows, common repository interfaces, NISO JATS updates.  These are open and have a wide variety of members.  If anyone is interested, Ruth can help people join.

Other citation activities - All AGU publications now require data citations using the ESIP guidelines, but there have not been any requests for further updates.  RDA has several working groups in this area looking for use cases.  Communities around other objects, like software, are starting to discuss these topics as well.

Vicky - DOI & landing pages

Vicky introduced the work done in the ESDSWG at NASA on making DOIs required, with a brief overview of DOIs.  She presented a summary of the recommendations; for example, DOIs should be opaque, like those posted in journal articles.  ESDIS will create DOIs for groups that do not have EZID accounts.  Each DOI should have a landing page, and should include the GCMD DIF as well.

Following on this, the ESDSWG has been discussing landing pages - what a landing page needs to have, what should be on it, what doesn't belong, etc.  They are developing a recommended practices form and are open to discussion and feedback on it, such as whether it includes too much or too little.

For DOI landing pages, they are creating some examples.

She showed the initial recommended practices - link to ISO 19115-2, DAAC-specific identifier (data sets held by the DAACs were the start of this), etc.

Recommended practices also cover citation information, links for download and papers, contact information for the DAAC hosting the data, etc.

This is still in development and they are open to feedback.

User model - the user needs working group was established in 2013.  If you would like to see the model and do not have access, contact Vicky and she can provide you with access.  It was developed to determine what core information different users need, and what to focus on providing to them.

The group developed four different user types.  Now the model needs to be used by other groups.

Last slide was a demonstration of the user types, the different categories and how important they are for data discovery.  She is more than happy to discuss this with anyone who has questions or comments.

Bob Downs - experiences in implementing DOIs

At SEDAC, NASA recommended going with the DOI and they worked on how that would be implemented.  DOIs might be appropriate today, but in the future there could be something different and they want to be able to move with the times.

They tested the feasibility using an EZID account and trained their staff in how to use that system.  Their CMB approved the guide and the inclusion of the DOI and citation for each data set.

They then went into operations mode and report these to NASA monthly.

They had to develop a crosswalk between the DataCite elements and the FGDC CSDGM elements.

He provided a screenshot from EZID showing what is necessary to assign a DOI to a dataset.  This includes a landing page.  The EZID system allows them to easily manage the DOIs.

These DOIs are harvested by DataCite and are included in its system.

Looking at one of the records in DataCite, you can see the details shared, and then you can follow the link to the landing page (as recommended as part of the citation).  This includes download links, maps, discussions, etc.

The page also has a recommended citation for the data.  For this particular example, there are two recommended citations: the authors wanted the citations for the data and for the article both listed.  This helps authors who want (or need, for tenure etc.) to include both.
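To make the idea of a crosswalk concrete, here is a minimal sketch in Python.  It is not SEDAC's crosswalk, which the notes do not record.  The element pairs (CSDGM origin, title, publish and pubdate mapped to DataCite creators, titles, publisher and publicationYear) and the function names are assumptions chosen for illustration.

```python
# Illustrative crosswalk from FGDC CSDGM citation elements to DataCite
# properties.  The pairs below are an assumption for this sketch, not the
# crosswalk SEDAC actually developed.
CSDGM_TO_DATACITE = {
    "origin": "creators",          # CSDGM originator
    "title": "titles",
    "publish": "publisher",
    "pubdate": "publicationYear",  # CSDGM dates are written YYYYMMDD
}

REQUIRED = ["creators", "titles", "publisher", "publicationYear"]


def crosswalk(csdgm_record):
    """Translate a flat dict of CSDGM elements into DataCite-style keys."""
    datacite = {}
    for csdgm_key, value in csdgm_record.items():
        target = CSDGM_TO_DATACITE.get(csdgm_key)
        if target is None:
            continue  # element has no counterpart in this sketch
        if target == "publicationYear":
            datacite[target] = str(value)[:4]  # keep only the year
        elif target in ("creators", "titles"):
            values = value if isinstance(value, list) else [value]
            datacite.setdefault(target, []).extend(values)
        else:
            datacite[target] = value
    return datacite


def missing_required(datacite):
    """List the properties in REQUIRED that are absent or empty."""
    return [name for name in REQUIRED if not datacite.get(name)]


# Made-up record, for illustration only.
record = {
    "origin": ["Example Author"],
    "title": "Example Data Set",
    "publish": "Example Data Center",
    "pubdate": "20140101",
}
converted = crosswalk(record)
print(converted)
print(missing_required(converted))
```

A check like missing_required is one way to find data sets whose existing metadata is not yet complete enough to register a DOI.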

Rama - Emerging PCCS activities

Activities - papers, NASA preservation content specification, NASA ESDSWG presentation related WGs, NASA DOI implementation.

Rama introduced himself and his position at NASA, dealing with data management.

A few years ago they decided to publicise the work they did in the ESIP committee.  This was an opinion article, which was rejected by Science as too narrowly focused and not yet a standard.  They are considering submitting it to EOS.

The D-Lib article on data stewardship in the earth sciences - this is almost ready to be submitted.  Rama will talk more about it in the planning session on Friday.

Based on recommendations from ESIP, they made a checklist for NASA operations.  It is required for new missions, and there is a checklist for operations currently running.  It was provided to HQ, which is working on Phase F - closing out currently running operations.  They are now using these standards as guidelines when working on the close-out exercise.  They are using the PCCS as a tool to remind people of things that need to be preserved.  This has been used in a few recent projects.

Preservation information architecture WG - 2013-2014, established a context for PIA standards using PCCS, PCS, and PROV-ES.  The group identified use cases, and recast and emphasised the PIA in the project lifecycle.  It is continuing this year as the Data Preservation Practices WG (2014-2015).  They are trying to broaden the use cases across a number of activities and missions.  These all have different kinds of lifecycles, and the group has to capture knowledge that might be lost when team members leave a project.  They are defining guidelines for planning and managing, and trying to make the work broader than satellite missions.

They are working on DOIs in the ESDIS WG and followed its key recommendations; in the future they will be focusing on landing pages.

Nancy Hoebelheinrich - education activities

Data management for scientists short courses - 35 modules, created by volunteers, peer-reviewed and 10-15 minutes long.  They are all listed on the ESIP Commons.  There are 4 categories.  They are available as PowerPoints and as videos on ESIP's Vimeo channel.  Also available are the PPT files, the text of the presentations, and audio files that are recordings of the script.

Following the creation of these modules, they worked on improving discoverability, using the LRMI suggestions on schema.org.  They included more metadata tags in the Drupal code, and made other changes to make the modules more discoverable.

Nancy showed an example of one of the pages, with standard values and other key words added, and explained the various parts.  Hopefully this will improve discoverability.

To test it, they set up a before-and-after comparison using Google Analytics.  They applied one search; it does not give you the key words visitors searched for, but it does give you where they came from, where they landed and where they exited.  They will need to do more work to get further details.
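As a rough illustration of LRMI-style tagging, the Python sketch below builds a schema.org description of one module.  The notes do not say which syntax or properties the Drupal pages use, so the choice of JSON-LD, the property selection, and the module title are all assumptions.  The property names themselves (learningResourceType, timeRequired, audience, educationalRole, keywords) are schema.org terms.

```python
import json


def lrmi_description(name, minutes, keywords,
                     resource_type="presentation", role="scientist"):
    """Build a schema.org description of a training module.

    The choice of properties is an assumption for illustration; it is not
    the set of tags actually added to the ESIP module pages.
    """
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "learningResourceType": resource_type,
        "timeRequired": f"PT{minutes}M",  # ISO 8601 duration, e.g. PT15M
        "keywords": ", ".join(keywords),
        "audience": {
            "@type": "EducationalAudience",
            "educationalRole": role,
        },
    }


# Hypothetical module title and key words.
module = lrmi_description(
    "Example Data Management Module",
    15,
    ["data management", "metadata"],
)
print(json.dumps(module, indent=2))
```

Embedding a description like this in a page is one way search engines can pick up the duration, audience and key words of a module.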

Marketing efforts - printed brochures, poster, and hoping to do more liaisons with the ESIP education committee.  Also perhaps using more social media channels.

Issues - most important is finding resources to continue the efforts.  People would like more examples within the domain, and the modules need to be kept updated.  There are also opportunities to collaborate with others, and a need for continued marketing and evaluation.

Data Study - Anne

Anne gave a brief introduction to the data study project.  They have written a report (nearly done) and are awaiting a press release.  They identified economics for science data - little money for how to study and manage science data.  As well as some other challenges and goals.

The EOS article they submitted in November is waiting on the workshop report, but is about done and ready to be published.  The workshop report is on the wiki and the nearly final draft is available if anyone would like to see it.

They are looking for other partners, but are currently waiting for these materials to be published and do not have next steps yet.

Questions

To those who presented, as well as discussions and comments.

Mustapha - This is his first ESIP meeting.  Regarding the first presentation on physical objects, he noted the CODATA task group being prepared and the comments about opening it to more international groups.  He wanted to highlight the possibility of working with WDS members, who have physical object collections.  WDS is a sister group to CODATA, and he would be willing to discuss this.  Chris and Kerstin were suggested as contacts.  He also asked about experiences adopting DOIs and was intrigued by landing pages: are the DAACs coordinating to use the same type of landing pages?  Rama said the working group is currently looking at the minimum required content, and Vicky added that they are also looking at what additional content might maximize the usability of the landing page.  They are looking at best practices, not at whether these are adopted or implemented by the DAACs.  Mustapha asked if others had landing page standards, and Ruth said one of the groups she is working with is working on this as well, and she wants to check in with Vicky to make sure they are in sync.  NOAA also has a working group across all three data centers.

Karen Baker asked why a standard rather than a specification.  Rama said different groups have their own specifications, which are often more readable than the PCCS matrix, but their goal is to create an international standard for content.  Rama wonders if we should separate the two things, the specifications and the PCCS matrix, and use the ? committee to make them a standard, and whether it should be separated from satellite missions and other types.  Denise discussed the motivation for testing it with other data types.  Rama mentioned that missions other than satellite missions are being archived in the DAACs, so even within NASA they have some thinking to do.

Chris L. - asked about these activities developing organically, and suggested that a visual mapping of how they developed might be useful, to see where the connections are and which areas need to be explored.  Denise said this might be useful to see the overlap between various groups.

Mustapha asked what type of license the educational material has (Nancy said CC1) and said he can help distribute it.

 

Anne asked about a group using the modules for their education boot camps, and whether they are doing evaluations.  Nancy said that questionnaires to evaluate the modules would perhaps be useful, and that she is working with Erin and Bruce to create them.

 

Someone from NCAR talked about DOIs and metadata - which standard to use, etc.  They decided to issue DOIs for software and services.  It is going well; they are implementing DOIs for data sets, but the ones for software are doing the best, and they are interested in landing pages.  Curt asked how they handle versioning with software; the commenter was not sure, but knew they were not going too in-depth.  They are addressing low-hanging fruit but are afraid to go into complete reproducibility.

 

Ruth mentioned Friday morning's planning session.  She discussed why we divided the two up, and said on Friday we will discuss which activities we want to carry forward and what the concrete plans are to do that.  Some might carry over from this meeting, some will be new, and some might drop off.  So if you are interested or have a topic to suggest, come to the Friday session.

 

Nancy asked about Rama's session - about the table with requested and registered.  What does that mean?  Rama said EZID allows you to register a DOI that may not yet be active.  They are working on a process for a DAAC to request DOIs and receive them in an early reserve, and once the DAAC is ready the DOIs move forward.

 

Chris L. asked about the meeting time - second Fridays at 3pm EST.  This information can be found on the wiki.  Sarah suggested that we might discuss this on Friday to see if we should reschedule.

Attachments/Presentations: 
Citation:
Ramapriyan, R.; Hills, D.; Data Stewardship Committee Activities - Reporting; Summer Meeting 2014. ESIP Commons, March 2014