Alternative solutions for end of life data

Abstract/Agenda: 

Throughout the data life cycle, data may no longer meet the needs of the user community, or it may no longer be supported by the data provider/producer. Removing data from the repository is one option. Preserving the software and input needed to recreate the data is another. After brief presentations, we will break into small groups to brainstorm additional ways to handle end of life data. We will then report back to the whole group and discuss potential pros and cons of suggested solutions.

Session Agenda (15 min. presentations, 30 min discussion)

  • Introduction - Nancy Ritchey
  • Hibernating Software - Valerie Toner
  • Agile Data Curation - Denise Hills
  • Discussion - All
  • Wrap-up - Sarah Ramdeen
 
Notes: 

Nancy -

  1. Opened the session with a brief introduction.

 

Valerie Toner - “Hibernating Code: A Different Take on Long-Term Stewardship”

  1. Resources could be limited, so it might not always be possible to save data for the long term.

  2. “Hibernating Code” = how to safeguard software to be used within a specific time frame in the future.

  3. “Hibernating Code” is a pilot program at NOAA.

  4. The program grew out of a real case: a set of codes produced by an individual who would no longer be available to support them, even though the codes might still be needed for future uses.

  5. Codes that are being hibernated need to be/have:

    1. Self-contained: the code and the related product are both mature and of high quality.

    2. Restrictions

    3. Audience: Could involve domain experts, but mainly for software developers.

    4. Reproducible: include test data and output description.

  6. Resources, including both people and infrastructure, could become unavailable over time, which highlights the importance of preserving codes.

  7. Provenance information is important to include when hibernating codes.

  8. Question from the audience:

    1. How can we ensure the software needed to run the codes is still going to be available after a long time (say, 30 years)?

      1. The “Hibernating Code” project is not meant to preserve software for more than 5 to 10 years.

      2. After the initial 5 to 10 years, the current code that is hibernated could be replaced by a “newer” version.

      3. There will also be a reappraisal process to review hibernated codes and determine whether they should continue to be hibernated, be replaced, or be discarded.

    2. What types of documentation are recommended to accompany hibernated codes?

      1. A guideline was generated by the “Hibernating Code” project lead, and the presenter will share this information afterward.

  9. Comment from the audience:

    1. Ideally, it would be very helpful to have a tool that could help update hibernated codes to the desired current format.

    2. It is also important to save the algorithm along with the code and the output, so that future programmers can understand how to update the code as needed.
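The criteria above (self-contained, reproducible, with provenance and a reappraisal horizon) suggest the shape of a hibernation package. As a purely illustrative sketch — not NOAA's actual format, which was not shared in the session — a manifest might record the code location, test data, expected output, and provenance, with a simple completeness check:

```python
# Hypothetical hibernation-manifest check. The required keys below
# mirror the session's criteria and are assumptions, not NOAA's
# actual schema.

REQUIRED_KEYS = {
    "code_location",     # self-contained: the archived code itself
    "test_data",         # reproducible: sample input data
    "expected_output",   # reproducible: description of correct output
    "provenance",        # who wrote the code, when, and why
    "reappraisal_date",  # when to review (e.g., 5-10 years out)
}

def missing_keys(manifest):
    """Return the set of required keys absent from the manifest
    (an empty set means the package is hibernation-ready)."""
    return REQUIRED_KEYS - manifest.keys()

# Example manifest that is not yet complete:
manifest = {
    "code_location": "archive/qc_codes_v3.tar.gz",
    "test_data": "archive/sample_input.nc",
    "expected_output": "docs/expected_output.md",
    "provenance": "Written by retiring analyst, 2015",
}
gaps = missing_keys(manifest)  # {"reappraisal_date"}
```

A check like this could run at intake and again at each reappraisal, flagging packages whose documentation has fallen short of the criteria.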

 

Denise - “Agile Data Curation at a State Geological Survey”

  1. Denise worked with many “data objects” that might not be digital but could still face end of life issues.

  2. In terms of human knowledge, unplanned events could make people suddenly unavailable, which could lead to the loss of data/information relating to the data objects.

  3. We also cannot predict future uses, so it is challenging to determine which data to save and which to discard.

  4. End of life does not necessarily mean disposal of the data.

  5. The data curation method at the Geological Survey of Alabama applies Agile software techniques to data curation, especially metadata capture.

    1. In other words, the metadata is built iteratively. The Agile value of “individuals and interactions over processes and tools” applies here: people participate in the metadata capture process incrementally.

    2. The reference for the agile workflow discussed in the session is the following: Hills, D.J. 2015. Let’s make it easy: A workflow for physical sample metadata rescue. GeoResJ, vol. 6, p. 1-8. doi:10.1016/j.grj.2015.02.007

  6. Learning from what others have already done, rather than starting from scratch, can help avoid mistakes.
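The iterative metadata capture described above might be sketched as follows. This is a hypothetical illustration, not the workflow from Hills (2015); the field names and the `enrich` helper are assumptions. The point is that a record starts skeletal and different contributors fill in what they know, incrementally, with a simple provenance trail:

```python
# Minimal sketch of iterative (agile-style) metadata capture:
# start with a skeletal record and enrich it incrementally as
# contributors supply what they know. Field names are illustrative.

def enrich(record, contributor, fields):
    """Merge a contributor's partial metadata into the record,
    keeping a simple provenance trail of who added which field."""
    for key, value in fields.items():
        record.setdefault("fields", {})[key] = value
        record.setdefault("provenance", []).append((contributor, key))
    return record

# Iteration 1: bare-bones entry created at intake.
record = {"id": "sample-001"}
enrich(record, "intake", {"object_type": "core sample"})

# Iteration 2: a domain expert adds locality details later.
enrich(record, "geologist", {"county": "Tuscaloosa", "depth_m": 120})

# Iteration 3: a curator fills in storage information.
enrich(record, "curator", {"shelf": "B-12"})
```

Each pass leaves the record more complete than before, which matches the “individuals and interactions” emphasis: no single contributor has to supply complete metadata up front.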

 

Discussion -

  1. If a process for dealing with end of life data is developed, it should be published regardless of discipline or object type, so that the information can be reviewed and reused by others as appropriate.

  2. How about metrics for data rescue/recovery?  In other words, what is the return on investment for data rescue/recovery?

    1. Denise is working on compiling a report summarizing the data rescue cases that she has worked on, and she will be publishing this report to share the experience.

    2. There needs to be a reason to perform a data rescue/recovery; “blanket” data rescue/recovery is not likely to be cost effective.

    3. A related effort is the Provenance and Context Content Standard (see: http://wiki.esipfed.org/index.php/Provenance_and_Context_Content_Standard).  The standard could be a good way to evaluate and determine the method for developing end of life strategies.

 

Closing Thoughts by Sarah -

  1. Identifying the specific needs to be addressed could be a realistic and efficient way to determine which strategies to implement for end of life objects.

Attachments/Presentations: 
Citation:
Alternative solutions for end of life data; Winter Meeting 2016. ESIP Commons, December 2015