Curation of Virtual Data Collections

Abstract/Agenda: 

Join us in furthering the Curation of Virtual Data Collections!  

Earth Observation Data Collections go back to the thematic CD-ROMs prepared by archive centers and science teams in the previous century.  Since then, they have moved onto the Internet, but usually still with all the data physically co-located. Virtual Data Collections broaden the types of data resources that can be included, adding remote URLs that point either to the data themselves or to easy-to-use REST(ish) data services such as OPeNDAP, w10n, and OGC W*S.  Meanwhile, advances in data curation around specific themes are underway in projects such as EONET and Dark Data Curation.

Agenda:

  1. What the Heck Is a Virtual Data Collection?
  2. How Are Virtual Data Collections Created?
  3. Cool!  What's Next and How Can I Get Involved?
Notes: 
  1. A virtual data collection synthesizes remote data and information resources that are related to a specific theme into a machine-actionable file of metadata, so that the data collection can be used by a variety of applications.

  2. The main motivation for developing virtual data collections is to enable discovery of and access to data and information that pertain to a common theme, but are dispersed and managed separately.

  3. There is a wide range of end-users and curators that can benefit from virtual data collections.

  4. There are many potential themes for a virtual data collection; for example, a collection can be based on an “event type”, such as a volcanic eruption.

  5. Other potential collection themes are: research area, application, class lab exercise, published paper, field experiment, portal (a collection of project portals that have related information/focus), and provenance of an official report's findings (e.g., National Climate Assessment).

  6. Question from audience: Would it be possible to merge some of the above themes, such as the official report finding provenance and published paper?

    1. Answer: It is possible, but whether multiple themes should be combined would be decided while creating the collection and could depend on additional factors, such as how the information is made available or licensed.

  7. Question from audience: Why should the schema for a virtual data collection be different from that of a “standard” data collection?

    1. Answer: Currently, a “standard” data collection is more “homogeneous” in terms of data content.  The goal for a “virtual” data collection is to expand to related information as well.

  8. Web Map Context at OGC (http://www.opengeospatial.org/standards/wmc) and USGS ScienceBase (https://www.sciencebase.gov/catalog/) are good resources for reviewing the definitions of, and the distinction between, the “standard” and the “virtual” data collection.

  9. Input from audience:

    1. It is important to add contextual information to the data collection, so that this information is available alongside the data themselves.

    2. It is also really important to understand how the data collections are being used.  This information should also be part of the collection.

  10. One of the key processes in building a virtual data collection is to take the Partially Qualified URLs and convert them into application-specific formats.

  11. Question from the audience: Is semantic web/linked open data being utilized for the virtual data collection effort?

    1. Answer: These technologies have not yet been used or integrated into the process, but Chris will raise this question with the working group to determine how it could leverage them for the virtual data collection.

  12. Question from the audience: Is it possible that the virtual data collection providers end up providing support for the data themselves?

    1. Answer: It should not be an issue with the current virtual data collection process, because the information used to build the collection is the URLs.  However, a related concern arises when the URLs are not maintained properly.  One way to mitigate the risk of providing non-working URLs is to collect as many permanent URLs as possible.  Another is to present metadata to the users first; once the users have selected specific datasets from the data collection, the system can then help determine the best way to access the actual data.

  13. Curation methods currently being considered for building virtual data collections:

    1. Manual

    2. Tool-Assisted

    3. Community Curation

    4. Automated Curation

  14. Input from the audience: It might be helpful and worthwhile to consult web designers regarding the final presentation of the virtual data collections because the user interface and user experience (UI/UX) would be very important in influencing the interactions between the users and the data collections.  In other words, the usability of the virtual data collection would be crucial in building a positive experience for the users and their understanding of the collections’ value.
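
As an illustration of notes 1 and 10 above, the following is a minimal sketch (not the working group's schema or converter) of a themed collection stored as a machine-actionable metadata file, with a helper that turns a stored URL into an application-specific request. Everything in it is an assumption made for illustration: the class and field names, the example.org URLs, and the reading of "Partially Qualified URL" as a service endpoint stored without its response suffix or query string. The OPeNDAP suffixes (.dds, .das, .ascii) and the WMS GetCapabilities parameters follow common usage of those services.

```python
import json
from dataclasses import dataclass, field, asdict
from urllib.parse import urlencode


@dataclass
class Resource:
    """One remote resource in the collection (hypothetical fields)."""
    title: str
    url: str      # assumed "partially qualified": endpoint only, no suffix or query
    service: str  # one of "opendap", "wms", "file"


@dataclass
class VirtualCollection:
    """A themed set of remote resources (hypothetical schema)."""
    theme: str
    description: str
    resources: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize the whole collection, nested resources included.
        return json.dumps(asdict(self), indent=2)


# Common OPeNDAP response suffixes appended to a dataset URL.
OPENDAP_SUFFIXES = {"structure": ".dds", "attributes": ".das", "ascii": ".ascii"}


def qualify(resource: Resource, target: str) -> str:
    """Convert a stored URL into a request URL for one target use."""
    if resource.service == "opendap" and target in OPENDAP_SUFFIXES:
        return resource.url + OPENDAP_SUFFIXES[target]
    if resource.service == "wms" and target == "capabilities":
        query = urlencode({"service": "WMS", "request": "GetCapabilities"})
        separator = "&" if "?" in resource.url else "?"
        return resource.url + separator + query
    if resource.service == "file" and target == "download":
        return resource.url  # plain files need no further qualification
    raise ValueError(f"no conversion for {resource.service!r} -> {target!r}")


if __name__ == "__main__":
    collection = VirtualCollection(
        theme="volcanic eruption",
        description="Hypothetical event-type collection",
    )
    collection.resources.append(
        Resource("SO2 column", "https://example.org/opendap/so2.nc", "opendap")
    )
    collection.resources.append(
        Resource("Ash map", "https://example.org/wms", "wms")
    )
    print(collection.to_json())
    print(qualify(collection.resources[0], "structure"))
    print(qualify(collection.resources[1], "capabilities"))
```

Keeping only the endpoint in the metadata file and deriving request URLs at use time is one possible design; it lets several applications share one collection file, at the cost of each converter needing to know the service conventions.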

 

Actions: 
  • Review the ability to track changes/revisions of virtual data collections.
  • Call for volunteers to:
    • Build sample virtual data collections.
    • Assist in converting to the application-specific formats.
    • Plan activities for ESIP Summer Meeting 2016 (ex: hackathon for virtual data collection/converter, and...?)
Citation:
Curation of Virtual Data Collections; Winter Meeting 2016. ESIP Commons, October 2015