Information Quality Cluster – Introduction, Reporting and Use Case Tutorial

Abstract/Agenda: 

The Information Quality Cluster was reactivated during the ESIP Federation Meeting in July 2015. The objective of the IQ Cluster is to bring together people from various disciplines to assess aspects of the quality of Earth science data. We will be learning and sharing best practices with the goal of building a framework for consistent capture, harmonization, and presentation of data quality for the purposes of climate change studies, Earth science, and applications. The efforts and goals of this cluster are not predefined and are motivated by the participants of the cluster, so new ideas and participants are always welcome.

 

The purpose of the session is to provide an introduction to the cluster’s activities and report on the progress made from July through December 2015. Following this, the session will provide a brief tutorial on use case development in preparation for the follow-on session on Use Case Development.

Agenda (Draft):

  • Introduction – Ramapriyan – 10*
  • NASA Data Quality Working Group (DQWG) Update – Moroni – 10
  • DQWG Data System Integration Committee – Downs – 10
  • NOAA Stewardship Maturity Matrix Update – Peng – 10
  • Use Case Development Tutorial – Moroni – 40
  • Discussion/Preparation for follow-on session – 10

*Numbers shown are minutes assigned to each agenda item.

Notes: 

This is the first of two sessions of the Information Quality Cluster. The next session will focus on developing a few use cases.

Rama - “Information Quality Cluster - Introduction”

  1. The co-chairs for the Information Quality Cluster are Rama, David, and Peng.

  2. There are 4 key objectives for the ESIP Information Quality Cluster (see the presentation attached in the section below). However, it is important to note that the objectives can and will evolve with participant input.

  3. The Cluster currently has a set of key activities that it has been focusing on and will continue to pursue in 2016.

    1. The key activities mainly involve collecting use cases, identifying needs for capturing, describing, and conveying quality information, and prototyping methods to support the communication of quality information. For the long term, it will be important to assist in developing emerging standards for data quality.

David - “NASA Data Quality Working Group (DQWG) Update”

  1. The working group was originally formed through NASA agencies, but the agency types have since expanded to include US interagency and foreign/international agencies.

  2. The recommendations developed through the working group are mainly for ESDIS, DAACs, and NASA data providers. The main domains of interest are data systems (data distribution and archive systems) and scientific data producers.

  3. There are 4 key data quality management phases:

    1. Capturing, Describing, Facilitating Discovery, and Enabling Use.

    2. Each of the phases has “low-hanging fruit recommendations” identified for both the Distributed Active Archive Centers (DAACs) and Data Producers, so that the recommendations can be adopted with low barriers and minimal complication.

  4. The complete set of recommendations is included as a part of the presentation (the presentation file is attached in the section below).

Bob - “Data Systems Integration Committee of the Earth Science Data System Working Group (ESDSWG) on Data Quality”

  1. In 2015, 16 use cases relevant to the NASA Earth Science Data and Information System were collected. These use cases then helped in producing recommendations in different areas: accuracy/precision/uncertainty, distinguishability, applicability, and usability.

  2. This year, efforts will continue to be made to prioritize and refine the recommendations further.

  3. The mission and goals/objectives of the Data Systems Integration Committee are set mainly to ensure recommendations are established in order to improve data quality, including the areas of data description and discovery.

  4. There were also “low hanging fruit” recommendations identified by the Data Systems Integration Committee.

  5. A sample of potential implementation solutions is provided in the presentation (the presentation file is attached in the section below).

  6. For future discussions, characteristics of data quality reviews, attributes of data quality, and data quality for specific purposes are all key topics to explore further.

  7. In the future, new recommendations beyond the “low hanging fruit” could potentially be added.  However, additional use cases will need to be collected in order to develop these new recommendations.

Peng - “Data Stewardship Maturity Matrix -- Update and Future Plans”

  1. The Data Stewardship Maturity Matrix (DSMM) is a unified framework for measuring stewardship practices applied to individual digital Earth Sciences data products.

  2. There are 5 levels of maturity that a data product could achieve, depending on its performance/characteristics in the 9 categories defined for the matrix.

  3. There are several ways that one could use the DSMM and the assessment results.  Overall, the DSMM and its results can help reveal the stewardship practices that are actually being applied and identify areas for improvement.

  4. In 2015, several use cases were developed for the DSMM. New use cases will continue to be collected in 2016, and additional reviews will be conducted for the existing use cases. The completeness, appropriateness, and usefulness of the DSMM will also be evaluated.

  5. Submissions of papers for publication, describing and summarizing the work done thus far, are also planned.

  6. Question from the audience:

    1. What is the amount of effort required to use the DSMM to perform an assessment of a data product?

      1. The amount of effort required depends on the familiarity with the DSMM as well as the data product itself.  If the person who is responsible for the assessment is familiar with the data product and also has access to the key team members who have key knowledge of the data product, the assessment process could be completed in a few hours.  In general, the efficiency of the assessment process should improve as more data products are reviewed by the same person/team.

David -

  1. Provided an overview of the Data Quality Use Case Submission Form.

  2. The “Use Case Narrative: Goal and Context” is meant to document the kind of data quality situation that the use case is intended to describe.

  3. David also built a use case as an example to show how the Form could be used.

  4. For further details, the Form's documentation can be accessed using the following link: bit.ly/ucrefguide

Actions: 
  • To provide definitions of the different user and stakeholder types in the Data Quality Use Case Submission Form.
  • To combine the two different options for NCEI into a single "NCEI" option under the "Affiliation" section of the Data Quality Use Case Submission Form.
  • Please let David know if there is interest in receiving a video recording of the instructions for the Data Quality Use Case Submission Form.
Citation:
K., H.; Moroni, D.; Peng, G.; Information Quality Cluster – Introduction, Reporting and Use Case Tutorial; Winter Meeting 2016. ESIP Commons, October 2015

Comments

Data Systems Integration Committee of the Earth Science Data System Working Group on Data Quality - Robert R. Downs