Deep Data: Managing and Accessing Data at the Parameter Level
To develop tools that access large quantities of data, we must go beyond the data collection and data granule level. We must be able to both manage and access data at the level of the parameters (also known as variables or measurements) inside the files. A number of applications (e.g., Giovanni, Live Access Server) and standards (e.g., WCS) attempt to do this, but are impeded by the typical ways in which the data are stored, catalogued and inventoried.
This session will provide a survey of key applications and technologies that access data at the parameter/measurement/variable level, as well as how data are typically managed. It will be followed by a working session at which we roll up our sleeves in order to make progress on improving the management of such data to make them easier for applications to work with.
The key questions to be answered in this session are:
- What major Earth science tools work with data at the Measurement-Parameter-Variable level?
- How is Measurement/Parameter/Variable information used by each application: visualization, subsetting, selection forms, discovery?
- Where is that information stored and how is it obtained and maintained?
Agenda:
9:00 | Problem Statement | Chris Lynnes |
9:10 | AppEEARS | Jason Werpy |
9:20 | ODISEES | Beth Huffer |
9:30 | DarkData: Data Curation Service | Manil Maskey |
9:40 | DarkData: Rule Service | Stephan Zednik |
9:45 | DarkData Giovanni | Maksym Petrenko |
9:50 | MAPSS and Aerostat | Maksym Petrenko |
9:55 | AESIR: Variable Catalog and Giovanni | Mahabaleshwara Hegde |
10:05 | Unified Metadata Model - Variables | Simon Cantrell |
10:15 | Other services and catalogs | Chris Lynnes |
Identify applications of “deep data” that use variable-level information; key issues in describing and maintaining that information; solutions to those issues; and steps going forward.
1. Presentation: LP DAAC AppEEARS Data Access
Interface: select point-based data from the LP DAAC archive, analyze those data, and download the specific values desired
AppEEARS: access and extract LP DAAC’s tiled MODIS and WELD data
Capabilities:
- Select variables
- Review order
- Review variables
- Compare variables
Q:
- How are different variables combined? Geolocation is used to fetch the data.
- Is there an index? OPeNDAP is used to access the data, which are organized by time slice.
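As a rough illustration of the OPeNDAP answer above, the sketch below builds a DAP2-style constraint expression that requests one variable for one time slice and a small window of grid cells. It is not the AppEEARS implementation; the server URL, the variable name `NDVI`, and the (time, row, column) dimension order are all assumptions made for the example.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop] (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def build_subset_url(base_url, variable, time_index, rows, cols, response="ascii"):
    """Build a URL asking for a single time slice and a row/column window.

    Assumes the variable is stored with dimensions (time, row, column);
    real products may order or name their dimensions differently.
    """
    constraint = (
        variable
        + hyperslab(time_index, time_index)  # exactly one time slice
        + hyperslab(*rows)
        + hyperslab(*cols)
    )
    return f"{base_url}.{response}?{constraint}"


# Hypothetical granule URL and variable name, for illustration only.
url = build_subset_url(
    "https://example.org/opendap/granule.hdf", "NDVI", 3, (10, 12), (20, 22)
)
print(url)
```

Requesting one time index per call mirrors the notes' remark that the data are organized by time slice; a point extraction would first convert the point's geolocation to row and column indices.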
2. Presentation: ASDC ODISEES Overview
Formal descriptions of variables map similar variables to one another
Prospective data users can also see how similar variables differ by their unique characteristic sets: the more criteria added to a search, the more accurate the query results.
- Data model - An Ontology-based data model implemented as an RDF triplestore
- Accommodates the diversity, complexity, and evolving nature of the data
- Flexibility for schema changes and new data types
- Users can find data by entering related terms
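The idea of describing variables as triples and narrowing a search by adding criteria can be sketched with a tiny in-memory triple set. This is only a toy stand-in for an RDF triplestore, not the ODISEES data model; the variable identifiers, predicates, and values below are hypothetical.

```python
# Each triple is (subject, predicate, object). All names are illustrative.
TRIPLES = {
    ("var:A", "measures", "air_temperature"),
    ("var:A", "verticalLevel", "surface"),
    ("var:A", "temporalResolution", "monthly"),
    ("var:B", "measures", "air_temperature"),
    ("var:B", "verticalLevel", "850hPa"),
    ("var:B", "temporalResolution", "monthly"),
    ("var:C", "measures", "precipitation"),
    ("var:C", "temporalResolution", "daily"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def find_variables(triples, **criteria):
    """Return subjects that satisfy every predicate=value criterion."""
    subjects = {s for s, _, _ in triples}
    for predicate, value in criteria.items():
        subjects &= {s for s, _, _ in match(triples, p=predicate, o=value)}
    return sorted(subjects)


def differences(triples, a, b):
    """List the characteristics held by one variable but not the other."""
    chars_a = {(p, o) for _, p, o in match(triples, s=a)}
    chars_b = {(p, o) for _, p, o in match(triples, s=b)}
    return sorted(chars_a ^ chars_b)


print(find_variables(TRIPLES, measures="air_temperature"))
print(find_variables(TRIPLES, measures="air_temperature", verticalLevel="surface"))
print(differences(TRIPLES, "var:A", "var:B"))
```

Because a new characteristic is just another triple, adding a predicate needs no schema change, which is the flexibility the notes attribute to this kind of model.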
3. Presentation: Automated analysis workflows for Earth science events using parameter level relevancy
Goals: design a semantic middleware layer to automatically exploit metadata resources
Data curation service:
- Finding relevant data variables:
- Actual data variables
- Data set level science keywords
- Granule data fields and metadata
- How to map: CF standard name to GCMD, CF units to GCMD; text processing; NLP
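One simple form of the text processing mentioned above is token overlap between a CF standard name and the leaf term of a GCMD keyword. The sketch below uses Jaccard similarity with an arbitrary threshold; the keyword list is a small illustrative sample, and the presenters' actual mapping method (including any NLP) is not described in these notes.

```python
import re

# Illustrative sample of GCMD-style keyword paths; not an authoritative list.
GCMD_KEYWORDS = [
    "ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > AIR TEMPERATURE",
    "OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE",
    "ATMOSPHERE > PRECIPITATION > PRECIPITATION RATE",
]


def tokens(text):
    """Lower-case the text and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_keyword(cf_name, keywords, threshold=0.3):
    """Return the keyword whose leaf term best overlaps the CF name.

    The 0.3 threshold is an arbitrary assumption; below it, no mapping
    is suggested and None is returned.
    """
    cf_tokens = tokens(cf_name)
    scored = [(jaccard(cf_tokens, tokens(k.split(">")[-1])), k) for k in keywords]
    score, keyword = max(scored, default=(0.0, None))
    return keyword if score >= threshold else None


print(best_keyword("air_temperature", GCMD_KEYWORDS))
print(best_keyword("sea_water_salinity", GCMD_KEYWORDS))
```

A lexical match like this would only produce candidates; synonyms (for example "flux" versus "rate") and unit mappings would likely still need curated rules or expert review.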
A rules-based service for suggesting visualizations to analyze Earth Science phenomena
Goals:
Conceptual model: phenomena -> data field -> visualize candidates
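The conceptual model above (phenomenon, then data fields, then visualization candidates) can be sketched as two lookup tables chained together. The rules below are invented for illustration and are not taken from the DarkData Rule Service.

```python
# Hypothetical rules: which data fields are relevant to each phenomenon.
PHENOMENON_FIELDS = {
    "hurricane": ["wind_speed", "precipitation_rate"],
    "dust storm": ["aerosol_optical_depth"],
}

# Hypothetical rules: which plot types suit each data field.
FIELD_VISUALIZATIONS = {
    "wind_speed": ["time-averaged map", "time series"],
    "precipitation_rate": ["time-averaged map"],
    "aerosol_optical_depth": ["time-averaged map", "time series"],
}


def suggest_visualizations(phenomenon):
    """Return (data field, plot type) candidates for a phenomenon name."""
    candidates = []
    for field in PHENOMENON_FIELDS.get(phenomenon.lower(), []):
        for plot in FIELD_VISUALIZATIONS.get(field, []):
            candidates.append((field, plot))
    return candidates


for field, plot in suggest_visualizations("Dust Storm"):
    print(f"{field}: {plot}")
```

Keeping the two rule tables separate means a new phenomenon can reuse existing field-to-plot rules without restating them.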
Results and evaluation:
4. Presentation: Dark Data services + Giovanni
Standard edition: variables (time, space, plot type)
Dark Data-service integration: event info -> relevant datasets -> services (Giovanni) -> plot
Variable-level data in DarkData services
Dark Data Edition: keywords are used to retrieve the relevant events. Demo: pull event info from EONET, pull relevant datasets/variables from the DarkData Curation Service, pull relevant services from the DarkData Rules Service.
5. Presentation: Variable-level tools at GES DISC: UUI, MAPSS, AEROSTAT
UUI (Unified User Interface)
- Brings together diverse data services, e.g. SSW, OPeNDAP, OpenSearch …
- Driven by metadata
- Data search and subsetting by variables and dimensions
- Variable metadata extraction
Q: what does “dark” mean?
6. Variables @ NASA GES DISC
Variable attributes: keywords, disciplines, resolutions, source, measurements, color palettes
Variable catalog curation
Sources of variable information: collection metadata, granule metadata, documentation+subject matter expertise
Challenges:
- Some attributes are known only to domain experts
- Metadata providers may not be aware of how the metadata are consumed
- Metadata are of questionable quality even when they exist
- Incomplete metadata
- Several standards or conventions to choose from
- Conventions are recommendations and are not mandatory
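To make the curation problem concrete, the sketch below merges a variable's attributes from the three sources listed above and reports which attributes are still missing. The attribute names follow the list in these notes, but the record layout, the precedence order (later sources override earlier ones), and the sample values are assumptions, not the GES DISC catalog design.

```python
# Attribute names taken from the notes; the flat-dict layout is an assumption.
ATTRIBUTES = (
    "keywords", "disciplines", "resolution",
    "source", "measurement", "color_palette",
)


def merge_sources(*sources):
    """Merge attribute dicts; later sources override earlier ones.

    Empty values (None, "", []) never override an existing value, so an
    incomplete source cannot erase what another source supplied.
    """
    record = {}
    for source in sources:
        for key, value in source.items():
            if key in ATTRIBUTES and value not in (None, "", []):
                record[key] = value
    return record


def missing_attributes(record):
    """List the catalog attributes the merged record still lacks."""
    return [name for name in ATTRIBUTES if name not in record]


# Hypothetical inputs for one variable.
collection_metadata = {"source": "MODIS-Terra", "keywords": ["AEROSOLS"]}
granule_metadata = {"resolution": "1 degree", "keywords": []}
expert_input = {"measurement": "Aerosol Optical Depth"}

record = merge_sources(collection_metadata, granule_metadata, expert_input)
print(record)
print(missing_attributes(record))
```

A report like the output of `missing_attributes` is one way to surface incomplete metadata for follow-up with subject matter experts; it does not address the quality of the values that are present.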