Earth Science Data Analytics 101
The broad set of techniques called Earth Science Data Analytics (ESDA) has a clear meaning to everyone, though the meanings often differ depending on how the data are used. Data Analytics discussions can range from developing custom code for discovering signatures in data to leveraging tools that derive predictions from heterogeneous datasets. This session, uniquely presented by field experts, attempts to introduce the scope, complexities, and possibilities presented by ESDA to further facilitate Earth science.
Guest speakers during this session will describe data analytics use cases that they employ in their work. The goal of this session is to help organize and stimulate Federation partners in thinking about how we can facilitate Earth Science Data Analytics through information technologies and tools.
Our agenda (guest speakers) includes:
- Steve Kempler (not really a guest speaker), GES DISC – 'Analytics and Data Scientists, Session Earth Science Data Analytics 101'
- David Bolvin, Precipitation Measurement Mission (PMM) Science Team – 'From Many, One (or creating one great precipitation data set from many good ones)'
- David Gallaher, NSIDC – 'Reconstructing Sea Ice Extent from Early Nimbus Satellites'
- Thomas Hearty, GES DISC/AIRS Data Support – 'Sampling Total Precipitable Water Vapor using AIRS and MERRA'
- Radina Soebiyanto, GCDC – 'Using Earth Observations to Understand and Predict Infectious Diseases'
- Tiffany Matthews, LaRC, ASDC – Technology Wrap-up
- All – Panel Discussion: Q&A with the audience, focused on the question: How can Federation information technologists support and facilitate Earth science data analytics-oriented research?
Come one, come all!
Notes: Different Types of Earth Science Data Analytics
- Descriptive - Analyze multiple datasets to describe conditions
- Diagnostic - Analyze data to determine cause of condition
- Discoveritive - Analyze multiple datasets to uncover new information
- Predictive - Analyze multiple datasets to predict future conditions
- Prescriptive - Apply information to determine best action to take
Back to the ESDA Home Page: http://wiki.esipfed.org/index.php/Earth_Science_Data_Analytics
Thanks to all who attended this session to learn more about Earth Science Data Analytics (ESDA). We hope the session sparked your interest in participating to further mature this relatively new area of Earth data and information science. If you are interested in contributing small amounts of time to the activities of this cluster (see the ESDA 201 Session notes - http://commons.esipfed.org/node/2723), please e-mail Steve Kempler (Steven.J.Kempler@nasa) to express your interest, along with any ideas or thoughts on the subject.
The ESDA Cluster, attracting a lot of interest, continues to ‘churn’ through the process of maturing its understanding of this new paradigm, Data Analytics and Data Science, and of its impacts. In keeping with the purpose of this session, participants attended to ‘learn’ what Data Analytics means in the Earth science context, and to discuss the goal of the ESDA Cluster, pursued through Federation science and technology expertise:
To facilitate making information into knowledge
ESDA was described through six excellent presentations (see agenda above) that:
1. Introduced the topic
2. Provided real use cases of how different types of data analytics are employed
3. Provided a view into available ESDA technologies
All presentations are available through links provided below.
Steve Kempler’s introductory presentation made the point that, although Earth science users have been working with heterogeneous datasets for a while and technology has been accommodating their usage needs, what is new is the need to advance and implement the infrastructure, technologies, and tools required to efficiently analyze data and information in order to extract knowledge. This is best addressed through:
- Data Preparation – Preparing heterogeneous data so that they can ‘play’ together
- Data Reduction – Smartly removing data that do not fit research criteria
- Data Analysis – Applying techniques/methods to derive results
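To make the three steps concrete, here is a minimal Python sketch under stated assumptions: two hypothetical sources report the same surface temperature at different time steps and in different units. Preparation converts both to daily means in degrees Celsius, reduction drops days that fail a hypothetical research criterion, and analysis computes the mean difference and Pearson correlation. The function names and data are illustrative only and are not taken from any ESDA tool.

```python
from statistics import mean

# Hypothetical inputs: two sources report surface temperature differently.
#   hourly_k: {(day, hour): kelvin}  (source A, sub-daily, kelvin)
#   daily_c:  {day: deg_C}           (source B, daily mean, Celsius)


def prepare(hourly_k, daily_c):
    """Data preparation: put both sources on a common time step (daily) and unit (deg C)."""
    by_day = {}
    for (day, _hour), kelvin in hourly_k.items():
        by_day.setdefault(day, []).append(kelvin - 273.15)
    daily_a = {day: mean(vals) for day, vals in by_day.items()}
    # Keep only days covered by both sources so the two can "play" together.
    common_days = sorted(daily_a.keys() & daily_c.keys())
    return [(day, daily_a[day], daily_c[day]) for day in common_days]


def reduce_rows(rows, excluded_days):
    """Data reduction: drop days that fail a (hypothetical) research criterion."""
    return [row for row in rows if row[0] not in excluded_days]


def analyze(rows):
    """Data analysis: mean difference (A - B) and Pearson correlation between the sources."""
    xs = [a for _, a, _ in rows]
    ys = [b for _, _, b in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    bias = mean(x - y for x, y in zip(xs, ys))
    return bias, cov / (sx * sy)


hourly = {(1, 0): 283.15, (1, 12): 285.15, (2, 0): 288.15, (2, 12): 290.15,
          (3, 0): 278.15, (3, 12): 280.15, (4, 0): 293.15}
daily = {1: 10.5, 2: 15.5, 3: 5.5, 4: 25.0, 5: 30.0}
rows = reduce_rows(prepare(hourly, daily), excluded_days={4})
bias, r = analyze(rows)
print(f"bias={bias:.2f} C, r={r:.3f}")
```

Keeping each step in its own function mirrors the preparation/reduction/analysis split, so any one step can be swapped (for example, a different reduction criterion) without touching the others.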
Data Analytics Definition: The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information.
Dave Bolvin discussed the techniques used to merge datasets from different sources into a high-quality product: inter-calibration based on relative quality, morphing, forward/backward propagation, Kalman filtering, and more (you should really see Dave’s presentation, linked below). This might be considered multi-dataset Descriptive Analytics.
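Several of these merging techniques share one idea: weight each source by its relative quality. The sketch below is a minimal, hypothetical illustration of that idea only, assuming each source's quality can be summarized by a single, spatially constant error variance; it combines co-located estimates with inverse-error-variance weights. It is not the algorithm from Dave's presentation, and it omits morphing, propagation, and inter-calibration. The function names and values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def merge_estimates(values: Sequence[Optional[float]],
                    error_vars: Sequence[float]) -> Tuple[Optional[float], Optional[float]]:
    """Combine co-located estimates of one quantity, weighting each by 1 / error variance.

    Returns (merged value, merged error variance), or (None, None) if no source has data.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for value, var in zip(values, error_vars):
        if value is None:  # this source has no estimate at this grid cell
            continue
        weight = 1.0 / var  # lower-error (better-quality) sources count for more
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0.0:
        return None, None
    return weighted_sum / total_weight, 1.0 / total_weight


def merge_fields(fields, error_vars):
    """Apply merge_estimates cell by cell; fields[k][i] is source k's value at cell i."""
    return [merge_estimates([f[i] for f in fields], error_vars)[0]
            for i in range(len(fields[0]))]


# Hypothetical precipitation rates (mm/h) from three sources on a 4-cell grid.
satellite_a = [2.0, 0.0, None, 5.0]
satellite_b = [4.0, 0.5, 1.0, None]
gauge = [None, None, None, 4.0]
merged = merge_fields([satellite_a, satellite_b, gauge], error_vars=[1.0, 3.0, 0.25])
print(merged)
```

For a scalar quantity, a single Kalman filter update that combines a prior estimate with one new measurement yields this same inverse-variance weighting; a full Kalman filter also propagates the estimate and its error variance forward in time between observations.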
David Gallaher followed with a discussion of how he extracted Nimbus data from 1960s-vintage magnetic tape to examine sea ice extent and compare it with sea ice extent measured by Terra/MODIS. David’s work might be considered single-dataset Descriptive Analytics.
Thomas Hearty’s analysis might be considered Diagnostic Analytics. His work entailed comparing Total Precipitable Water Vapor from two sources: the AIRS instrument and the MERRA reanalysis. Thomas walked us through the steps he took to match up the datasets (i.e., co-register them) in order to understand why the measurements did not agree; his results feed back into improving the AIRS processing algorithms, and thus provide a better product.
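The core of a matchup like this is sampling one dataset where and when the other observed. The sketch below assumes a regular reanalysis grid and uses nearest-neighbor selection in time and space (no interpolation, no longitude wrap-around), then computes observation-minus-reanalysis differences. The Obs class, grid layout, and values are hypothetical and do not reproduce Thomas's actual procedure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Obs:
    """One hypothetical retrieval (e.g., a single sounding)."""
    time_h: float  # hours since a common epoch
    lat: float     # degrees north
    lon: float     # degrees east
    tpw: float     # total precipitable water vapor, mm


def nearest_index(value, start, step, n):
    """Index of the nearest point on a regular axis (start, start + step, ...), clamped to [0, n-1]."""
    i = round((value - start) / step)
    return min(max(i, 0), n - 1)


def sample_grid(grid, ob, t0, dt, lat0, dlat, lon0, dlon):
    """Value of grid[time][lat][lon] at the time step and grid cell nearest to the observation."""
    ti = nearest_index(ob.time_h, t0, dt, len(grid))
    yj = nearest_index(ob.lat, lat0, dlat, len(grid[0]))
    xi = nearest_index(ob.lon, lon0, dlon, len(grid[0][0]))
    return grid[ti][yj][xi]


def matchup_differences(observations, grid, **axes):
    """Observation-minus-reanalysis differences, with the reanalysis sampled at each observation."""
    return [ob.tpw - sample_grid(grid, ob, **axes) for ob in observations]


# Hypothetical 3-hourly reanalysis field: 2 time steps x 2 latitudes x 3 longitudes (mm).
reanalysis = [
    [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]],
    [[20.0, 21.0, 22.0], [23.0, 24.0, 25.0]],
]
axes = dict(t0=0.0, dt=3.0, lat0=0.0, dlat=1.0, lon0=0.0, dlon=1.0)
obs = [Obs(time_h=0.4, lat=0.9, lon=2.2, tpw=16.0),
       Obs(time_h=2.9, lat=0.1, lon=0.0, tpw=19.5)]
diffs = matchup_differences(obs, reanalysis, **axes)
print("differences:", diffs, "mean bias:", mean(diffs))
```

Nearest-neighbor matching keeps the sketch short; a real comparison would likely interpolate between grid points and time steps and apply each dataset's quality flags before differencing.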
Radina Soebiyanto’s presentation on using Earth science data to better understand and predict infectious diseases exemplifies Predictive Analytics. Radina described the methods used in her research, including data aggregation, logistic regression, neural networks, and decision trees, to uncover relationship patterns between Earth science and health data.
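Of the methods listed, logistic regression is the simplest to illustrate. The sketch below fits a one-predictor logistic model by batch gradient descent on the mean log-loss, relating a hypothetical standardized humidity anomaly to whether an outbreak was reported. The variables, data, learning rate, and epoch count are all assumptions for illustration, and plain gradient descent stands in for whatever fitting routine was actually used in the research.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the linear predictor
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical weekly data: humidity anomaly vs. outbreak reported (1) or not (0).
humidity = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
outbreak = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(humidity, outbreak)
for x in (-1.5, 0.0, 1.5):
    print(f"humidity anomaly {x:+.1f}: P(outbreak) = {sigmoid(b0 + b1 * x):.2f}")
```

The toy data are deliberately not perfectly separable, so the fitted coefficients stay finite; in practice an established statistics or machine-learning library would be used, with many more predictors and proper validation.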
Tiffany Mathews completed the presentation session with a discussion of promising data analytics technologies for each of the Data Analytics types, organized by tools for: data conversion, visualization, data discovery, semantic web, subsetting, data modeling, and metrics.
Presentations were followed by a short discussion focused on the depth and breadth of what performing Earth science data analytics might include. Most noteworthy was the introduction of the UV-CDAT (Ultrascale Visualization Climate Data Analysis Tools) project, developed by a team led by Lawrence Livermore National Laboratory (http://uvcdat.llnl.gov/index.html).
This was an information/learning session. The only intended action was to participate in the next ESDA session, ESDA 201. See the ESDA 201 meeting page (http://commons.esipfed.org/node/2723) to review the ESDA cluster discussion, activities, and resulting actions.