Frameworks for Data Visualization
Many frameworks have been developed to support science data visualization. These frameworks can be integrated or modular and very often involve massaging the underlying source science data (via a set of specialized procedural steps) into a generalized derivative form (images, grids, tiles) facilitating the visualization process. Visualization frameworks might also simply be data browsers.
We want to identify examples of frameworks that exist within the Earth sciences community and encourage the developers, maintainers, and users of those systems to participate in this workshop at the summer meeting.
For this workshop we are soliciting 15-20 minute presentations that address the following core questions regarding visualization frameworks:
- What are the challenges/functions that the framework is attempting to address?
- What varieties of data products are represented within these frameworks? (e.g., considering data formats, types, and topics -- HDF, CSV, vector, point, remotely sensed, in-situ, etc.)
- How can these frameworks be expanded/adapted to other science datasets? Have frameworks been developed with extensibility in mind or are there requirements that necessitate solutions that are unique to the source data?
- What are the processing considerations when generating intermediate data products that then feed the frameworks?
- How do the above considerations reflect the intended audience and infrastructure?
Demonstrations of existing frameworks are certainly welcome as long as the core questions are addressed as part of that demonstration.
Kevin Ward, NASA Earth Observations
NASA Earth Observations (NEO): Data Imagery for Education and Visualization
NASA Earth Observations (NEO) is a framework that supports the storage and delivery of global imagery of NASA remote sensing data. NEO targets non-traditional data users, primarily in education and outreach: formal and informal educators, museum and science center personnel, professional communicators, and citizen scientists. These user communities have a need for these types of data but do not have the domain knowledge to locate the various source data and do not possess the tools and expertise required to produce imagery from those data. NEO strives to provide a middleware approach to meeting their needs.
NEO currently hosts imagery from more than 50 different datasets with daily, weekly, and/or monthly temporal resolutions. The imagery from these datasets is produced in coordination with several data partners who are affiliated either with the instrument science teams or with the respective data processing center.
NEO is a system of three components -- website, WMS (Web Mapping Service), and ftp archive -- which together are able to meet the wide-ranging needs of our users. Some of these needs include the ability to: view and manipulate imagery using the NEO website -- e.g., applying color palettes, resizing, exporting to a variety of formats including PNG, JPEG, KMZ (Google Earth), GeoTIFF; access the NEO collection via a standards-based API (WMS); and create customized exports for users (delivered via ftp) such as Science on a Sphere, NASA’s Earth Observatory, and others.
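As a sketch of how the WMS component is typically accessed, the following builds a standard OGC WMS 1.1.1 GetMap request URL. The endpoint and layer name are placeholders, not NEO's actual values; a real client would read them from the service's GetCapabilities document.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=3600, height=1800,
                   fmt="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    `base_url` and `layer` are hypothetical placeholders -- consult the
    service's GetCapabilities response for real endpoint and layer names.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",                      # geographic lat/lon
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)

url = wms_getmap_url("https://example.gov/neo/wms", "SST_MONTHLLY_PLACEHOLDER",
                     (-180, -90, 180, 90))
```

A browser or GIS client pointed at such a URL receives a rendered global image rather than raw science data, which is the middleware role NEO plays for its audience.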
Yuan Ho, UCAR
Visualization of Geo-science data with Unidata's Integrated Data Viewer
The Integrated Data Viewer (IDV) from Unidata is a Java(TM)-based software framework that provides new and innovative ways of displaying and analyzing Earth science data, as well as common 2D, 3D, and 4D visualization capabilities. The IDV can be used by weather enthusiasts to view the daily weather, to analyze model output in climate research, to study satellite observations, to navigate the deep ocean environment, or to explore complex three-dimensional data in geophysics. The IDV software library can easily be used and extended to create custom geoscience applications beyond the atmospheric science realm. The IDV community has expanded from the traditional synoptic meteorology community to include many new disciplines (e.g., climatology, hydrology, oceanography, geophysics). In this presentation we will present the software architecture and several examples of how these communities use the IDV for academic work and scientific discovery.
David Mintz, EPA
AirData – some design considerations
The challenges in designing a data delivery system can be overwhelming. How do you design a system that is efficient, sustainable, agile, and doesn’t break your budget? Recently, EPA updated its AirData website to meet several initiatives – to consolidate interfaces, leverage existing products, avoid copying and restoring data, and accommodate future needs. This presentation will demonstrate a few visualization tools, focusing on system design and user interface considerations. Topics include how and when to connect directly to source data and the concept of using cron jobs to generate intermediate products.
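The cron-job approach to intermediate products can be sketched as a crontab fragment; the script names, paths, and schedule below are invented for illustration:

```shell
# Hypothetical crontab: regenerate intermediate products off-peak so user
# requests read small static files instead of querying the source database.
0 2 * * *  /opt/airdata/bin/build_daily_summaries.sh   # nightly at 02:00
30 2 * * 0 /opt/airdata/bin/build_station_kml.sh       # weekly KML refresh
```

The design choice is that user-facing pages never pay the cost of a live database query; they serve precomputed files that are at most one cycle old.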
Charles Thompson, Physical Oceanography Distributed Active Archive Center (PO.DAAC), Jet Propulsion Laboratory
A Generalized Pipeline for Creating and Serving High-Resolution Satellite Imagery
Generating imagery for a wide range of science datasets via a standardized set of processes is challenging. Inevitably, there is always some number of unique, specialized steps in transforming diverse satellite data products in formats such as HDF or NetCDF into high-quality imagery suitable for outreach, publications, or the general public. Furthermore, as the spatial resolution of remotely sensed data increases, the corresponding imagery becomes more unwieldy to serve efficiently. This presentation will discuss the evolving framework behind State of the Oceans (SOTO), a Google Earth-based web interface which visualizes near-real-time oceanographic parameters such as sea surface temperature and ocean winds. The SOTO backend employs a configurable image generation pipeline soon to be combined with a Tiled WMS server to create a flexible end-to-end imaging system applicable to a wide array of science data products.
Mike McCann, Monterey Bay Aquarium Research Institute
A Framework for in situ oceanographic measurement data access and visualization: The Spatial Temporal Oceanographic Query System (STOQS)
The CF-NetCDF file format and the technologies and conventions surrounding it (THREDDS, Climate Forecast Conventions for Point Observations, etc.) provide a solid foundation for managing archives of diverse collections of data. This technology stack is used successfully for numerical model output and is beginning to be used more widely for in situ measurement data.
Access efficiency decreases with decreasing dimension of NetCDF data. For example, the Trajectory Common Data Model feature type has only one coordinate dimension, usually Time – positions of the trajectory (Depth, Latitude, Longitude) are stored as non-indexed record variables within the NetCDF file. If client software needs to access data between two depth values or from a bounded area, the whole data set must be read and the selection made by the client software. This is very inefficient. What is needed is a complement to NetCDF that provides server-side indexing of any variable related to variables of interest.
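The client-side selection described above can be sketched as follows, with synthetic numpy arrays standing in for the trajectory's record variables (the variable names are illustrative; in the real case every value must be transferred from the file before filtering can begin):

```python
import numpy as np

# Stand-ins for trajectory record variables read from a CF-NetCDF file.
n = 100_000
time  = np.arange(n, dtype=float)
depth = np.abs(np.sin(time / 500.0)) * 1000.0   # 0..1000 m, toy profile
temp  = 15.0 - depth * 0.01                     # toy temperature values

# Client-side selection: the entire arrays are already in memory, even
# though only a small depth slice is wanted.
mask = (depth >= 100.0) & (depth <= 200.0)
selected_temp = temp[mask]

# Server-side indexing (the capability STOQS provides) would transfer only
# mask.sum() records instead of all n.
```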
Geospatial relational database technology provides this capability. The Spatial Temporal Oceanographic Query System (STOQS) has been designed and built to provide efficient access to in situ oceanographic measurement data across any dimension. STOQS is an open source software project built upon a framework of free and open source software for geospatial data. STOQS complements CF-NetCDF and OPeNDAP by providing an ability to index data retrieval across parameter and spatial dimensions in addition to the a priori indexed coordinate dimensions of CF-NetCDF. It also provides a functional bridge to standards-based GIS technologies.
For more information please see http://code.google.com/p/stoqs/
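The benefit of server-side indexing over an arbitrary variable can be illustrated with a relational database -- here SQLite from the Python standard library, with a toy schema invented for this sketch (STOQS itself is built on geospatial database technology, not this schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE measurement (
    tstamp REAL, depth REAL, lat REAL, lon REAL, temperature REAL)""")

# Synthetic trajectory-like records: depth cycles through 0..999 m.
cur.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [(t, (t * 7) % 1000, 36.7, -122.0, 15.0 - ((t * 7) % 1000) * 0.01)
     for t in range(10_000)])

# The index lets depth-bounded queries avoid scanning every row --
# the server-side complement to NetCDF's coordinate-only indexing.
cur.execute("CREATE INDEX idx_depth ON measurement (depth)")

rows = cur.execute(
    "SELECT tstamp, temperature FROM measurement "
    "WHERE depth BETWEEN 100 AND 200 ORDER BY tstamp").fetchall()
```

Only the matching rows cross the connection; the client never touches the other records.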
Mahabal Hegde, NASA
GIOVANNI - The Bridge Between Data and Science
Giovanni is a NASA framework for online visualization and analysis of Earth science data. It is intended to abstract the complications of data formats, structures, quality flags, etc. away from the user behind a GUI-driven Web interface that is easy to use. The goal is to transform data exploration from a process that can take days, weeks, or months into one that takes only minutes. The current architecture of Giovanni, version 3 (or G3), is being updated to be more flexible and agile with respect to adding datasets and features.
Richard Kim, Physical Oceanography Distributed Active Archive Center (PO.DAAC), Jet Propulsion Laboratory
HITIDE: An Extensible Service-Based Web Interface for the Search, Imaging, and Extraction of Swath-based Geophysical Parameters
PO.DAAC's HITIDE is a web application powered by a trio of web services that facilitate access to and evaluation of swath satellite data products in common scientific file formats such as HDF and NetCDF. The key component of its capabilities is a tile database, which allows for precise search, imaging, and extraction functionality based upon spatial and temporal constraints at a sub-granular resolution. With this functionality, users in the science community can efficiently find, evaluate, and download subsetted science data products pertaining to their research. As a system, HITIDE represents a framework which can ingest virtually any geolocated data product accessible via OPeNDAP. This presentation will provide an overview of HITIDE, the considerations when adding a new data product, and the future direction of development, including support for gridded data products, data-specific constraints, and enhanced dynamic imaging capabilities.
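The tile-database idea behind sub-granule search can be illustrated as follows; the tile size and data structures here are invented for this sketch and are not HITIDE's actual implementation:

```python
# Each granule's footprint is decomposed into fixed 10-degree tiles; a
# spatial search then matches tiles rather than whole-granule bounding
# boxes, giving sub-granule precision.
TILE = 10  # tile size in degrees (illustrative)

def tiles_for_bbox(lon_min, lat_min, lon_max, lat_max):
    """All (tx, ty) tile indices overlapping a lon/lat bounding box."""
    return {(tx, ty)
            for tx in range(int(lon_min // TILE), int(lon_max // TILE) + 1)
            for ty in range(int(lat_min // TILE), int(lat_max // TILE) + 1)}

# Tiny tile index: granule id -> set of covered tiles.
index = {
    "granule_A": tiles_for_bbox(-130, 20, -100, 40),  # NE Pacific swath
    "granule_B": tiles_for_bbox(0, -40, 30, -10),     # South Atlantic swath
}

def search(lon_min, lat_min, lon_max, lat_max):
    """Return granules whose tiles intersect the query box."""
    wanted = tiles_for_bbox(lon_min, lat_min, lon_max, lat_max)
    return [gid for gid, tiles in index.items() if tiles & wanted]

hits = search(-125, 25, -115, 35)   # query box off California
```

A real system would hold the tile index in a database and record per-tile time coverage as well, so temporal constraints prune the same way.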
Frameworks for Data Visualization Notes
What is a framework? A way to visualize data
NASA Earth Observations (NEO): a repository of imagery of NASA data
- how can NASA simplify access to data imagery (not raw data)
- Science centers, formal/informal education, public, citizen scientists
- Wanted: global imagery and data, high resolution, standard image formats
Current barriers between a satellite data source and the user
- Tool and domain knowledge, data formats, processing
- NASA wanted to provide expertise/middleware to leap these barriers for users == NEO
Decision-process to meet 3 criteria
Global imagery…which datasets?
- Accessible topics (core concepts and user-suggested)
- Accessible data (already online or NASA data expertise able to facilitate)
- Source datasets’ resolutions, real-time processing constraints on fixed budget, users needing 2048x1024
Based on this for the base imagery chose:
- 3600x1800, data only (no outlines or masks), grayscale (netpbm), JPG/PNG/GeoTIFF
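Storing the base imagery as grayscale and applying color on demand (as the NEO website does for its palette options) can be sketched as a simple lookup table; the palette below is an invented blue-to-red ramp, not an actual NEO palette:

```python
# Base imagery is 8-bit grayscale; color is applied on request via a
# 256-entry palette lookup, so one stored image serves every palette.
def make_ramp():
    """256-entry palette ramping from blue (low values) to red (high)."""
    return [(i, 0, 255 - i) for i in range(256)]

def apply_palette(gray_pixels, palette):
    """Map each 0-255 gray value to its (r, g, b) triple."""
    return [palette[g] for g in gray_pixels]

palette = make_ramp()
colored = apply_palette([0, 128, 255], palette)
```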
Approaching processing from visualization perspective
- Data that a scientist might filter out, but that is necessary to make a good visualization (e.g., filtering NDVI on quality flags would cut out a lot of pixels, like cloud cover in the Amazon)
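The quality-filtering trade-off noted above can be sketched as follows; the NDVI values and flags are toy data:

```python
import numpy as np

# Toy NDVI pixels with per-pixel quality flags (0 = good, 1 = cloudy).
# A science product might mask flagged pixels; a visualization may keep
# them to avoid large holes (e.g. persistent cloud over the Amazon).
ndvi  = np.array([0.8, 0.7, 0.75, 0.6])
flags = np.array([0,   1,   1,    0])

science_view = np.where(flags == 0, ndvi, np.nan)   # strict: gaps appear
visual_view  = ndvi.copy()                          # lenient: keep all pixels

gap_fraction = float(np.isnan(science_view).mean())
```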
Provide common temporal composites
- Daily, weekly, and monthly composites; many datasets already come in these, but some do not
- Good for comparing datasets in the same timeframe
- Website, WMS (web mapping service), FTP
- Used by target groups, also Weather Channel, Science on a Sphere
- Curation standpoint of NASA: interesting process for how to get and represent the data
any representation of the decision-process for how to represent the data?
- No, needs to be better. Important for Users to know these decisions especially for data processing. Maybe something the group can work on.
Elaborate on interaction Weather Channel and NASA
- Usually an email; they are starting to grab images on their own, though, and do visualization processing similar to the images for Earth Observatory.
Unidata Program Center
Integrated Data Viewer (IDV)
- geoscience at the speed of thought
- community has expanded beyond academic arena to government and even private entities
- Meteorological, GEMPAK, 2D/3D visualizations (IDV)
- Data access, real time data, internet data distribution, repository (RAMADDA), IDD/LDM, ADDE
- Java-based and built on VisAD library
- VisAD library: Java Component Library for interactive analyses and visualization of numerical data
- Based on creation of a data object with MathType
- Common Data Model → data access to set scientific feature types
- Data not directly consumed by the display
- Uses VisAD for on-the-fly coordinate transforms
- Jython language, auto generate derived variables
- Integration data from disparate data sources, automatically mapped to same projection
- Comprehensive user support, easy to install, out of the box data access and highly configurable
- Over 10,000 global users
impressed with capabilities, but any intention of putting IDV on the web?
- Can run on the web via a RAMADDA server, but limited in feature data
Is there a web client that talks to server?
- Access through browser
What about big data like high resolution satellite image?
- Going to deal with a lot more in the future with compression
- Right now, can subset and sample the data
EPA Office of Air Quality Planning and Standards
- website that turns data into information, connected to EPA’s air quality database
3 types plus interactive map:
- Raw data
- Visualization displays
- Broad audience including analysts, academia and general public
- Consolidate interfaces (with air explorer and air data site), leverage existing products, avoid copying data and accommodate emerging needs
- Selection menu to provide queries without acting on database
- How to update the content without refreshing page
- Using existing KML files for mapping without new map interface or content
- Uses jQuery
- Daily boxes of air contaminants, downloadable spreadsheet
- Displays survey source
Uses Google Earth API
- Can look at KMLs without going to Google
- Can display different contaminants, data info in the display
data documentation – any information on data error and uncertainty?
- Basic information page
- Reports generated also give link with specific information on that data
Size of pulling files
- Pulling files that indicate whether or not data exist for a particular year, not raw data, so smaller
When is the reference file consolidating data from all stations created?
- Not until submit button is clicked
Giovanni (NASA)
- Online framework for analysis, visualization, and access of Earth science data
- Goal: turn tedious data exploration into a fast and fun online experience
- Giovanni-4: improve user experience, reduce time-to-market for new data/services, use off-the-shelf systems, separate analysis and visualization, improve usability
- Kepler model flow
- Services can be bookmarked, can tag and annotate, can re-run using other user’s tags
Omnibus portal with all datasets and two community specific portals
- Process of creating mobile app
- Issue of sanitizing data to standard format for use
- Off-the-shelf systems have their own cost
Search on level-2 swath data
- Used OpenSearch providers for data
- ECHO, MODIS, LAADS, ASDC
is Giovanni storing copies of data in a different format?
- has cache or else pulls at run time
- Giovanni 3 kept a local copy in a different format
Physical Oceanography DAAC (PO.DAAC)
Generalized pipeline for creating and serving high-resolution satellite imagery
Current system: State of the Oceans (SOTO)
- Based on Google Earth, past 30 days of near-real-time data, includes backend processing
- Want to sever the processing backend from the web interface
- Tile creation
- Image creation
- Image repackaging
- Takes Level 2 swath and Level 3 data and makes floating-point grid tiles
- Rectangular projection (equirectangular)
- Organized with common taxonomy in hierarchy; must be geolocated
- Build or modify reader function for standardized data structure
- Merges floating point tiles into image based on space, time, scale, color
Can combine multiple products
- MODIS Aqua and Terra SST tiles combined
- Convert to KML pyramids
- Efficient web access
3 cron jobs
- Check for new data
- Check for new tiles
- Check for new images
- Commingling L2 and L3 files → timing issues
- Putting tiles into an equirectangular grid is an issue for polar data
- Handling unique characteristics (striping)
- Re-projection/multiple re-sampling
Filling areas of missing data
- Don’t currently do
- Weighting tiles
- Move from KML to JPL using Tiled WMS
- Ingest imagery to PO.DAAC archive with metadata and provenance information
HITIDE: downloading L2 data
- User-friendly interface so users spend their time searching data rather than learning the tool
performance from client or server side?
- Everything happening on server side
- Conversion adds time
What happens when things go wrong?
- Use 3rd party WMS server, no control over that, want to pull all data into our server but monitor to see if external server is down and get email
A lot of data analysis going on for color changes, etc. what if a bizarre image comes up?
- Have a default color palette now that doesn’t always work, but hoping to provide options
- How does inexperienced user know they are getting the image that they requested? Not sure there is a good answer. Dependent on image servers to produce correct image or that if there is an error gives an error message rather than a half-baked image.
What mitigation strategies can be used to catch known errors and how they affect the quality of images?
- This tool’s main purpose is to search and download granules
- Displaying the image as a preview, not actual data, so the user is not allowed to download the JPG/PDF -- just the NetCDF/HDF
- Manages expectations of the user by providing a preview rather than a scientifically valid result. A problem for Earth Observatory as well: when images are too large the user gets an error message, BUT the same error message is used for an actual problem with the image
Any outside user participation in design?
- Limited outside of NASA
- Need to go out and get a sampling of users, to make sure the tool is doing what people need
How perform a subset?
- Show global map with predefined regions, draws bounding box OR can click pencil to develop it OR type in coordinates
- Only equirectangular projection right now
Two themes beyond tools: usability issues, and how to communicate science data. MODIS tool that shows quality flags, useful to science users. Has this group considered best practices for usability and for science communication? Universal design principles for these interfaces?
- Approaching visualization from framework perspective: how much need to show the user? For NEO erase it and give a picture, whereas other systems geared to science
- What makes a good data retrieval interface?
- Finding users: ESIP has general public users, the teachers, for testing and figuring out what will work
- Cannot underestimate complication of utilizing tools, teachers might struggle