Frameworks for Data Visualization
Many frameworks have been developed to support science data visualization. These frameworks can be integrated or modular and very often involve massaging the underlying source science data (via a set of specialized procedural steps) into a generalized derivative form (images, grids, tiles) facilitating the visualization process. Visualization frameworks might also simply be data browsers.
We want to identify examples of frameworks that exist within the Earth sciences community and encourage the developers, maintainers, and users of those systems to participate in this workshop at the summer meeting.
For this workshop we are soliciting 15-20 minute presentations that address the following core questions regarding visualization frameworks:
- What are the challenges/functions that the framework is attempting to address?
- What varieties of data products are represented within these frameworks? (e.g., considering data formats, types, and topics -- HDF, CSV, vector, point, remotely sensed, in-situ, etc.)
- How can these frameworks be expanded/adapted to other science datasets? Have frameworks been developed with extensibility in mind or are there requirements that necessitate solutions that are unique to the source data?
- What are the processing considerations when generating intermediate data products that then feed the frameworks?
- How do the above considerations reflect the intended audience and infrastructure?
Demonstrations of existing frameworks are certainly welcome as long as the core questions are addressed as part of that demonstration.
Kevin Ward, NASA Earth Observations
NASA Earth Observations (NEO): Data Imagery for Education and Visualization
NASA Earth Observations (NEO) is a framework that supports the storage and delivery of global imagery of NASA remote sensing data. NEO targets non-traditional data users, primarily in education and outreach: formal and informal educators, museum and science center personnel, professional communicators, and citizen scientists. These user communities have a need for these types of data but do not have the domain knowledge to locate the various source data and do not possess the tools and expertise required to produce imagery from those data. NEO strives to provide a middleware approach to meeting their needs.
NEO currently hosts imagery from more than 50 different datasets with daily, weekly, and/or monthly temporal resolutions. The imagery from these datasets is produced in coordination with several data partners who are affiliated either with the instrument science teams or with the respective data processing center.
NEO is a system of three components -- website, WMS (Web Mapping Service), and ftp archive -- which together are able to meet the wide-ranging needs of our users. Some of these needs include the ability to: view and manipulate imagery using the NEO website -- e.g., applying color palettes, resizing, exporting to a variety of formats including PNG, JPEG, KMZ (Google Earth), GeoTIFF; access the NEO collection via a standards-based API (WMS); and create customized exports for users (delivered via ftp) such as Science on a Sphere, NASA’s Earth Observatory, and others.
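As a sketch of how the WMS component is typically accessed, the following builds a standard OGC WMS 1.1.1 GetMap request URL. The endpoint and layer name are placeholders, not NEO's actual values; a real client would read them from the service's GetCapabilities document.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=3600, height=1800,
                   fmt="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    `base_url` and `layer` are hypothetical placeholders -- consult the
    service's GetCapabilities response for real endpoint and layer names.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",                      # geographic lat/lon
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)

url = wms_getmap_url("https://example.gov/neo/wms", "SST_MONTHLLY_PLACEHOLDER",
                     (-180, -90, 180, 90))
```

A browser or GIS client pointed at such a URL receives a rendered global image rather than raw science data, which is the middleware role NEO plays for its audience.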
Yuan Ho, UCAR
Visualization of Geo-science data with Unidata's Integrated Data Viewer
The Integrated Data Viewer (IDV) from Unidata is a Java(TM)-based software framework that provides new and innovative ways of displaying and analyzing Earth science data, as well as common 2D, 3D, and 4D visualization capabilities. The IDV can be used by weather enthusiasts to view the daily weather, to analyze model output in climate research, to study satellite observations, to navigate the deep ocean environment, or to explore complex three-dimensional data in geophysics. The IDV software library can easily be used and extended to create custom geoscience applications beyond the atmospheric science realm. The IDV community has expanded from the traditional synoptic meteorology community to include many new disciplines (e.g., climatology, hydrology, oceanography, geophysics). In this presentation we will present the software architecture and several examples of how these communities use the IDV for academic work and scientific discovery.
David Mintz, EPA
AirData – some design considerations
The challenges in designing a data delivery system can be overwhelming. How do you design a system that is efficient, sustainable, agile, and doesn’t break your budget? Recently, EPA updated its AirData website to meet several initiatives – to consolidate interfaces, leverage existing products, avoid copying and restoring data, and accommodate future needs. This presentation will demonstrate a few visualization tools, focusing on system design and user interface considerations. Topics include how and when to connect directly to source data and the concept of using cron jobs to generate intermediate products.
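The cron-job approach to intermediate products can be sketched as a crontab fragment; the script names, paths, and schedule below are invented for illustration:

```shell
# Hypothetical crontab: regenerate intermediate products off-peak so user
# requests read small static files instead of querying the source database.
0 2 * * *  /opt/airdata/bin/build_daily_summaries.sh   # nightly at 02:00
30 2 * * 0 /opt/airdata/bin/build_station_kml.sh       # weekly KML refresh
```

The design choice is that user-facing pages never pay the cost of a live database query; they serve precomputed files that are at most one cycle old.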
Charles Thompson, Physical Oceanography Distributed Active Archive Center (PO.DAAC), Jet Propulsion Laboratory
A Generalized Pipeline for Creating and Serving High-Resolution Satellite Imagery
Generating imagery for a wide range of science datasets via a standardized set of processes is challenging. Inevitably, there is always some number of unique, specialized steps in transforming diverse satellite data products in formats such as HDF or NetCDF into high-quality imagery suitable for outreach, publications, or the general public. Furthermore, as the spatial resolution of remotely sensed data increases, the corresponding imagery becomes more unwieldy to serve efficiently. This presentation will discuss the evolving framework behind State of the Oceans (SOTO), a Google Earth-based web interface which visualizes near-real-time oceanographic parameters such as sea surface temperature and ocean winds. The SOTO backend employs a configurable image generation pipeline soon to be combined with a Tiled WMS server to create a flexible end-to-end imaging system applicable to a wide array of science data products.
Mike McCann, Monterey Bay Aquarium Research Institute
A Framework for in situ oceanographic measurement data access and visualization: The Spatial Temporal Oceanographic Query System (STOQS)
The CF-NetCDF file format and the technologies and conventions surrounding it (THREDDS, Climate Forecast Conventions for Point Observations, etc.) provide a solid foundation for managing archives of diverse collections of data. This technology stack is used successfully for numerical model output and is beginning to be used more widely for in situ measurement data.
Access efficiency decreases with decreasing dimension of NetCDF data. For example, the Trajectory Common Data Model feature type has only one coordinate dimension, usually Time – positions of the trajectory (Depth, Latitude, Longitude) are stored as non-indexed record variables within the NetCDF file. If client software needs to access data between two depth values or from a bounded area, the whole data set must be read and the selection made by the client software. This is very inefficient. What is needed is a complement to NetCDF that provides server-side indexing of any variable related to variables of interest.
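The client-side selection described above can be sketched as follows, with synthetic numpy arrays standing in for the trajectory's record variables (the variable names are illustrative; in the real case every value must be transferred from the file before filtering can begin):

```python
import numpy as np

# Stand-ins for trajectory record variables read from a CF-NetCDF file.
n = 100_000
time  = np.arange(n, dtype=float)
depth = np.abs(np.sin(time / 500.0)) * 1000.0   # 0..1000 m, toy profile
temp  = 15.0 - depth * 0.01                     # toy temperature values

# Client-side selection: the entire arrays are already in memory, even
# though only a small depth slice is wanted.
mask = (depth >= 100.0) & (depth <= 200.0)
selected_temp = temp[mask]

# Server-side indexing (the capability STOQS provides) would transfer only
# mask.sum() records instead of all n.
```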
Geospatial relational database technology provides this capability. The Spatial Temporal Oceanographic Query System (STOQS) has been designed and built to provide efficient access to in situ oceanographic measurement data across any dimension. STOQS is an open source software project built upon a framework of free and open source software for geospatial data. STOQS complements CF-NetCDF and OPeNDAP by providing an ability to index data retrieval across parameter and spatial dimensions in addition to the a priori indexed coordinate dimensions of CF-NetCDF. It also provides a functional bridge to standards-based GIS technologies.
For more information please see http://code.google.com/p/stoqs/
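The benefit of server-side indexing over an arbitrary variable can be illustrated with a relational database -- here SQLite from the Python standard library, with a toy schema invented for this sketch (STOQS itself is built on geospatial database technology, not this schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE measurement (
    tstamp REAL, depth REAL, lat REAL, lon REAL, temperature REAL)""")

# Synthetic trajectory-like records: depth cycles through 0..999 m.
cur.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [(t, (t * 7) % 1000, 36.7, -122.0, 15.0 - ((t * 7) % 1000) * 0.01)
     for t in range(10_000)])

# The index lets depth-bounded queries avoid scanning every row --
# the server-side complement to NetCDF's coordinate-only indexing.
cur.execute("CREATE INDEX idx_depth ON measurement (depth)")

rows = cur.execute(
    "SELECT tstamp, temperature FROM measurement "
    "WHERE depth BETWEEN 100 AND 200 ORDER BY tstamp").fetchall()
```

Only the matching rows cross the connection; the client never touches the other records.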
Mahabal Hegde, NASA
GIOVANNI - The Bridge Between Data and Science
Giovanni is a NASA framework for online visualization and analysis of Earth science data. It is intended to abstract the complications of data formats, structures, quality flags, etc. away from the user behind a GUI-driven Web interface that is easy to use. The goal is to transform data exploration from a process that can take days, weeks, or months into one that takes only minutes. The current architecture of Giovanni, version 3 (or G3), is being updated to be more flexible and agile with respect to adding datasets and features.
Richard Kim, Physical Oceanography Distributed Active Archive Center (PO.DAAC), Jet Propulsion Laboratory
HITIDE: An Extensible Service-Based Web Interface for the Search, Imaging, and Extraction of Swath-based Geophysical Parameters
PO.DAAC's HITIDE is a web application powered by a trio of web services that facilitate access to and evaluation of swath satellite data products in common scientific file formats such as HDF and NetCDF. The key component of its capabilities is a tile database, which allows for precise search, imaging, and extraction functionality based upon spatial and temporal constraints at a sub-granular resolution. With this functionality, users in the science community can efficiently find, evaluate, and download subsetted science data products pertaining to their research. As a system, HITIDE represents a framework which can ingest virtually any geolocated data product accessible via OPeNDAP. This presentation will provide an overview of HITIDE, the considerations when adding a new data product, and the future direction of development, including support for gridded data products, data-specific constraints, and enhanced dynamic imaging capabilities.
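The tile-database idea behind sub-granule search can be illustrated as follows; the tile size and data structures here are invented for this sketch and are not HITIDE's actual implementation:

```python
# Each granule's footprint is decomposed into fixed 10-degree tiles; a
# spatial search then matches tiles rather than whole-granule bounding
# boxes, giving sub-granule precision.
TILE = 10  # tile size in degrees (illustrative)

def tiles_for_bbox(lon_min, lat_min, lon_max, lat_max):
    """All (tx, ty) tile indices overlapping a lon/lat bounding box."""
    return {(tx, ty)
            for tx in range(int(lon_min // TILE), int(lon_max // TILE) + 1)
            for ty in range(int(lat_min // TILE), int(lat_max // TILE) + 1)}

# Tiny tile index: granule id -> set of covered tiles.
index = {
    "granule_A": tiles_for_bbox(-130, 20, -100, 40),  # NE Pacific swath
    "granule_B": tiles_for_bbox(0, -40, 30, -10),     # South Atlantic swath
}

def search(lon_min, lat_min, lon_max, lat_max):
    """Return granules whose tiles intersect the query box."""
    wanted = tiles_for_bbox(lon_min, lat_min, lon_max, lat_max)
    return [gid for gid, tiles in index.items() if tiles & wanted]

hits = search(-125, 25, -115, 35)   # query box off California
```

A real system would hold the tile index in a database and record per-tile time coverage as well, so temporal constraints prune the same way.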
Frameworks for Data Visualization Notes
What is a framework? A way to visualize data
NASA Earth Observations (NEO): a repository of imagery of NASA data
- how can NASA simplify access to data imagery (not raw data)
- Science centers, formal/informal education, public, citizen scientists
- Wanted: global imagery and data, high resolution, standard image formats
Current barriers between a satellite data source and the user
- Tool and domain knowledge, data formats, processing
- NASA wanted to provide expertise/middleware to leap these barriers for users == NEO
Decision-process to meet 3 criteria
Global imagery…which datasets?
- Accessible topics (core concepts and user-suggested)
- Accessible data (already online or NASA data expertise able to facilitate)
- Source datasets’ resolutions, real-time processing constraints on fixed budget, users needing 2048x1024
Based on this for the base imagery chose:
- 3600x1800, data only (no outlines or masks), grayscale (netpbm), JPG/PNG/GeoTIFF
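Storing the base imagery as grayscale and applying color on demand (as the NEO website does for its palette options) can be sketched as a simple lookup table; the palette below is an invented blue-to-red ramp, not an actual NEO palette:

```python
# Base imagery is 8-bit grayscale; color is applied on request via a
# 256-entry palette lookup, so one stored image serves every palette.
def make_ramp():
    """256-entry palette ramping from blue (low values) to red (high)."""
    return [(i, 0, 255 - i) for i in range(256)]

def apply_palette(gray_pixels, palette):
    """Map each 0-255 gray value to its (r, g, b) triple."""
    return [palette[g] for g in gray_pixels]

palette = make_ramp()
colored = apply_palette([0, 128, 255], palette)
```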
Approaching processing from visualization perspective
- Data that a scientist might filter out, but that is necessary to make a good visualization (e.g., filtering NDVI on quality flags would cut out a lot of pixels, like cloud cover in the Amazon)
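The quality-filtering trade-off noted above can be sketched as follows; the NDVI values and flags are toy data:

```python
import numpy as np

# Toy NDVI pixels with per-pixel quality flags (0 = good, 1 = cloudy).
# A science product might mask flagged pixels; a visualization may keep
# them to avoid large holes (e.g. persistent cloud over the Amazon).
ndvi  = np.array([0.8, 0.7, 0.75, 0.6])
flags = np.array([0,   1,   1,    0])

science_view = np.where(flags == 0, ndvi, np.nan)   # strict: gaps appear
visual_view  = ndvi.copy()                          # lenient: keep all pixels

gap_fraction = float(np.isnan(science_view).mean())
```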
Provide common temporal composites
- Daily, weekly, and monthly composites; many datasets already come in these, but some do not
- Good for comparing datasets in the same timeframe
- Website, WMS (web mapping service), FTP
- Used by target groups, also Weather Channel, Science on a Sphere
- Curation standpoint of NASA: interesting process for how to get and represent the data
any representation of the decision-process for how to represent the data?
- No, needs to be better. Important for Users to know these decisions especially for data processing. Maybe something the group can work on.
Elaborate on interaction Weather Channel and NASA
- Usually an email; they are starting to grab images on their own, though, and do visualization processing similar to the images for Earth Observatory.
Unidata Program Center
Integrated Data Viewer (IDV)
- geoscience at the speed of thought
- community has expanded beyond academic arena to government and even private entities
- Meteorological, GEMPAK, 2D/3D visualizations (IDV)
- Data access, real time data, internet data distribution, repository (RAMADDA), IDD/LDM, ADDE
- Java-based and built on VisAD library
- VisAD library: Java Component Library for interactive analyses and visualization of numerical data
- Based on creation of a data object with MathType
- Common Data Model → data access to set scientific feature types
- Data not directly consumed by the display
- Uses VisAD for on-the-fly coordinate transforms
- Jython language, auto generate derived variables
- Integration data from disparate data sources, automatically mapped to same projection
- Comprehensive user support, easy to install, out of the box data access and highly configurable
- Over 10,000 global users
impressed with capabilities, but any intention of putting IDV on the web?
- Can run on the web via a RAMADDA server, but limited in feature data
Is there a web client that talks to server?
- Access through browser
What about big data like high resolution satellite image?
- Going to deal with a lot more in the future with compression
- Right now, can subset and sample the data
EPA Office of Air Quality Planning and Standards
- website that turns data into information, connected to EPA’s air quality database
3 types plus interactive map:
- Raw data
- Visualization displays
- Broad audience including analysts, academia and general public
- Consolidate interfaces (with air explorer and air data site), leverage existing products, avoid copying data and accommodate emerging needs
- Selection menu to provide queries without acting on database
- How to update the content without refreshing page
- Using existing KML files for mapping without new map interface or content
- Uses jQuery
- Daily boxes of air contaminants, downloadable spreadsheet
- Displays survey source
Uses Google Earth API
- Can look at KMLs without going to Google
- Can display different contaminants, data info in the display
data documentation – any information on data error and uncertainty?
- Basic information page
- Reports generated also give link with specific information on that data
Size of pulling files
- Pulling files that indicate whether or not data exist for a particular year, not raw data, so smaller
When is the reference file consolidating data from all stations created?
- Not until submit button is clicked
Giovanni (NASA)
- Online framework for analysis, visualization, and access of Earth science data
- Goal: turn tedious data exploration into a fast and fun online experience
- Giovanni-4: improve user experience, reduce time-to-market for new data/services, use off-the-shelf systems, separate analysis and visualization, improve usability
- Kepler model flow
- Services can be bookmarked, can tag and annotate, can re-run using other user’s tags
Omnibus portal with all datasets and two community specific portals
- Process of creating mobile app
- Issue of sanitizing data to standard format for use
- Off-the-shelf systems have their own cost
Search on level-2 swath data
- Used OpenSearch providers for data
- ECHO, MODIS, LAADS, ASDC
is Giovanni storing copies of data in a different format?
- has cache or else pulls at run time
- Giovanni 3 kept a local copy in a different format
Physical Oceanography DAAC (PO.DAAC)
Generalized pipeline for creating and serving high-resolution satellite imagery
Current system: State of the Oceans (SOTO)
- Based on Google Earth, past 30 days of near-real-time data, includes backend processing
- Want to sever the processing backend from the web interface
- Tile creation
- Image creation
- Image repackaging
- Takes Level 2 swath and Level 3 data and makes floating-point grid tiles
- Rectangular projection (equirectangular)
- Organized with common taxonomy in hierarchy; must be geolocated
- Build or modify reader function for standardized data structure
- Merges floating point tiles into image based on space, time, scale, color
Can combine multiple products
- MODIS Aqua and Terra SST tiles combined
- Convert to KML pyramids
- Efficient web access
3 cron jobs
- Check for new data
- Check for new tiles
- Check for new images
- Commingling L2 and L3 files → timing issues
- Putting tiles into an equirectangular grid is an issue for polar data
- Handling unique characteristics (striping)
- Re-projection/multiple re-sampling
Filling areas of missing data
- Don’t currently do
- Weighting tiles
- Move from KML to JPL using Tiled WMS
- Ingest imagery to PO.DAAC archive with metadata and provenance information
HITIDE: downloading L2 data
- User-friendly interface so users spend their time searching data rather than learning the tool
performance from client or server side?
- Everything happening on server side
- Conversion adds time
What happens when things go wrong?
- Use 3rd party WMS server, no control over that, want to pull all data into our server but monitor to see if external server is down and get email
A lot of data analysis going on for color changes, etc. what if a bizarre image comes up?
- Have a default color palette now that doesn’t always work, but hoping to provide options
- How does inexperienced user know they are getting the image that they requested? Not sure there is a good answer. Dependent on image servers to produce correct image or that if there is an error gives an error message rather than a half-baked image.
What mitigation strategies can be used to catch known errors and how they affect the quality of images?
- This tool’s main purpose is to search and download granules
- Displaying the image as a preview, not actual data, so the user is not allowed to download the JPG/PDF -- just the NetCDF/HDF
- Manages expectations of the user by providing a preview rather than a scientifically valid result. A problem for Earth Observatory as well: when images are too large the user gets an error message, BUT the same error message is used for an actual problem with the image
Any outside user participation in design?
- Limited outside of NASA
- Need to go out and get a sampling of users, to make sure the tool is doing what people need
How perform a subset?
- Show global map with predefined regions, draws bounding box OR can click pencil to develop it OR type in coordinates
- Only equirectangular projection right now
Two themes beyond tools: usability issues, and how to communicate science data. MODIS tool that shows quality flags, useful to science users. Has this group considered best practices for usability and for science communication? Universal design principles for these interfaces?
- Approaching visualization from framework perspective: how much need to show the user? For NEO erase it and give a picture, whereas other systems geared to science
- What makes a good data retrieval interface?
- Finding users: ESIP has general public users, the teachers, for testing and figuring out what will work
- Cannot underestimate complication of utilizing tools, teachers might struggle