Python and ArcGIS Tools for HDF

Abstract/Agenda: 

Python is emerging as a language of choice for data analysis and visualization because of a vibrant community of developers collaborating to provide easy-to-use packages for many common scientific programming needs. The h5py package provides a Pythonic interface that lets you access data from HDF and easily manipulate those data using NumPy. For example, you can slice into multi-terabyte datasets stored on disk as if they were real NumPy arrays. This session will start with an introduction to HDF and Python by Andrew Collette, the author of the recently published O’Reilly book Python and HDF5. In addition to Python, more and more people are integrating data in HDF into GIS analysis and mapping tools. Esri has been participating in this effort with new tools for HDF and other scientific data support. Noman will report on recent developments there.

Notes: 

The HDF Group Part 3 of 4

HDF, Python, and GIS

Andrew Collette (UC-Boulder) – Python and HDF5

·         Works with dust experiments – have sent equipment to the moon – trying to get the data online for users to download

·         UCLA Large Plasma Device – create plasma – records once a second – at end of run gigantic volumetric time series (3 dimension, +time, +dataset) – ad hoc methods would not work

o   Raw data goes into HDF5 & lab has some tools to analyze data

·         IDL – analysis environment written in the 1980s – popular in plasma and astrophysics – has the aspects you need

·         Python – a good platform for science in the same way IDL is

·         The wild-west days of Python are over – there is a minimal stack for doing science in Python

o   Numpy, SciPy, matplotlib, and IPython (IP[y])

o   NumPy = how to create and manipulate arrays

o   SciPy = analysis

o   Matplotlib = 2D plotting – contours, histograms

o   IPython – advanced shell (with tab completion, interactive help, interactive notebooks)

o   These are the basic needs for working in Python

o   Reason to stick with it – over 45,000 packages – because of the community

o   It is a useful glue language
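As a minimal illustration of the NumPy layer of that stack (the array values here are arbitrary, not from the talk):

```python
import numpy as np

# Create a 100x100 integer array of ones and reduce it --
# the array-at-a-time style of work the stack is built around.
a = np.ones((100, 100), dtype="i")
total = int(a.sum())
mean = float(a.mean())
```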

·         HDF5

o   The data model of HDF5 fits well into the model of Python

o   Two Python access packages – PyTables (fast database built on HDF5) & h5py (general-purpose HDF5 library – maps closely to Python concepts)

o   HDF5 is a simple format – you only need to worry about three things:

§  1) Datasets (arrays on disk) – read files in a standard format from others, or use it as a data store (i.e. usable when the data doesn’t fit in memory)

§  2) Groups: filesystem-like folders – natural way to organize it – groups work like dictionaries

§  3) Attributes: key-value metadata – kept in the same file as the raw data so the data can be interpreted
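A minimal h5py sketch of those three concepts (the file, dataset, and attribute names below are illustrative, not from the talk):

```python
import numpy as np
import h5py

with h5py.File("example.hdf5", "w") as f:
    # 1) Datasets: arrays stored on disk
    dset = f.create_dataset("temperature", data=np.arange(10.0))
    # 3) Attributes: key-value metadata living next to the raw data
    dset.attrs["units"] = "kelvin"
    # 2) Groups: filesystem-like folders that work like dictionaries
    grp = f.create_group("run1")
    grp["counts"] = np.ones((4, 4), dtype="i")

# Reopen and read back, addressing members by path like a filesystem
with h5py.File("example.hdf5", "r") as f:
    shape = f["run1/counts"].shape
    units = f["temperature"].attrs["units"]
```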

·         Demo – python 2.7

o   import h5py – not there – install it with "pip install h5py"

o   pip = Python package installer – allows you to install packages

o   Start IPython (an enhanced interpreter)

o   import h5py

o   h5py.File? – to get information (IPython’s ? introspection)

o   import numpy as np

o   a = np.ones((100,100), dtype="i")

o   b = {'one': 43, 'two': list()} – the dict is the Python analogue of a group: members are identified by names

o   create

·         hdf5 file

o   f – h5py.File(‘foo.hdf5’, ‘w’)

o   f.keys() – has keys attribute

o   f.[‘one’] = a  - now create an HDF5 file

o   can easily get data in and out of HDF5 (use ‘out’)

o   dset.shape

o   dset.compression (there is none right now)

o   dset2 = f.create.dataset? (get syntax) – can add keyword agreement (can access HDF5 attributes, exp compressed)

o   f.visit  calls something from the file

·         www.h5py.org & the book Python and HDF5 (O’Reilly)

·         Q: how do you do automatic compression?

o   Can choose any number of filters in HDF5, filters for compression … turn on a flag in HDF5

o   Happens at a lower level in HDF5 library
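Following the answer above, turning on a compression filter in h5py is just a keyword argument; the filtering is done transparently by the HDF5 library on write and undone on read (file and dataset names illustrative):

```python
import numpy as np
import h5py

a = np.ones((100, 100), dtype="i")

with h5py.File("compressed.hdf5", "w") as f:
    # The gzip filter ships with every HDF5 build, so it is a safe default.
    dset = f.create_dataset("ones", data=a, compression="gzip")
    comp = dset.compression
    roundtrip = bool((dset[...] == a).all())  # reads back decompressed
```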

 

Nawajish Noman (Esri) – Advancing Scientific Data Support in ArcGIS

·         ArcGIS Platform – no longer just a desktop product – it is a platform

·         Scientific Data – stored in NetCDF, GRIB, and HDF formats

o   Multidimensional

o   Need to find a better way to support this data

o   Ingest, manage, visualize, analyze, and share this data

·         Ingesting scientific data

o   Directly read netCDF – make raster, feature layer, and tables

o   Directly read HDF and GRIB data as raster

·         What about aggregation?

o   Wanted to create a multi-dimensional seamless cube from different files (region or time steps/slices)

o   Wanted to use mosaic datasets – will be released in ArcGIS 10.3

§  Spatial and temporal aggregation and on-the-fly analysis

§  Accessible as map service and image service

§  Supports direct ingest

§  Eliminates conversion and processing

o   Multidimensional mosaic datasets

§  Define NetCDF, HDF and GRIB as rasters – choose 2D raster variables and dimension values

·         Using scientific data

o   Same as any other layer

§  Display, graph, animate, analysis tools

o   In a mosaic – visualize temporal change of a variable, along a vertical dimension, and flow direction/magnitude variables

o   New vector field renderer for raster – supports U-V and magnitude-direction, dynamic thinning, on-the-fly vector calculation
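The on-the-fly U-V to magnitude-direction conversion such a renderer performs can be sketched in NumPy (sample values are made up; this uses the mathematical angle convention, degrees counterclockwise from east, which differs from the meteorological convention):

```python
import numpy as np

# U = eastward component grid, V = northward component grid
u = np.array([[3.0, 0.0], [0.0, -3.0]])
v = np.array([[4.0, 1.0], [-1.0, -4.0]])

magnitude = np.hypot(u, v)                      # sqrt(u^2 + v^2), per cell
direction = np.degrees(np.arctan2(v, u)) % 360  # degrees CCW from east
```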

·         Spatial and temporal analysis

o   Loop and iteration in ModelBuilder and Python – don’t need out-of-the-box tools – these are building blocks

o   With mosaics you get more (RFT = raster function template)

§  A scientific model

§  Wind chill = … a + bT – … – so you load analyzed data – no new data is created – it is computed on the fly
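The exact coefficients shown in the session were not captured in the notes; as an illustration of the kind of per-cell expression an RFT evaluates on the fly, here is the standard NWS wind chill formula (T in °F, wind speed in mph):

```python
def wind_chill_f(t_f, v_mph):
    # Standard NWS wind chill index; valid for T <= 50 F and wind >= 3 mph.
    # Evaluated per cell on demand -- no new dataset needs to be written.
    return (35.74 + 0.6215 * t_f
            - 35.75 * v_mph ** 0.16
            + 0.4275 * t_f * v_mph ** 0.16)
```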

·         WMS support for multi-dimensional data

o   Sharing means sharing a map, image services (providing gridded data), or geoprocessing services (e.g. analytical capability on the web)

o   Can change the dimension of the web service

o   GetCapabilities – supports time, elevation, and other dimensions (e.g. depth)

o   GetMap – returns a map for any dimension value – supports CURRENT time
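A WMS GetMap request carrying a time dimension can be assembled with the standard library; the endpoint, layer name, and time value below are placeholders, not a real Esri service:

```python
from urllib.parse import urlencode

# Parameters for a WMS 1.3.0 GetMap call with a TIME dimension value.
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "temperature",   # placeholder layer name
    "crs": "EPSG:4326",
    "bbox": "-90,-180,90,180",
    "width": "512",
    "height": "256",
    "format": "image/png",
    "time": "2014-07-01",      # dimension value; "current" where supported
}
url = "https://example.com/wms?" + urlencode(params)
```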

·         WMS in Dapple Earth Explorer – but wanted multidimensional support – created a web service

·         ArcGIS Online

o   Provides curated, authoritative content – can host your own or use ArcGIS

o   A powerful way to disseminate data and information

·         Ready-to-use analysis services

o   Working with data that they serve – e.g. Watershed

o   GLDAS Noah Land surface model outputs (live after UC)

·         Web applications – using an image service you can create graphical information

·         Create Space Time Cube & Emerging Hot Spot Analysis – these analysis tools can use scientific data formats

·         OPeNDAP to NetCDF

o   Wrote a tool that made a new file – never went into the product

o   Now – make an OPeNDAP layer – ingest OPeNDAP services – supports subsetting

o   Not in 10.3, but will provide the code

·         Create Story Maps

o   Combination of interactive map, video, & text

·         Q: the term NetCDF is widely used, but there are 3 or 4 flavors (NetCDF-4 enhanced, NetCDF-4 classic, or NetCDF-3 classic); need to start qualifying which is meant on a menu

o   When they started they used NetCDF and CF; now converting to NetCDF-4

·         Q: will you eventually also ingest OPeNDAP services – the end could be an NCL file?

o   Currently not NCL files – they use the C library, not the Java library

o   If they use OPeNDAP they would be able to read NCL – and then able to support more types

 

Ted’s comments & questions

·         Things are starting to come together

·         For HDF using gdal

·         Likes the connection of HDF to ESIP

Citation:
Habermann, T.; Python and ArcGIS Tools for HDF; Summer Meeting 2014. ESIP Commons , May 2014