HDF Product Designer

Abstract/Agenda: 

  Interoperable data have been a long-time goal in many scientific communities. The recent growth in analysis, visualization, and mash-up applications that expect data stored in a standardized manner has brought the interoperability issue to the fore. On the other hand, producing interoperable data is often regarded as a sideline task in a typical research team, for which resources are not readily available. The HDF Group is developing a software tool aimed at lessening the burden of creating data in standards-compliant, interoperable HDF5 files. The tool, named HDF Product Designer, lowers the threshold needed to design such files by providing a user interface that combines the rich HDF5 feature set with applicable metadata conventions. Users can quickly devise new HDF5 files while at the same time seamlessly incorporating the latest best practices and conventions from their community. The HDF Product Designer can generate interoperable data in HDF5 files from the onset of their production. The tool also incorporates collaborative features, allowing a team approach to file design as well as easy transfer of best practices as they are developed. The current state of the tool and the plans for future development will be presented. Constructive input from session participants is always welcome.

Notes: 

HDF Product Designer: Interoperability in the First Mile - J. Joe Lee, Aleksandar Jelenak, Ted Habermann

 

·         Presenting tool to get feedback

·         Data life cycle – first and last miles = question, data collection, processing, distribution, archive, discovery, analysis (repurposing)

o   Different people do different parts of it

o   Once it goes to an archive the researchers pass the data off

o   Expertise decreases through the life cycle but the number of users goes up

o   Standardization and conventions are applied at the archive. Can you push them upstream to the people who are creating the data?

o   Need to think about the longer-term, “climate-like” use of the data

·         Try to facilitate collaborative design of interoperable and standards-compliant data products in HDF5 as early as possible in the mission development process… the first mile

·         Collaboration – Individuals – Teams – Projects (Missions) – Programs – want to support design and documentation at many different levels

·         HDF5 Product Design Architecture

·         Conventions – NetCDF user guide (NUG), ACDD, OCDD (object), Climate and Forecast (CF), HDF-EOS

o   Implementation at different levels – groups, variables

o   Make sure that when people create designs they start out convention-compliant (a minimal h5py sketch follows)
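As a rough illustration of what “starting conventional” can mean at the file level (this is not the Designer’s generated code; the file, group, and dataset names, the shape, and all attribute values are made up, and h5py is just one convenient way to express it):

```python
# A minimal sketch: ACDD-style global attributes on the HDF5 root group and
# CF-style attributes on a variable, written with h5py. Everything here is
# hypothetical example content, not output from HDF Product Designer.
import h5py

with h5py.File("design_example.h5", "w") as f:
    f.attrs["Conventions"] = "ACDD-1.3"
    f.attrs["id"] = "SMAP_L3_example"                  # hypothetical product id
    f.attrs["title"] = "Example SMAP Level 3 product design"
    f.attrs["summary"] = "Skeleton file illustrating group-level convention attributes."
    f.attrs["naming_authority"] = "gov.nasa.jpl"

    grp = f.create_group("Soil_Moisture_Retrieval_Data")    # hypothetical group
    dset = grp.create_dataset("soil_moisture", shape=(406, 964), dtype="f4")
    dset.attrs["units"] = "cm3 cm-3"                   # CF-style variable attributes
    dset.attrs["long_name"] = "Representative soil moisture value"
```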

 

 

HDF Product Designer

·         Product Designer has 3 essential components

o   Desktop, Server, Online

·         Designs interoperable HDF5 data products

·         The client itself is cross-platform, written in wxPython (works on Mac, Linux, …)

·         When you create a new project – ex. SMAP mission – it asks if you want to use a convention (HDF-EOS, CF, ACDD – NUG and OCDD to be added).

·         Creating a product from scratch is painful – there are lots of great existing products that follow conventions – so you can import existing products – ex. HDF5/JSON, HDF4 map (h4map XML), NcML – possibly MS Excel, text (CSV), and databases later … the primary target is the top three (a metadata-only walk is sketched below)
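The Designer’s importers work from HDF5/JSON, h4map XML, and NcML; purely as an illustration of the “loads only metadata” idea (not the actual import code), the sketch below walks an existing HDF5 file with h5py and records structure and attributes but never the data values. The input file name is a placeholder.

```python
# A rough sketch of a metadata-only walk over an existing HDF5 product:
# capture group/dataset paths, shapes, dtypes, and attributes; skip all data.
import h5py

design = {}

def describe(name, obj):
    entry = {"attrs": {k: str(v) for k, v in obj.attrs.items()}}
    if isinstance(obj, h5py.Dataset):
        entry["shape"] = obj.shape
        entry["dtype"] = str(obj.dtype)
    design[name] = entry

with h5py.File("existing_product.h5", "r") as f:       # hypothetical input file
    f.visititems(describe)

for path, info in design.items():
    print(path, info)
```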

·         A key feature is ease of use – similar to Windows Explorer or Mac Finder – drag and drop is supported, so any component can be reorganized easily

·         HPD Share – share designs with other users (Google Docs style) – requires authorization/registration – uses the NASA URS (User Registration System)

o   Logging in works like Facebook – you authorize specific applications, in this case HDF Product Designer

o   Once you log in – you can select project

o   Stored in database that Aleksandar created

o   Users come from different domains – other domains can be supported as well

o   Opening a project shows several files – each file is a design – you can view the HDF5 file and generate code

o   Alek created several sample files and shared them. He used a versioning system – if someone changes a file, you can go back

·         Online Publishing – you don’t have to wait until the last minute to share your data

o   Product Designer allows you to create it in the cloud as soon as you design it

o   Uses Amazon Web Services – publishes from the design

o   Why is online important – if published online, you can use other web services – validate online immediately

o   Goal – the moment you publish, you can open it in a visualization tool and see whether it works with such tools (ex. Matlab) (see the sketch below)
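As a rough sketch of the publish-then-check loop (the URL below is a placeholder, not a real endpoint, and the netCDF4 Python client is just one way to do the check): a design published behind THREDDS/OPeNDAP can be opened and inspected immediately.

```python
# Open the just-published design through OPeNDAP and look at what downstream
# tools will see. The server path and file name are invented for the example.
from netCDF4 import Dataset

url = "http://example.com/thredds/dodsC/designs/design_example.h5"
with Dataset(url) as ds:
    print(ds.ncattrs())        # global (design-level) attributes
    print(list(ds.variables))  # variables visible to visualization tools
```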

·         Design demo (won the vote… tried voting online – that failed… so it was decided by raised hands)

o   Can’t do much right now – Help or New

o   Create a new SMAP project – without selecting a convention

§  Inserts root group

o   Can verify by sending it to desktop – opens in HDFView – shows hidden/internal file

§  If you want to check against ACDD – you get 3 out of 36 – the design only had an id

§  Then add an attribute – naming authority – string type, “naming_authority” = JPL

§  Publish it again – now the score is up to 4/36 – because naming_authority was added

§  The file is served by THREDDS – the check uses the rubric capability in THREDDS

§  You don’t want to do this by hand for every attribute that each convention requires (a toy version of this scoring check is sketched below)
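The real scoring shown in the demo comes from the ACDD rubric in THREDDS; the toy check below only illustrates the idea of counting which convention attributes are present. The attribute list is a small illustrative subset, not the full 36-item rubric, and the file name carries over from the earlier hypothetical sketch.

```python
# A toy ACDD presence check, loosely mimicking the rubric-style score above.
import h5py

ACDD_SUBSET = ["id", "naming_authority", "title", "summary", "keywords",
               "license", "creator_name", "date_created"]

with h5py.File("design_example.h5", "a") as f:
    f.attrs["naming_authority"] = "gov.nasa.jpl"       # the attribute added in the demo
    present = [name for name in ACDD_SUBSET if name in f.attrs]
    print(f"ACDD attributes present: {len(present)}/{len(ACDD_SUBSET)}")
```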

o   When you create the file, add the convention at creation time (boom)

§  L3a – ACDD – then have score of 43/46

o   Q: With ACDD as the adopted convention – the specified convention’s fields are null – a project usually has standard attribute values for the project (NcML file)

o   If you want to reuse an old design – you can import the files – only the metadata is loaded

§  Validate with CF checker

o   Q (Walt) – L3 has a weirdness – lat/lon were not linked to the 2-D arrays of the data – what happens when someone wants ASCII – you get an inverted image relative to the original (the way it was fixed in OPeNDAP screwed it up)… you need to represent not just attributes but objects – carrying the association with the parameters all the way through is an important piece. In the development of the project, the dimension is not the same as the main parameters

§  Explicitly define dimensions as a top-level menu item (so people don’t forget them)

·         Ultimately – want this to happen by default so people can’t forget this

o   Trying to implement these things as event handlers for conventions – so the tool knows what needs to be done to make a valid variable (Ted) (a dimension-scale sketch follows)
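One way the “explicit dimensions” idea can be expressed in plain HDF5 is with Dimension Scales; the h5py sketch below links lat/lon coordinate datasets to a 2-D variable so the association is carried by the file itself. Names, shapes, and the coordinates attribute are illustrative, not what the tool emits.

```python
# Attach lat/lon as HDF5 Dimension Scales so the 2-D variable stays linked
# to its coordinates; also add a CF-style "coordinates" attribute as a hint.
import h5py
import numpy as np

with h5py.File("dims_example.h5", "w") as f:
    lat = f.create_dataset("lat", data=np.linspace(-90.0, 90.0, 180))
    lon = f.create_dataset("lon", data=np.linspace(-180.0, 180.0, 360))
    lat.make_scale("latitude")
    lon.make_scale("longitude")

    data = f.create_dataset("soil_moisture", shape=(180, 360), dtype="f4")
    data.dims[0].attach_scale(lat)
    data.dims[1].attach_scale(lon)
    data.attrs["coordinates"] = "lat lon"
```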

o   Q – is there a standard way to apply groups – CF convention is to use full definition

§  or use a reference table – but be careful, because the netCDF world does not expose HDF5 references (the two styles are contrasted in the sketch below)

§  Both Joe and Alek are involved in the discussion – as the conventions evolve they are part of that discussion and will make sure the tool supports what comes out.
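For reference, the two linking styles discussed above can be contrasted in a few lines of h5py: a full path stored as a plain string attribute, which netCDF-based tools can at least read as text, versus an HDF5 object reference, which the classic netCDF API does not expose. File and dataset names are invented for the example.

```python
# Contrast of full-path linking vs. HDF5 object references (illustrative only).
import h5py

with h5py.File("refs_example.h5", "w") as f:
    lat = f.create_dataset("geolocation/lat", shape=(180,), dtype="f4")
    var = f.create_dataset("science/brightness_temp", shape=(180,), dtype="f4")

    # Style 1: full HDF5 path as a string attribute (readable as text everywhere)
    var.attrs["coordinates"] = "/geolocation/lat"

    # Style 2: an object reference (robust inside HDF5, hidden from netCDF tools)
    var.attrs.create("lat_ref", lat.ref, dtype=h5py.ref_dtype)
```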

o   Walt – they have existing HDF4 datasets and would like to redesign them (hint to the developers). They describe all the attributes in HDF4 – but the attributes are buried in comma-delimited value text, in an implied way – they list everything and you have to pull it all out yourself… The workflow for looking at old datasets is manual – CSV – you need to parse the name and then the value to see what applies to what – how does this fit the workflow other than doing it manually? Answer – write a parser to do the conversion, or write NcML (a parsing sketch follows)
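A hedged sketch of the manual parsing step Walt describes: splitting a comma-delimited HDF4 attribute string into name/value pairs. The attribute text and the name=value layout are invented for illustration; real legacy products differ and would need their own parsing rules.

```python
# Pull name/value pairs out of a comma-delimited legacy attribute string.
def parse_delimited_attribute(text):
    pairs = {}
    for item in text.split(","):
        if "=" in item:
            name, value = item.split("=", 1)
            pairs[name.strip()] = value.strip()
    return pairs

legacy = "ShortName=MLS_O3, Version=4.2, Units=vmr"    # made-up example value
print(parse_delimited_attribute(legacy))
```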

o   Walt – how do we populate the template / empty array (need easy mechanism)

§  Use commands to open the file, open the dataset, and then pull the data in

§  It now takes something like three commands to pull the data in (roughly as sketched after this group)

§  C++ and FORTRAN are used by project/missions
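For reference, the “about three commands” workflow reads naturally in h5py (missions would do the equivalent in C++ or Fortran): open the designed file, open the still-empty dataset, write the data. The file name, dataset path, and shape follow the earlier hypothetical sketch.

```python
# Populate an empty dataset in a designed template file (illustrative names).
import h5py
import numpy as np

values = np.zeros((406, 964), dtype="f4")              # stand-in for real data

with h5py.File("design_example.h5", "a") as f:         # 1. open the file
    dset = f["Soil_Moisture_Retrieval_Data/soil_moisture"]  # 2. open the dataset
    dset[...] = values                                 # 3. write the data
```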

o   Export the HDF4 file via NcML – see the changes propagate to OPeNDAP

·         Q: There are 3 types of files you can import right now – representations of files – if you have HDF4, you can dump it as NcML

o   Show how to export from HDF4 on a THREDDS server

o   H4Map interface – generates XML – which can be imported into the designer

o   “just push the button and your new file is ready”

·         Is there a way to do this to a number of files?

o   This is a product design tool – a tool to help people not create messed-up files – it doesn’t help Anna now, but it will help in 25 years.

o   So Anna wants a batch tool

·         Is there a way to convert from NetCDF3

o   Create NcML – and then the tool will read it

·         Ted – throwing a wet blanket – this works on the design, not the file contents… you still need to grab the data

o   Working on virtual datasets – a header file that points at external datasets – with URLs (see the sketch below)

o   HDF metadata becomes like a THREDDS catalogue
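The virtual dataset idea can be sketched with h5py’s VDS API: a small header file that maps pieces of other files into one logical dataset. The granule file names and shapes below are placeholders, and h5py’s VDS works with local source files; the URL-based case mentioned above is described as future work.

```python
# Build a header-only virtual dataset that stitches rows from several
# (hypothetical) granule files into one logical array.
import h5py

layout = h5py.VirtualLayout(shape=(4, 100), dtype="f4")
for i in range(4):
    layout[i] = h5py.VirtualSource(f"granule_{i}.h5", "data", shape=(100,))

with h5py.File("vds_header.h5", "w", libver="latest") as f:
    f.create_virtual_dataset("merged_data", layout, fillvalue=-9999.0)
```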

·         Invest more at the product design level and design more carefully – then you won’t have to go backwards

·         The goal of CF is to be applicable to a number of different formats – now trying to get CF to start using groups (CF 2.0). We all use groups – need to use groups – will converge. Need groups in metadata

·         Did you demo the publish function? Yes – you are able to check how things look while you work – is it good, does it make sense, can I open it in X. If it works here, then it will work with real data – it runs in the cloud

·         3 months more before beta

·         Walt – when you have to parse out what is known only implicitly, that isn’t so easy. Ted – you can now get HTML and transform it with XSLT… people don’t actually look at metadata in HDF; conventions are metadata – support multiple conventions – different use cases require different conventions – discovery, use, understanding

·         Tomorrow’s talk covers how to translate ISO into NcML – and then transform it to get an entire ISO metadata record into this file – interested in designing products with ISO, then instantiating them, then plugging the data in

o   Like when OPeNDAP was developed and added to the NetCDF library and then OPeNDAP just worked… things will happen over the network and will just work.

·         Walt – the idea of merging datasets between DAACs will become much easier to do – currently they make arrangements to send MLS data as a subscription and then combine and merge it to create a huge product – essentially duplicating data – now you would be able to just reference the files – could even pull from more than one DAAC – and generate science products on demand

Actions: 

Are you interested in becoming a beta tester of the HDF Product Designer and giving us feedback? Please contact us at [email protected] 

Citation:
Lee, H.; Jelenak, A.; HDF Product Designer; Winter Meeting 2015. ESIP Commons, December 2014