Recent Developments in ISOLand

Abstract/Agenda: 

There have been a number of recent applications, developments and changes in ISO Standards that are relevant to ESIP.  These include implementations of granule metadata production tools by SMAP, ISO lineage implementations for AMSR-E and several changes to standards: the revision of 19115 and support for XML implementations of that revision, the new data quality implementation (19157), and the revision of 19115-2 (acquisition and instruments) which is coming up in the near future.

Notes: 

 

Session: News from ISOLand

1)      Helen Conover

2)      Hook Hua

3)      Ted Habermann

 

ISO Lineage Metadata at AMSR-E SIPS – Helen Conover – GHRC DAAC/AMSR-E SIPS, University of Alabama in Huntsville

·         Terms for talk

o   dataset (ISO) = data file (individual science data file)

o   Product = series (ISO) (collection of data files)

·         AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System)

·         SIPS (Science Investigator-led Processing System)

·         GHRC – does provenance (how did you get this, where did it come from, how can it be used – used to be called processing history) and adds metadata and QC to data

·         Products = brightness temperatures, ocean products, monthly and daily ocean grids, sea ice concentration, snow depth, sea ice drift (typical NASA microwave suite)

·         Capture the contextual knowledge

o   Some is already there

o   Recently – putting metadata into ISO lineage metadata model

§  Lineage so it can be added to full suite of ISO data

·         Legacy data system (HDF-EOS2)

·         Capture – which data products go into which for the different data projects (ex. Rain has rain and brightness temperatures)

o   SIPS provides control script – does not include science

·         ISO is complex (comprehensive) – need to make friends in community

o   Only look at lineage

·         Lineage Model

o   Lineage – describes source and processing (which goes down to the algorithm)

o   LE = 19115-2 – they are an extension of the original model (LI) – to facilitate more detailed description of lineage

o   DQ_DataQuality → LI_Lineage (quality of this product – i.e. what went into making it)

·         XML and ISO are a verbose way of ‘saying things’ – intending to attach it to the data file (increases size)

o   Ended up with an HDF SE attribute (with HDF-EOS) – this is HDF4

·         Lineage Granularity

o   Some lineage info is the same across a whole product – other info is captured for a unique file (when and where it was processed)

o   At product or series level – capture attribute information

o   Keep all lineage data in each file (2 elements in each file)

o   XML “dataset” and “series”

·         Lineage Model

o   Where to put information – first big job (then how to say it)

o   ProcessStep – high level processing description

o   Algorithm – science algorithm name, version, author, description (high level info and pointers to real data)

o   Delivered algorithm package might change but not always change the science algorithm

o   DOI and specific descriptions in Source files

o   These are done once per version of data

o   Then automated process for each data file for processing date/time/ location/ input and output files

·         Question – ECHO data (cloud cover) – easy map from ECHO to ISO

o   ECHO attribute or PSA – is that mapped in LE_Algorithm?

o   Value would be somewhere else

o   Ted – can have any number of processing steps (0..*) – can have separate ProcessStep or Algorithm

·         Q – what level of granularity

o   Tried to capture the science algorithms

o   Ex. Sea ice – one processing executable, gridding, and 2 algorithms, and then snow depth

·         Q – each file has many attributes

o   In provenance system – capture the attributes – map the variables to each algorithm

o   Not to the level of equation names (some have actual names and others are descriptions)

o   Did not do mapping of variable to algorithm in ISO

·         Versioning

o   ESDT – doesn’t change often

o   DAP (Delivered Algorithm Package)

o   Trying to tie the processing algorithm version to the metadata

§  Includes what algorithm does, description, and author info

·         DOIs

o   NASA trying to figure out how to handle DOI at ISO level

o   NASS ES difference between GCMD DIF and netCDF CF

o   Decided to combine URL and DOI, and then the associated text is the “doi”

·         Q – can use anything, not just a DOI

o   Yes – hence put DOI (but description is not part of Identifier)

·         Use codeSpace to indicate NASA ESDIS as publisher

o   Want this to be part of the NASA flavor of ISO

o   They are the authority for DOI

·         Challenges – complicated, it is evolving, schemas have not been promptly provided, need to reach community consensus

·         Need – NASA flavored schema, concrete examples, representations in other languages, communication (?online forum)

·         Q (Aleksandar) – where do you get processing lineage before it is put into ISO

o   Red on screen = online form, talk to producer, fill in form and store in database

o   Blue = processed in house (file read events, software invocation) – parsed into database and then XML

·         Q (Jennifer) – do you have a cheat sheet or summary

o   Lots of info on the NOAA GEO-IDE wiki

o   A lot of details – emailed works – then sorted through

o   Need online resource (don’t have fully validated XML) – will be there in a month

o   NASA is also developing their own wiki
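
The lineage structure described in this talk (any number of process steps, each pointing to a science algorithm description) can be sketched in a few lines of Python. This is only an illustration and not the AMSR-E SIPS implementation: the element names follow a reading of the 19115-2 XML encoding, the output has not been validated against the schemas, required elements such as the processing identifier and the algorithm citation are left out, and the function names and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the ISO 19139 (gmd, gco) and 19115-2 (gmi) XML encodings.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(name):
    """Turn 'gmd:LI_Lineage' into the '{uri}LI_Lineage' form ElementTree expects."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NS[prefix], local)


def add_text(parent, name, value):
    """Add <name><gco:CharacterString>value</gco:CharacterString></name> under parent."""
    elem = ET.SubElement(parent, q(name))
    ET.SubElement(elem, q("gco:CharacterString")).text = value
    return elem


def build_lineage(steps):
    """Build a simplified lineage tree from (step description, algorithm description) pairs."""
    lineage = ET.Element(q("gmd:LI_Lineage"))
    for step_description, algorithm_description in steps:
        wrapper = ET.SubElement(lineage, q("gmd:processStep"))
        step = ET.SubElement(wrapper, q("gmi:LE_ProcessStep"))
        add_text(step, "gmd:description", step_description)
        info = ET.SubElement(step, q("gmi:processingInformation"))
        processing = ET.SubElement(info, q("gmi:LE_Processing"))
        algorithm_wrapper = ET.SubElement(processing, q("gmi:algorithm"))
        algorithm = ET.SubElement(algorithm_wrapper, q("gmi:LE_Algorithm"))
        add_text(algorithm, "gmi:description", algorithm_description)
    return lineage


# Hypothetical values, for illustration only.
lineage = build_lineage([
    ("Gridding of swath data", "Example gridding algorithm, version 1"),
    ("Sea ice concentration retrieval", "Example sea ice algorithm, version 2"),
])
print(ET.tostring(lineage, encoding="unicode"))
```

In the approach described above, the algorithm descriptions would be written once per version of the data, and only the per-file details (processing date/time, location, input and output files) would be filled in automatically for each data file.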

 

A Practical Application Using ISO Metadata – Incorporating ISO Metadata into SMAP Data Products – Barry Weiss, Hook Hua, Vance Haemmerle (JPL)

·         SMAP – first NASA decadal mission

·         Soil moisture

·         15 products L1 to L4 (parsed radar telemetry to carbon net ecosystem exchange)

o   Trying to create ISO for different data products (large undertaking)

·         Level 1 requirement for ISO metadata (required from the top)

o   Using ISO because it is international, common representation (conceptual model and encoding of it)

o   Include tools, use cases

·         ISO basic concepts

o   Granule metadata = dataset

o   Collection metadata = series metadata

o   Codelist = enumerated list of accepted values

o   Profile = community agreement of particular elements

o   Extension = explicit modification (NASA is a flavor)

o   EX = Extent, LE = lineage, CI = citation

·         ISO geographic standards

o   Using ISO 19130 – imagery sensor model

o   Usually talking about 19139 encoding

·         UML

o   Progress Code – completed, historicalArchive, obsolete

o   SMAP needs extension points – need to mark products as “beta, stage 1, stage 4” – what to augment in code list

o   Do we need to standardize code list

§  2 camps (1) 19139 XML (2) HDF5 group

§  Kept both

·         From the Earth Science Data Model (ESDM) – all in HDF5 metadata group

o   Create crosswalk between HDF5 and ISO groups

o   Ex. Lineage would be subgroup in HDF5

o   Lineage group includes attitude, ephemeris, antenna pointing, …

o   Renamed MI_Identifier as identified_product_doi for DOI

·         Started with UML diagrams from ISO and expanded where needed for SMAP

o   What were the extensions that were needed

o   Then generated spreadsheet – provided mapping between ISO to HDF to ESDM

o   ESDM defines gaps, ISO only beginning and ending

o   Ex. Extent – needed to add a vertical extent

·         Spreadsheet provided more than an exercise – did the mapping programmatically

o   Mapping used by converter to generate extra files for crosswalk

·         For series metadata - Delivered by data architect

·         For dataset – problem = automation

o   Spreadsheets are the first step

o   Use info to automatically inject the correct fields into the ISO from HDF

o   Able to reduce dependencies to only HDF5 libraries – simplified things

·         Saxon used for crosswalk (transforms needed for each flavor)

o   Decoupling science software and metadata dialect

o   That means if the dialect changes, only the transform has to change (not the science software)

·         Q (Peter) - Has program to move from HDF to ISO

o   Not writing software – writing rules (Saxon)

o   XSL – it is an open source tool (Apache product)

o   Kept as simple as possible

o   Ted – also have transform from OPeNDAP land to ISO and ncISO – translates from one XML to another

·         Q - (Peter) – rules define the fields?  - yes

·         Q – what about binary data

o   HDF group – h5dump – ignores all data arrays (only dumps out metadata)

·         DOI and UID

o   MD_Identifier has been updated to be a formal class (changed) – identify if DOI

·         SMAP extension

o   Additional attributes (ex. Run time parameters)

§  EOS and ECHO additional attributes

§  Issue – couples the type and the values – need to repeat type definition throughout (sometimes doesn’t want to repeat)

o   Only one citation for algorithm

·         Validation

o   XML data binding tools

o   Ted suggested schematron approach – use rules (popular in ISO community)

§  Ex. Width needs to be followed by height

·         Limitations in HDF5 (1.8 library)

o   In hierarchy – group names have to be unique (can’t represent arrays of groups)

§  But arrays are common in ISO

o   H5dump – UDT (user defined data types)

§  Not fully supported – become text blobs

·         NASA flavor recommendations

o   Acquisition information – some belong to granule and some series

o   Namespaces

·         ISO is cutting edge… to NASA

·         Lessons

o   Easing into ISO

o   ISO deeply nested

o   Simplicity – ex. Only HDF5 (easier to use Matlab)

o   Need flavor

·         Q – how benefiting from international flavor

o   19000 series – cover geographic but not mission specific

o   Flavor is a community agreement (not changing standard) – use same extension – these options

·         Q (Erin) – how different than a profile

o   Call it what you want (NASA likes flavor)

o   Flavor – is a code list (Erin)

·         Instrument, platform, processing – ISO revised every 5 years – implement now as extension and then add to discussion for future (community process of extensions)

·         Q – will the standards have evolved in time for SMAP mission (currently using previous version)

o   Ted – ISO has standard mechanism to extend itself

·         Q (Alek) – how much larger

o   It is in the noise (10-70 k) –

o   Helen had 100 k files that only had 10k spot
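
The spreadsheet-driven crosswalk described in this talk can be sketched as follows. The SMAP team did this with XSL rules run by Saxon over h5dump output; this sketch swaps in plain Python to show the idea that the mapping lives in a table rather than in the science software. Everything specific here is an assumption for illustration: the nested dict stands in for the HDF5 metadata group, the HDF5 and ISO paths in the table are hypothetical and simplified (they are not real ISO element names), and the function names are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical crosswalk rows, as they might be exported from the mapping spreadsheet.
CROSSWALK_CSV = """hdf5_path,iso_path
Metadata/DatasetIdentification/shortName,identificationInfo/citation/title
Metadata/DatasetIdentification/identified_product_doi,identificationInfo/citation/identifier/code
Metadata/Extent/rangeBeginningDateTime,identificationInfo/extent/temporalElement/beginPosition
"""


def load_crosswalk(text):
    """Parse the CSV text into a list of (hdf5_path, iso_path) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["hdf5_path"], row["iso_path"]) for row in reader]


def lookup(tree, path):
    """Follow a '/'-separated path through nested dicts; return None if any part is missing."""
    node = tree
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def ensure_path(root, path):
    """Return the element at a '/'-separated path below root, creating missing levels."""
    node = root
    for part in path.split("/"):
        child = node.find(part)
        if child is None:
            child = ET.SubElement(node, part)
        node = child
    return node


def apply_crosswalk(hdf5_metadata, crosswalk):
    """Copy every mapped value that is present in the HDF5 metadata into an XML tree."""
    root = ET.Element("metadata")
    for hdf5_path, iso_path in crosswalk:
        value = lookup(hdf5_metadata, hdf5_path)
        if value is not None:
            ensure_path(root, iso_path).text = str(value)
    return root


# Hypothetical granule metadata, standing in for the contents of an HDF5 metadata group.
granule = {
    "Metadata": {
        "DatasetIdentification": {"shortName": "EXAMPLE_L1"},
        "Extent": {"rangeBeginningDateTime": "2013-01-08T00:00:00Z"},
    }
}
root = apply_crosswalk(granule, load_crosswalk(CROSSWALK_CSV))
print(ET.tostring(root, encoding="unicode"))
```

Because the table is data, a change in the metadata dialect would mean editing rows (or, in the SMAP case, the transform rules) rather than touching the code that writes the science products.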

 

Wikis, Rubrics, Views and Connections: An Integrated Approach to Improving Documentation – Ted Habermann, Anna Milan – NOAA/NESDIS/NGDC

·         Tools are on top of web accessible folders

o   Also use portal (external view)

·         Here to help people who are creating metadata to improve it and to better understand connections

·         NOAA wiki – NOAA EDM (old GEO IDE) http://geo-ide.noaa.gov/wiki

·         Wiki

o   Discussion pages – include examples – have explanations – first things created on wiki

o   ISO explorer – for class/element – structure/order/ alternatives – help people editing metadata

o   Pages were created based on community input (based on questions to Ted)

o   Training – approach to learning ISO – building blocks (structured paths through wiki content) – wiki more like encyclopedia…  Ted uses them like books

·         Wiki Navigation

o   Categories – important – automatic to group pages (many-many & sub-categories)

§  Work like a home page

o   ISO Explorer – has classes of FGDC (things need to be in the right order – not the same as the UML)

·         Many of the pages are updated mainly by Anna and Ted, but other people too… it is an ongoing effort

·         Web Accessible Folder

o   Folders available from website

o   People manage metadata in databases

o   Web accessible folders are then like a cache – people can harvest

o   Titles (with stars related to score), Links, Sources, last update, views (get data, FAQ, HTML, fields, comments, KML)

·         HTML view – able to link to wiki from each of these views

·         Metadata evaluation – rubric

o   Mechanism for evaluation – here completeness of metadata

o   1) use Attribute Convention for Data Discovery (ACDD)

o   2) defined by Ted’s group

o   Rubric made of spirals made of fields… linked to wiki – dynamic user guide

§  Red = bad, green = good – other information provided via urls (best practice)…  … opportunities for improvement

o   Each record has score… this is an evaluation tool

·         Connections – community has lots of dialects (or metadata standards)

o   ESIP wiki – documentation connections

o   How to document different connections (ex. People – provide different dialects’ xpaths) – if you know more > talk to Ted

·         Q (Hook) – is this NOAA managed/operated or community

o   Ted controls who can contribute

·         Q – we want to control/understand what document is being referred to in metadata – references in documents may include URLs – do you see a way to control obsolete data in a rubric

o   maintenance of links in metadata record

o   tools sit on web folders that check links

o   also – prefer xlink and then links controlled elsewhere

o   use something similar to link checking websites – work with series

o   recommend not using link in granule

o   Use resolvers (doi:)

·         Q - can the rubric provide guidance

o   Guidance but not control

·         Q – DOI landing page

o   When someone resolves DOI it goes to that page – can be created in metadata

·         Q – can landing page provide permanent link

o   Can easily extract links in a file and put them elsewhere – if permanent it can be a permanent landing page

o   Need to actively manage/test these
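
The completeness rubric described in this talk (a rubric made of spirals, each made of fields, giving every record a score) can be sketched as below. This is not the NGDC tool: the grouping of fields into spirals, the choice of fields, and the function names are assumptions made for illustration. The field names themselves are taken from the ACDD attribute list.

```python
# Hypothetical rubric: each spiral is a named group of ACDD-style attribute names.
SPIRALS = {
    "identification": ["title", "summary", "keywords"],
    "extent": ["geospatial_lat_min", "geospatial_lat_max", "time_coverage_start"],
    "contact": ["creator_name", "creator_email"],
}


def is_present(value):
    """A field counts as present when it is not None and not blank text."""
    return value is not None and str(value).strip() != ""


def score_record(record, spirals=SPIRALS):
    """Return, per spiral, how many fields were found and which ones are missing."""
    report = {}
    for spiral, fields in spirals.items():
        missing = [name for name in fields if not is_present(record.get(name))]
        report[spiral] = {
            "found": len(fields) - len(missing),
            "total": len(fields),
            "missing": missing,
        }
    return report


def total_score(report):
    """Fraction of all rubric fields that are present in the record (0.0 to 1.0)."""
    found = sum(entry["found"] for entry in report.values())
    total = sum(entry["total"] for entry in report.values())
    return found / total


# Hypothetical metadata record, for illustration only.
record = {"title": "Example dataset", "summary": " ", "geospatial_lat_min": 0}
report = score_record(record)
for spiral, entry in report.items():
    print(spiral, "%d/%d" % (entry["found"], entry["total"]), "missing:", entry["missing"])
print("score:", total_score(report))
```

The list of missing fields per spiral is the part that corresponds to the "opportunities for improvement" in the notes: it gives guidance on what to add, without controlling what the metadata author does.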

Citation:
Habermann, T.; Recent Developments in ISOLand; Winter Meeting 2013. ESIP Commons, October 2012