Recent Developments in ISOLand
There have been a number of recent applications, developments, and changes in ISO standards that are relevant to ESIP. These include implementation of granule metadata production tools by SMAP, ISO lineage implementations for AMSR-E, and several changes to the standards themselves: the revision of 19115 and support for XML implementations of that revision, the new data quality standard (19157), and the upcoming revision of 19115-2 (acquisition and instruments).
Session: News from ISOLand
Speakers: 1) Helen Conover, 2) Hook Hua, 3) Ted Habermann
IsoLineage Metadata at AMSR-E SIPS – Helen Conover – GHRC DAAC/AMSR-E SIPS, University of Alabama in Huntsville
· Terms for talk
o dataset (ISO) = data file (individual science data file)
o product = series (ISO) (collection of data files)
· AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System)
· SIPS (Science Investigator-led Processing System)
· GHRC – does provenance (how did you get this, where did it come from, how can it be used – formerly called processing history) and adds metadata and QC to the data
· Products = brightness temperatures, ocean products, monthly and daily ocean grids, sea ice concentration, snow depth, sea ice drift (typical NASA microwave suite)
· Capture the contextual knowledge
o Some is already there
o Recently – putting this metadata into the ISO lineage metadata model
§ Lineage first, so it can be added to a full suite of ISO metadata
· Legacy data system (HDF-EOS2)
· Capture which data products go into which others for the different data projects (e.g., the rain product uses rain and brightness temperature inputs)
o The SIPS provides the control script – it does not include the science algorithms
· ISO is complex (comprehensive) – need to make friends in the community
o This work looks only at lineage
· Lineage Model
o Lineage describes source and processing (which goes down to the algorithm level)
o The LE_ classes come from 19115-2 – an extension of the original model (LI_) to facilitate more detailed description of lineage
o DQ_DataQuality → LI_Lineage (the quality of this product, i.e., what went into making it) – see the sketch below
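As a reading aid, here is a minimal sketch (not the actual AMSR-E record) of how this nesting looks in the ISO 19139 XML encoding, built with Python's lxml; the description text is a placeholder.

```python
# DQ_DataQuality -> lineage -> LI_Lineage -> processStep -> LE_ProcessStep
from lxml import etree

GMD = "http://www.isotc211.org/2005/gmd"   # ISO 19115 namespace
GMI = "http://www.isotc211.org/2005/gmi"   # ISO 19115-2 (LE_ extensions)
GCO = "http://www.isotc211.org/2005/gco"   # basic types

def q(ns, tag):
    return "{%s}%s" % (ns, tag)

dq = etree.Element(q(GMD, "DQ_DataQuality"),
                   nsmap={"gmd": GMD, "gmi": GMI, "gco": GCO})
lineage = etree.SubElement(etree.SubElement(dq, q(GMD, "lineage")),
                           q(GMD, "LI_Lineage"))
step = etree.SubElement(etree.SubElement(lineage, q(GMD, "processStep")),
                        q(GMI, "LE_ProcessStep"))
desc = etree.SubElement(etree.SubElement(step, q(GMD, "description")),
                        q(GCO, "CharacterString"))
desc.text = "High-level processing description goes here."
print(etree.tostring(dq, pretty_print=True).decode())
```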
· XML and ISO are a verbose way of saying things, and the intent is to attach the lineage to the data file (increasing its size)
o Ended up with an HDF SD attribute (with HDF-EOS) – this is HDF4
· Lineage Granularity
o Some lineage info is the same across all files of a product; other info is unique to each file (when and where it was processed)
o Attribute information is captured at the product (series) level
o All lineage data is kept in each file (two lineage elements per file)
o XML hierarchy levels: “dataset” and “series”
· Lineage Model
o Where to put the information was the first big job (then how to say it)
o ProcessStep – high-level processing description
o Algorithm – science algorithm name, version, author, description (high-level info and pointers to the real data)
o The delivered algorithm package might change without changing the science algorithm
o DOIs and specific descriptions go in Source elements
o These are done once per version of the data
o An automated process then fills in, for each data file, the processing date/time/location and the input and output files (see the sketch below)
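A sketch of that split, with illustrative field and product names (the real SIPS tooling is not shown in these notes): static lineage is authored once per product version, and an automated step merges in the per-file facts.

```python
# Static lineage (algorithm name, version, author) is written once per
# product version; per-file facts are filled in automatically.
STATIC_LINEAGE = {
    "algorithm_name": "AMSR-E Sea Ice Concentration",  # hypothetical values
    "algorithm_version": "V12",
    "algorithm_author": "Science Team",
}

def lineage_for_file(processing_time, host, inputs, outputs):
    """Merge the once-per-version metadata with per-file facts."""
    record = dict(STATIC_LINEAGE)
    record.update({
        "processing_datetime": processing_time,
        "processing_host": host,
        "source_files": inputs,
        "output_files": outputs,
    })
    return record

print(lineage_for_file("2012-07-17T12:00:00Z", "sips-node-1",
                       ["tb_file.hdf"], ["seaice_file.hdf"]))
```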
· Question – ECHO data (e.g., cloud cover) – is the map from ECHO to ISO easy?
o Is an ECHO attribute or PSA mapped in LE_Algorithm?
o The value would go somewhere else
o Ted – you can have any number of processing steps (0..*) – can have a separate ProcessStep or Algorithm
· Q – what level of granularity?
o Tried to capture the science algorithms
o E.g., sea ice – one processing executable, gridding, and two algorithms, and then snow depth
· Q – each file has many attributes
o In the provenance system – capture the attributes and map the variables to each algorithm
o Not down to the level of equation names (some have actual names, others are descriptions)
o Did not map variables to algorithms in ISO
· Versioning
o ESDT (Earth Science Data Type) – doesn’t change often
o DAP (Delivered Algorithm Package)
o Trying to tie the processing algorithm version to the metadata
§ Includes what the algorithm does, a description, and author info
· DOIs
o NASA trying to figure out how to handle DOI at ISO level
o NASA ESDIS: handling differs between GCMD DIF and netCDF CF
o Decided to combine the URL and DOI, with the associated text identifying it as the “doi”
· Q – can you use anything, not just a DOI?
o Yes – hence put “DOI” in the text (but description is not part of Identifier)
· Use codeSpace to indicate NASA ESDIS as the publisher (see the sketch below)
o Want this to be part of the NASA flavor of ISO
o ESDIS is the authority for the DOI
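A hedged sketch of encoding a DOI as an identifier with a codeSpace naming ESDIS. In the 2005-era 19139 schemas, codeSpace lives on gmd:RS_Identifier (the 19115-1 revision adds it to MD_Identifier directly); the DOI and codeSpace values below are assumptions, not a quoted ESDIS convention.

```python
from lxml import etree

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

def text_elem(parent, tag, value):
    # gmd element wrapping a gco:CharacterString, per the 19139 pattern
    wrapper = etree.SubElement(parent, "{%s}%s" % (GMD, tag))
    cs = etree.SubElement(wrapper, "{%s}CharacterString" % GCO)
    cs.text = value
    return wrapper

ident = etree.Element("{%s}RS_Identifier" % GMD,
                      nsmap={"gmd": GMD, "gco": GCO})
text_elem(ident, "code", "doi:10.5067/EXAMPLE/SEAICE12")  # hypothetical DOI
text_elem(ident, "codeSpace", "gov.nasa.esdis")           # assumed form
print(etree.tostring(ident, pretty_print=True).decode())
```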
· Challenges – ISO is complicated, it is evolving, schemas have not been promptly provided, need to reach community consensus
· Needs – a NASA-flavored schema, concrete examples, representations in other languages, communication (an online forum?)
· Q (Aleksandar) – where does the processing lineage come from before it is put into ISO?
o Red on screen = from an online form: talk to the producer, fill in the form, and store it in a database
o Blue = processed in house (file read events, software invocations) – parsed into a database and then into XML
· Q (Jennifer) – do you have a cheat sheet or summary?
o Lots of info on the NOAA GEO-IDE wiki
o A lot of details – worked out over email, then sorted through
o Need an online resource (don’t yet have fully validated XML) – will be there in a month
o NASA is also developing its own wiki
A Practical Application Using ISO Metadata – Incorporating ISO Metadata into SMAP Data Products – Barry Weiss, Hook Hua, Vance Haemmerle (JPL)
· SMAP – first NASA decadal mission
· Soil moisture
· 15 products, L1 to L4 (from parsed radar telemetry to carbon net ecosystem exchange)
o Trying to create ISO for different data products (large undertaking)
· Level 1 requirement for ISO metadata (required from the top)
o Using ISO because it is international and provides a common representation (a conceptual model and an encoding of it)
o Include tools, use cases
· ISO basic concepts
o Granule metadata = dataset
o Collection metadata = series metadata
o Codelist = enumerated list of accepted values
o Profile = community agreement on particular elements
o Extension = explicit modification (NASA is a flavor)
o EX = Extent, LE = lineage, CI = citation
· ISO geographic standards
o Using ISO 19130 – imagery sensor model
o Usually talking about 19139 encoding
· UML
o Progress code – e.g., completed, historical archive, obsolete
o SMAP needs extension points – needs to mark products as “beta, stage 1, stage 4” – wants to augment the code list
o Do we need to standardize code lists?
§ 2 camps: (1) 19139 XML, (2) HDF5 group
§ Kept both
· From the Earth Science Data Model (ESDM) – everything goes in an HDF5 metadata group
o Created a crosswalk between the HDF5 and ISO groups
o E.g., lineage would be a subgroup in HDF5
o The lineage group includes attitude, ephemeris, antenna pointing, …
o Renamed MI_Identifier as identified_product_doi for the DOI (see the sketch below)
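A sketch (group and file names assumed, not SMAP's actual layout) of walking an HDF5 "Metadata" group like the one the crosswalk targets, printing each attribute it finds.

```python
import h5py

def dump_metadata_group(path):
    with h5py.File(path, "r") as f:
        meta = f["Metadata"]                      # assumed group name
        def visit(name, obj):
            for key, value in obj.attrs.items():  # attributes hold the values
                print(f"Metadata/{name}@{key} = {value!r}")
        meta.visititems(visit)

# dump_metadata_group("SMAP_L2_example.h5")  # hypothetical file
```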
· Started with the UML diagrams from ISO and expanded where needed for SMAP
o What were the extensions that were needed?
o Then generated a spreadsheet providing the mapping between ISO, HDF, and ESDM
o ESDM defines gaps; ISO has only beginning and ending
o E.g., extent – needed to add a vertical extent
· The spreadsheet was more than an exercise – the mapping was done programmatically (see the sketch below)
o The mapping is used by a converter to generate extra files for the crosswalk
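A sketch of using the mapping spreadsheet programmatically: each row maps an HDF5 attribute path to an ISO XPath, and a converter walks the rows to pull values out of the file. Column names and paths here are illustrative, not SMAP's actual schema.

```python
import csv
import h5py

def crosswalk(h5_path, mapping_csv):
    values = {}
    with h5py.File(h5_path, "r") as f, open(mapping_csv, newline="") as m:
        # assumed columns: hdf5_path, attr, iso_xpath
        for row in csv.DictReader(m):
            obj = f.get(row["hdf5_path"])
            if obj is not None and row["attr"] in obj.attrs:
                values[row["iso_xpath"]] = obj.attrs[row["attr"]]
    return values  # feed these into the ISO template injection step
```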
· For series metadata – delivered by the data architect
· For dataset (granule) metadata, the problem is automation
o The spreadsheets are the first step
o Use that info to automatically inject the correct fields into the ISO from the HDF
o Able to reduce dependencies to only the HDF5 libraries – simplified things
· Saxon is used for the crosswalk (a transform is needed for each flavor) – see the sketch below
o Decouples the science software from the metadata dialect
o That means if the dialect changes, only the transform has to change
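The team runs Saxon (a Java XSLT processor); this is a minimal sketch of the same decoupling idea using lxml's built-in XSLT 1.0 engine. The stylesheet and element names are invented: the science software emits one XML form, and a swappable stylesheet produces each dialect.

```python
from lxml import etree

stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/granule">
    <record><title><xsl:value-of select="@name"/></title></record>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)            # compile the rules once
doc = etree.XML('<granule name="example granule"/>')
print(etree.tostring(transform(doc), pretty_print=True).decode())
```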
· Q (Peter) – is there a program to move from HDF to ISO?
o Not writing software – writing rules (run with Saxon)
o XSL – an open-source tool
o Kept as simple as possible
o Ted – there are also transforms from OPeNDAP-land to ISO, and NC-ISO – translating from one XML to another
· Q (Peter) – the rules define the fields? – yes
· Q – what about binary data?
o The HDF Group’s h5dump can skip all data arrays and dump only the metadata (see the sketch below)
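h5dump's -A (--onlyattr) flag prints object headers and attributes while skipping dataset data, which is the metadata-only behavior described above. A sketch of invoking it from Python; the filename is hypothetical.

```python
import subprocess

out = subprocess.run(["h5dump", "-A", "SMAP_L2_example.h5"],
                     capture_output=True, text=True)
print(out.stdout)  # attributes and structure only, no data arrays
```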
· DOI and UID
o MD_Identifier has been updated to be a formal class (a change) – so one can identify whether it is a DOI
· SMAP extension
o Additional attributes (e.g., run-time parameters)
§ EOS and ECHO additional attributes
§ Issue – this couples the type and the values – the type definition must be repeated throughout (sometimes one doesn’t want to repeat it)
o Only one citation for the algorithm
· Validation
o XML data binding tools
o Ted suggested a Schematron approach – rule-based (popular in the ISO community) – see the sketch below
§ E.g., width needs to be followed by height
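A toy Schematron rule in the spirit of that example ("width must be followed by height"), validated with lxml's ISO Schematron support; the element names are invented for illustration.

```python
from lxml import etree
from lxml.isoschematron import Schematron

rules = etree.XML("""
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="box">
      <assert test="width/following-sibling::*[1][self::height]">
        width must be immediately followed by height
      </assert>
    </rule>
  </pattern>
</schema>""")

schematron = Schematron(rules)
good = etree.XML("<box><width>2</width><height>3</height></box>")
bad = etree.XML("<box><height>3</height><width>2</width></box>")
print(schematron.validate(good), schematron.validate(bad))  # True False
```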
· Limitations in HDF5 (1.8 library)
o In the hierarchy, group names have to be unique (can’t represent arrays of groups) – see the workaround sketched below
§ But arrays are common in ISO
o h5dump and UDTs (user-defined data types)
§ Not fully supported – they become text blobs
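Since sibling groups must have unique names, repeated ISO elements (e.g., multiple processSteps) cannot be stored as same-named groups. One common workaround, shown here as an illustration rather than SMAP's documented choice, is to number the groups and reassemble the order on read.

```python
import h5py

with h5py.File("lineage_demo.h5", "w") as f:
    # emulate an array of processStep groups with numbered names
    for i, desc in enumerate(["calibrate", "grid", "retrieve"]):
        g = f.create_group(f"Metadata/Lineage/processStep_{i:02d}")
        g.attrs["description"] = desc
    steps = sorted(f["Metadata/Lineage"].keys())
    print(steps)  # ['processStep_00', 'processStep_01', 'processStep_02']
```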
· NASA flavor recommendations
o Acquisition information – some belongs to the granule and some to the series
o Namespaces
· ISO is cutting edge… to NASA
· Lessons
o Easing into ISO
o ISO deeply nested
o Simplicity – e.g., depending only on HDF5 (easier to use from MATLAB)
o Need flavor
· Q – how do you benefit from the international flavor?
o The 19000 series covers geographic information but not mission specifics
o A flavor is a community agreement (it doesn’t change the standard) – use the same extensions – these are options
· Q (Erin) – how is this different from a profile?
o Call it what you want (NASA likes flavor)
o Flavor – is a code list (Erin)
· Instrument, platform, processing – ISO is revised every 5 years – implement now as an extension and then add it to the discussion for the future (a community process for extensions)
· Q – will the standards have evolved in time for the SMAP mission (currently using the previous version)?
o Ted – ISO has standard mechanism to extend itself
· Q (Alek) – how much larger does the metadata make the files?
o It is in the noise (10–70 KB)
o Helen had ~100 KB files that only had a 10 KB spot reserved
Wikis, Rubrics, Views and Connections: An Integrated Approach to Improving Documentation – Ted Habermann, Anna Milan – NOAA/NESDIS/NGDC
· The tools sit on top of web-accessible folders (WAFs)
o Also use a portal (external view)
· The goal here is to help people who are creating metadata to improve it and to better understand connections
· NOAA wiki – NOAA EDM (old GEO IDE) http://geo-ide.noaa.gov/wiki
· Wiki
o Discussion pages – include examples and explanations – the first things created on the wiki
o ISO Explorer – per class/element – structure/order/alternatives – helps people editing metadata
o Pages were created based on community input (based on questions to Ted)
o Training – an approach to learning ISO via building blocks (structured paths through wiki content) – a wiki is more like an encyclopedia… Ted uses them like books
· Wiki Navigation
o Categories are important – an automatic way to group pages (many-to-many, with sub-categories)
§ They work like a home page
o ISO Explorer – has the classes, as FGDC did (things need to be in the right order – not the same as the UML)
· Many of the pages are updated mainly by Anna and Ted, but other people too… it is an ongoing effort
· Web Accessible Folder
o Folders available from the website
o People manage metadata in databases
o Web-accessible folders are then like a cache that people can harvest
o Titles (with stars related to score), links, sources, last update, views (get data, FAQ, HTML, fields, comments, KML)
· HTML view – able to link to wiki from each of these views
· Metadata evaluation – rubric
o A mechanism for evaluation – here, completeness of the metadata
o 1) uses the Attribute Convention for Data Discovery (ACDD)
o 2) defined by Ted’s group
o The rubric is made of spirals made of fields… linked to the wiki – a dynamic user guide
§ Red = bad, green = good – other information provided via URLs (best practices)… opportunities for improvement
o Each record gets a score… this is an evaluation tool (see the toy scoring sketch below)
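A toy version of the completeness rubric: score a record's attributes against a partial ACDD checklist. The attribute names are real ACDD fields, but the scoring scheme here is invented for illustration and is not NGDC's actual rubric.

```python
ACDD_FIELDS = ["title", "summary", "keywords", "id", "naming_authority",
               "creator_name", "license", "time_coverage_start"]

def score(attrs):
    """Return a completeness fraction and the missing fields."""
    present = [f for f in ACDD_FIELDS if attrs.get(f)]
    missing = [f for f in ACDD_FIELDS if f not in present]
    return len(present) / len(ACDD_FIELDS), missing

record = {"title": "Example granule", "summary": "…", "keywords": "ice"}
pct, missing = score(record)
print(f"completeness: {pct:.0%}; improve by adding: {missing}")
```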
· Connections – community has lots of dialects (or metadata standards)
o ESIP wiki – documentation connections
o How to document different connections (e.g., for people, provide each dialect’s XPaths) – if you know more, talk to Ted
· Q (Hook) – is this NOAA-managed/operated, or community-run?
o Ted controls who can contribute
· Q – we want to control/understand what document is being referred to in metadata – references in documents may include URLs – do you see a way to handle obsolete links in a rubric?
o This is about maintenance of links in a metadata record
o Tools that sit on the web folders check links (a sketch follows below)
o Also – prefer xlink, with the links controlled elsewhere
o Use something similar to link-checking websites – works at the series level
o Recommend not using links in granules
o Use resolvers (doi:)
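A sketch of the link-checking idea: harvest URLs from a record in a WAF and test each one, flagging dead links for maintenance. Standard library only; a real tool would add retries and politeness delays.

```python
import re
import urllib.request

def check_links(xml_text):
    """Return {url: HTTP status or error} for every URL in the record."""
    results = {}
    for url in set(re.findall(r'https?://[^\s"<>]+', xml_text)):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=10) as resp:
                results[url] = resp.status
        except Exception as exc:
            results[url] = f"BROKEN: {exc}"
    return results
```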
· Q – can the rubric provide guidance?
o Guidance but not control
· Q – DOI landing page
o When someone resolves the DOI it goes to that page – the page can be generated from the metadata
· Q – can the landing page provide permanent links?
o Links in a file can easily be extracted and put elsewhere – if they are permanent, it can be a permanent landing page
o These need to be actively managed and tested