Recent Developments in ISOLand
Submitted by Krbm on Tue, 2012-10-30 13:16
There have been a number of recent applications, developments and changes in ISO standards that are relevant to ESIP. These include implementations of granule metadata production tools by SMAP, ISO lineage implementations for AMSR-E, and several changes to standards: the revision of 19115 and support for XML implementations of that revision, the new data quality standard (19157), and the upcoming revision of 19115-2 (acquisition and instruments).
Session: News from ISOLand
ISO Lineage Metadata at AMSR-E SIPS – Helen Conover – GHRC DAAC/AMSR-E SIPS, University of Alabama in Huntsville
· Terms for talk
o dataset (ISO) = data file (individual science data file)
o Product = series (ISO) (collection of data files)
· AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System)
· SIPS (Science Investigator-led Processing System)
· GHRC – does provenance (how did you get this, where did it come from, how can it be used; formerly called processing history) and adds metadata and QC to data
· Products = brightness temperatures, ocean products, monthly and daily ocean grids, sea ice concentration, snow depth, sea ice drift (typical NASA microwave suite)
· Capture the contextual knowledge
o Some of it is already there
o Recently began putting metadata into the ISO lineage metadata model
§ Capturing lineage so it can be added to the full suite of ISO metadata
· Legacy data system (HDF-EOS2)
· Capture which data products go into which for the different products (e.g., the rain product has rain and brightness temperatures)
o SIPS provides control script – does not include science
· ISO is complex (comprehensive) – need to make friends in community
o Only look at lineage
· Lineage Model
o Lineage – describes sources and processing (which goes down to the algorithm)
o LE = 19115-2 classes – an extension of the original model (LI) to facilitate a more detailed description of lineage
o DQ_DataQuality → LI_Lineage (quality of this product, i.e., what went into making it)
· XML and ISO are a verbose way of ‘saying things’ – the intent is to attach the metadata to the data file (which increases its size)
o Ended up with an HDF SE attribute (with HDF-EOS) – this is HDF4
· Lineage Granularity
o Lineage info is the same across all files of a product – capture the info unique to each file (when and where it was processed)
o At product or series level – capture attribute information
o Keep all lineage data in each file (2 elements in each file)
o XML “dataset” and “series”
· Lineage Model
o Where to put the information was the first big job (then how to say it)
o ProcessStep – high level processing description
o Algorithm – science algorithm name, version, author, description (high level info and pointers to real data)
o The delivered algorithm package may change without the science algorithm changing
o DOI and specific descriptions in Source files
o These are done once per version of data
o Then an automated process records, for each data file, the processing date/time/location and the input and output files
· Question – ECHO data (cloud cover) – is there an easy map from ECHO to ISO?
o ECHO attribute or PSA – is that mapped in LE_Algorithm?
o Value would be somewhere else
o Ted – can have any number of processing steps (0..*) – can have separate ProcessStep or Algorithm
· Q – what level of granularity
o Tried to capture the science algorithms
o Ex. Sea ice – one processing executable, gridding, 2 algorithms, and then snow depth
· Q – each file has many attributes
o In provenance system – capture the attributes – map the variables to each algorithm
o Not to the level of equation names (some have actual names and others are descriptions)
o Did not do mapping of variable to algorithm in ISO
· Versioning
o ESDT – doesn’t change often
o DAP (Delivered Algorithm Package)
o Trying to tie the processing algorithm version to the metadata
§ Includes what algorithm does, description, and author info
· DOIs
o NASA trying to figure out how to handle DOI at ISO level
o NASS ES difference between GCMD DIF and netCDF CF
o Decided to combine the URL and DOI, with the associated text being the “doi”
· Q – can you use any identifier, not just a DOI?
o Yes – hence put DOI (but the description is not part of the Identifier)
· Use codeSpace to indicate NASA ESDIS as publisher
o Want this to be part of the NASA flavor of ISO
o They are the authority for DOI
· Challenges – complicated, it is evolving, schemas have not been promptly provided, need to reach community consensus
· Need – NASA flavored schema, concrete examples, representations in other languages, communication (?online forum)
· Q (Aleksandar) – where do you get the processing lineage before putting it into ISO?
o Red on screen = online form: talk to the producer, fill in the form, and store it in a database
o Blue = processed in house (file read events, software invocation) – parsed into a database and then into XML
· Q (Jennifer) – do you have a cheat sheet or summary?
o Lots of info on the NOAA GEO-IDE wiki
o A lot of details – emailed works – then sorted through
o Need online resource (don’t have fully validated XML) – will be there in a month
o NASA is also developing their own wiki
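The lineage granularity described in this talk (algorithm information written once per product version; processing time, host, and input/output files filled in automatically for each data file) can be sketched with Python's standard library. This is a minimal illustration only: the values in VERSION_INFO, the host and file names, and the helper names are hypothetical, and the element nesting only loosely follows the ISO 19115-2 (gmi) lineage classes LE_ProcessStep, LE_Source, LE_Processing and LE_Algorithm. Some elements the schema requires are omitted and the output has not been validated.

```python
import xml.etree.ElementTree as ET

# ISO 19139 / 19115-2 XML namespace URIs
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def qname(name):
    """Convert 'gmd:LI_Lineage' to ElementTree's '{uri}LI_Lineage' form."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NS[prefix], local)


def add(parent, path, text=None):
    """Append a chain of nested elements given as 'a:x/b:y'; return the innermost."""
    element = parent
    for name in path.split("/"):
        element = ET.SubElement(element, qname(name))
    if text is not None:
        element.text = text
    return element


# Written once per product version (hypothetical values).
VERSION_INFO = {
    "title": "Example sea ice concentration algorithm",
    "version": "1.0",
    "description": "High-level summary with pointers to the full algorithm documentation",
}


def build_lineage(version_info, processed_at, host, inputs, outputs):
    """Combine per-version algorithm info with per-file processing facts."""
    lineage = ET.Element(qname("gmd:LI_Lineage"))
    step = add(lineage, "gmd:processStep/gmi:LE_ProcessStep")
    # Per-file facts, gathered automatically at processing time.
    add(step, "gmd:description/gco:CharacterString", "Processed on " + host)
    add(step, "gmd:dateTime/gco:DateTime", processed_at)
    for name in inputs:
        add(step, "gmd:source/gmi:LE_Source/gmd:description/gco:CharacterString", name)
    # Per-version facts, copied unchanged into every file of this version.
    algorithm = add(step, "gmi:processingInformation/gmi:LE_Processing/gmi:algorithm/gmi:LE_Algorithm")
    citation = add(algorithm, "gmi:citation/gmd:CI_Citation")
    add(citation, "gmd:title/gco:CharacterString", version_info["title"])
    add(citation, "gmd:edition/gco:CharacterString", version_info["version"])
    add(algorithm, "gmi:description/gco:CharacterString", version_info["description"])
    # Output files of this step (per-file facts again).
    for name in outputs:
        add(step, "gmi:output/gmi:LE_Source/gmd:description/gco:CharacterString", name)
    return lineage


if __name__ == "__main__":
    record = build_lineage(VERSION_INFO, "2012-10-30T13:16:00Z", "sips-host",
                           ["example_L2A.hdf"], ["example_L3.hdf"])
    print(ET.tostring(record, encoding="unicode"))
```

Keeping the per-version dictionary separate from the per-file arguments mirrors the split described in the talk: the first is maintained by hand once per data version, the second is filled in by the processing system for every file.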
A Practical Application Using ISO Metadata – Incorporating ISO Metadata into SMAP Data Products – Barry Weiss, Hook Hua, Vance Haemmerle (JPL)
· SMAP – first NASA decadal mission
· Soil moisture
· 15 products L1 to L4 (parsed radar telemetry to carbon net ecosystem exchange)
o Trying to create ISO for different data products (large undertaking)
· Level 1 requirement for ISO metadata (required from the top)
o Using ISO because it is an international, common representation (a conceptual model and an encoding of it)
o Include tools, use cases
· ISO basic concepts
o Granule metadata = dataset
o Collection metadata = series metadata
o Codelist = enumerated list of accepted values
o Profile = community agreement of particular elements
o Extension = explicit modification (NASA is a flavor)
o EX = Extent, LE = lineage, CI = citation
· ISO geographic standards
o Using ISO 19130 – imagery sensor model
o Usually talking about 19139 encoding
· UML
o Progress Code – completed, historical archive, obsolete
o SMAP needs extension points – need to mark products as “beta, stage 1, stage 4” – want to augment the code list
o Do we need to standardize the code list?
§ 2 camps (1) 19139 XML (2) HDF5 group
§ Kept both
· From the Earth Science Data Model (ESDM) – all in HDF5 metadata group
o Create crosswalk between HDF5 and ISO groups
o Ex. Lineage would be subgroup in HDF5
o Lineage group include attitude, ephemeris, antenna pointing, …
o Renamed MI_Identifier as identified_product_doi for DOI
· Started with UML diagrams from ISO and expanded where needed for SMAP
o What were the extensions that were needed
o Then generated spreadsheet – provided mapping between ISO to HDF to ESDM
o ESDM defines gaps, ISO only beginning and ending
o Ex. Extent – needed to add a vertical extent
· The spreadsheet was more than an exercise – the mapping was done programmatically from it
o Mapping used by converter to generate extra files for crosswalk
· For series metadata - Delivered by data architect
· For dataset – problem = automation
o Spreadsheets are the first step
o Use info to automatically inject the correct fields into the ISO from HDF
o Able to reduce dependencies to only HDF5 libraries – simplified things
· Saxon used for crosswalk (transforms needed for each flavor)
o Decoupling science software and metadata dialect
o That means if the dialect changes, the science software does not need to change
· Q (Peter) – do you have a program to move from HDF to ISO?
o Not writing software – writing rules (saxon)
o XSL – it is an open source tool (apache product)
o Kept as simple as possible
o Ted – also has transforms from OPeNDAP to ISO and NC-ISO – translates from one XML to another
· Q - (Peter) – rules defines the fields? - yes
· Q – what about binary data
o HDF group – h5dump – ignores all data arrays (only dumps out metadata)
· DOI and UID
o MD_Identifier has been updated to be a formal class (changed) – identify if DOI
· SMAP extension
o Additional attributes (ex. Run time parameters)
§ Eos and echo additional attributes
§ Issue – couples the type and the values – need to repeat type definition throughout (sometimes doesn’t want to repeat)
o Only one citation for algorithm
· Validation
o XML data binding tools
o Ted suggested schematron approach – use rules (popular in ISO community)
§ Ex. Width needs to be followed by height
· Limitations in HDF5 (1.8 library)
o In hierarchy – group names have to be unique (can’t represent arrays of groups)
§ But arrays are common in ISO
o H5dump – UDT (user defined data types)
§ Not fully supported – become text blobs
· NASA flavor recommendations
o Acquisition information – some belong to granule and some series
o Namespaces
· ISO is cutting edge… to NASA
· Lessons
o Easing into ISO
o ISO deeply nested
o Simplicity – ex. only HDF5 (easier to use with Matlab)
o Need flavor
· Q – how do you benefit from the international standard?
o 19000 series – cover geographic but not mission specific
o Flavor is a community agreement (not changing standard) – use same extension – these options
· Q (Erin) – how is this different from a profile?
o Call it what you want (NASA likes flavor)
o Flavor – is a code list (Erin)
· Instrument, platform, processing – ISO revised every 5 years – implement now as extension and then add to discussion for future (community process of extensions)
· Q – will the standards have evolved in time for SMAP mission (currently using previous version)
o Ted – ISO has standard mechanism to extend itself
· Q (Alek) – how much larger are the files?
o It is in the noise (10-70 k)
o Helen had 100 k files that only had 10k spot
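The SMAP team described driving the HDF5-to-ISO crosswalk from a mapping spreadsheet and running the rules with Saxon/XSLT, so the science software never has to know the metadata dialect. As a rough Python stand-in for that idea (not the SMAP tooling), the sketch below treats the mapping table as data: each row names an HDF5 metadata attribute and the ISO element path it fills. The attribute paths, table rows and helper names are hypothetical, a plain dictionary replaces reading the HDF5 file, and the resulting XML is not schema-validated.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Hypothetical crosswalk table standing in for the mapping spreadsheet:
# each row maps one HDF5 metadata attribute to the ISO element it populates.
CROSSWALK_CSV = """hdf5_path,iso_path
/Metadata/DatasetIdentification/fileName,gmd:fileIdentifier/gco:CharacterString
/Metadata/SeriesIdentification/shortName,gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString
/Metadata/SeriesIdentification/abstract,gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract/gco:CharacterString
"""


def qname(name):
    """Convert 'gmd:fileIdentifier' to ElementTree's '{uri}fileIdentifier' form."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NS[prefix], local)


def load_rules(text):
    """Read (hdf5_path, iso_path) pairs from the crosswalk table."""
    return [(row["hdf5_path"], row["iso_path"]) for row in csv.DictReader(io.StringIO(text))]


def find_or_create(parent, name):
    """Reuse an existing child element with this name, else append a new one."""
    child = parent.find(qname(name))
    return child if child is not None else ET.SubElement(parent, qname(name))


def crosswalk(hdf5_attrs, rules):
    """Build an ISO record from HDF5 attributes, driven entirely by the rules table."""
    root = ET.Element(qname("gmi:MI_Metadata"))
    for hdf5_path, iso_path in rules:
        if hdf5_path not in hdf5_attrs:
            continue  # attribute absent in this granule: leave the ISO element out
        element = root
        for name in iso_path.split("/"):
            element = find_or_create(element, name)
        element.text = str(hdf5_attrs[hdf5_path])
    return root


if __name__ == "__main__":
    # In practice these values would be read from the file's HDF5 metadata group.
    attrs = {
        "/Metadata/DatasetIdentification/fileName": "example_granule.h5",
        "/Metadata/SeriesIdentification/shortName": "EXAMPLE_L2",
        "/Metadata/SeriesIdentification/abstract": "Example abstract",
    }
    print(ET.tostring(crosswalk(attrs, load_rules(CROSSWALK_CSV)), encoding="unicode"))
```

Because the mapping lives in a table, switching to another dialect means editing the table (or, in SMAP's case, the XSLT) rather than the science code. The find-or-create step also means each ISO path can appear only once; repeated elements, which ISO uses heavily, would need an extra index column in the table.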
Wikis, Rubrics, Views and Connections: An Integrated Approach to Improving Documentation – Ted Habermann, Anna Milan – NOAA/NESDIS/NGDC
· Tools are on top of web accessible folders
o Also use portal (external view)
· Goal: help people who are creating metadata to improve it and to better understand connections
· NOAA wiki – NOAA EDM (formerly GEO-IDE) http://geo-ide.noaa.gov/wiki
· Wiki
o Discussion pages – include examples – have explanations – first things created on wiki
o ISO explorer – for class/element – structure/order/ alternatives – help people editing metadata
o Pages were created based on community input (based on questions to Ted)
o Training – approach to learning ISO – building blocks (structured paths through wiki content) – wiki more like encyclopedia… Ted uses them like books
· Wiki Navigation
o Categories – important – automatic to group pages (many-many & sub-categories)
§ Work like a home page
o ISO Explorer – has classes of FGDC (things need to be in the right order – not the same as the UML)
· Many of the pages are updated mainly by Anna and Ted, but other people too… it is an ongoing effort
· Web Accessible Folder
o Folders available from website
o People manage metadata in databases
o Web accessible folders are then like a cache – people can harvest them
o Titles (with stars related to score), Links, Sources, last update, views (get data, FAQ, HTML, fields, comments, KML)
· HTML view – able to link to wiki from each of these views
· Metadata evaluation – rubric
o Mechanism for evaluation – here completeness of metadata
o 1) use the Attribute Convention for Data Discovery (ACDD)
o 2) defined by Ted’s group
o Rubric made of spirals made of fields… linked to wiki – dynamic user guide
§ Red = bad, green = good – other information provided via URLs (best practice)… opportunities for improvement
o Each record has score… this is an evaluation tool
· Connections – community has lots of dialects (or metadata standards)
o ESIP wiki – documentation connections
o How to document the connections between dialects (e.g., for People, provide the XPaths in the different dialects) – if you know more, talk to Ted
· Q (Hook) – is this NOAA managed/operated or community run?
o Ted controls who can contribute
· Q – we want to control/understand what document is being referred to in metadata – references in documents may include URLs – do you see a way to control obsolete data in a rubric
o maintenance of links in metadata record
o tools sit on web folders that check links
o also – prefer xlink and then links controlled elsewhere
o use something similar to link checking websites – work with series
o recommend not using link in granule
o Use resolvers (doi:)
· Q - can the rubric provide guidance
o Guidance but not control
· Q – DOI landing page
o When someone resolves DOI it goes to that page – can be created in metadata
· Q – can landing page provide permanent link
o Can easily extract links in a file and put them elsewhere – if permanent it can be a permanent landing page
o Need to be actively managed and tested
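The completeness rubric described in this talk can be illustrated with a small scoring function. This is a minimal sketch, assuming the record is a dictionary of ACDD global attributes (as read from a netCDF file): fields are grouped into "spirals", each spiral gets a score and a red/green status, and missing fields are listed as opportunities for improvement. The spiral names and the selection of attributes are hypothetical and are not the rubric used by NOAA/NGDC; only the attribute names themselves come from ACDD.

```python
# Hypothetical grouping of ACDD attribute names into rubric "spirals";
# this is not NOAA's actual rubric, only an illustration of the idea.
RUBRIC = {
    "Identification": ["id", "naming_authority", "title"],
    "Text search": ["summary", "keywords"],
    "Extent search": [
        "geospatial_lat_min", "geospatial_lat_max",
        "geospatial_lon_min", "geospatial_lon_max",
        "time_coverage_start", "time_coverage_end",
    ],
    "Contacts": ["creator_name", "creator_email"],
}


def evaluate(attrs, rubric=RUBRIC):
    """Score a record's completeness per spiral and overall (0.0 to 1.0)."""
    report = {}
    found_total = field_total = 0
    for spiral, fields in rubric.items():
        # A field counts only if it is present and not blank.
        found = [f for f in fields if str(attrs.get(f, "")).strip()]
        report[spiral] = {
            "score": len(found) / len(fields),
            "status": "green" if len(found) == len(fields) else "red",
            "missing": [f for f in fields if f not in found],
        }
        found_total += len(found)
        field_total += len(fields)
    return found_total / field_total, report


if __name__ == "__main__":
    record = {"id": "example-id", "title": "Example dataset",
              "summary": "Example summary", "keywords": " "}
    overall, report = evaluate(record)
    print(f"overall score: {overall:.2f}")
    for spiral, result in report.items():
        missing = ", ".join(result["missing"]) or "nothing"
        print(f"{result['status']:5} {spiral}: missing {missing}")
```

A real rubric would also link each missing field to its wiki page, as described above; the "missing" lists are where such links would attach.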
Session speakers:
1) Helen Conover
2) Hook Hua
3) Ted Habermann
Kelly Monteleone, Sarah Ramdeen, Benjamin White, Helen Conover, Ed Seiler, Amanda Orin, Peter Cornillon, John Moses, Gregg Foti, Karl Benedict, Mathew Biddle, John Scialdone, Nancy Hoebelheinrich, Sarah O'Connor, Ed Armstrong, Jeff Lee, Aleksandar Jelenak, Brian Wilson, Hook Hua, Ted Habermann, Erin Robinson, Jennifer Davis, Don Collins, Wenli Yang, Steven Alenbach, +2