Documenting Data in Multiple Dialects
ESIP partners use data in many formats and across many communities. These communities have different documentation needs and address those needs using different standards and conventions. ESIP exists at the crossroads of these communities, a natural "marketplace" where many dialects can be heard. It is a perfect place to identify needs shared across multiple communities and build understanding of how those needs are addressed. This understanding, in turn, facilitates sharing of documentation and the data it describes across communities.
This workshop will address documentation dialects used across ESIP. Candidates include ISO, FGDC, DIF, CF, ACDD, ECHO, and EML. Community members who can represent these dialects in the discussion are needed.
Agenda
CF Standard Names for Satellite Observations and Products, Aleksandar Jelenak and Ed Armstrong
Aleksandar and Ed proposed several new standard names for the CF list. Aleksandar's names were motivated by the Global Space-based Inter-Calibration System (GSICS, http://gsics.wmo.int/) Project NetCDF convention (https://gsics.nesdis.noaa.gov/wiki/Development/NetcdfConvention) and are described on the ESIP Wiki (http://wiki.esipfed.org/index.php/Standard_Names_For_Satellite_Observations). Ed proposed spectral coordinate variables called wavelength and wavenumber that would be used for spectral bands included in netCDF files.
Metadata-Centric Discovery Service, Thomas Huang and Ed Armstrong
Thomas described the JPL Data Discovery Service that uses a single metadata repository and supports multiple metadata dialects for users. This is done using a series of templates that hold tokens that indicate where particular content appears in different dialects. This approach provided a theme for the rest of the session.
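The template-and-token idea can be sketched in a few lines of Python. This is only an illustration built on the standard library's string.Template, not the JPL implementation; the record fields, token names, and the heavily simplified DIF and ISO fragments are all assumptions made for the example.

```python
from string import Template
from xml.sax.saxutils import escape

# One internal metadata record (hypothetical field names and values).
record = {
    "short_name": "EXAMPLE_SST_L2P",
    "title": "Example Sea Surface Temperature <L2P>",
}

# One template per dialect; $tokens mark where repository content goes.
# The XML fragments are heavily simplified for illustration.
TEMPLATES = {
    "dif": Template(
        "<DIF><Entry_ID>$short_name</Entry_ID>"
        "<Entry_Title>$title</Entry_Title></DIF>"
    ),
    "iso": Template(
        "<gmd:MD_Metadata>"
        "<gmd:fileIdentifier>$short_name</gmd:fileIdentifier>"
        "<gmd:title>$title</gmd:title>"
        "</gmd:MD_Metadata>"
    ),
}


def render(record, dialect):
    """Fill the chosen dialect template with XML-escaped record values."""
    safe = {key: escape(value) for key, value in record.items()}
    return TEMPLATES[dialect].substitute(safe)


dif_xml = render(record, "dif")
iso_xml = render(record, "iso")
```

Supporting another dialect in this sketch means adding one more template; the single stored record does not change.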
NcISO: Translation from netCDF/THREDDS to ISO, Ted Habermann
Ted described the ncISO tool (https://geo-ide.noaa.gov/wiki/index.php?title=NcISO) that he and Dave Neufeld developed at NGDC. This tool translates metadata from THREDDS, NcML, and netCDF into ISO. It provides these services as part of the THREDDS Data Server (http://www.unidata.ucar.edu/software/tds/).
GCMD: DIF to ISO translation and CSW Support, Scott Ritz
Scott described how GCMD (http://gcmd.nasa.gov/) is translating records from DIF to ISO in order to take advantage of the Catalog Services for the Web (http://www.opengeospatial.org/standards/cat) capabilities of Geonetwork (http://geonetwork-opensource.org/). He also outlined future directions for GCMD support of the ISO standards.
ESRI Geoportal XML Indexing, Christine White
Christine described the approach that ESRI Geoportal (http://geoportal.sourceforge.net/) uses to support multiple metadata dialects. She described several XML files that are used to provide this support and the connections between them.
Developing and Sharing Community Metadata Crosswalks, Ted Habermann
Ted concluded the session by pointing out that all of the systems described during the session rely on crosswalks between metadata dialects, and he proposed an XML representation for holding information about these crosswalks. He also proposed that the Documentation Cluster work together to build community consensus on these crosswalks.
CF Standard Names for satellites - Aleksandar Jelenak
- CF = Climate and Forecast
- http://wiki.esipfed.org/index.php/Standard_Names_For_Satellite_Observations
-
Assign variable standard name to reduce confusion
- Do not include much about satellite data
- Works at GSICS
- Instrument = sensor
- Channel = band
- Willing to accept suggestions
- Canonical = physical units
- Definition = text description
- Names on wiki http://wiki.esipfed.org/index.php/Standard_Names_For_Satellite_Observations
-
sent terms to CF and received feedback
- coordinate variables must be numeric – thus datetime_iso8601 cannot be a coordinate variable (values need to be monotonically increasing)
- satellite = platform
- if too specific or too small a community the standard name will not be accepted
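The numeric-coordinate feedback above can be illustrated with a short sketch: ISO 8601 strings are converted to numeric offsets from a reference time, and the result is checked for strictly increasing values. The function names, the sample timestamps, and the choice of epoch are assumptions for the example, not part of the proposal sent to CF.

```python
from datetime import datetime


def iso8601_to_seconds(stamps, epoch="1970-01-01T00:00:00+00:00"):
    """Convert ISO 8601 strings to numeric seconds since an epoch."""
    origin = datetime.fromisoformat(epoch)
    return [(datetime.fromisoformat(s) - origin).total_seconds() for s in stamps]


def is_strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical observation times, ten minutes apart.
stamps = [
    "2012-07-17T00:00:00+00:00",
    "2012-07-17T00:10:00+00:00",
    "2012-07-17T00:20:00+00:00",
]

time_values = iso8601_to_seconds(stamps)            # numeric, usable as a coordinate
time_units = "seconds since 1970-01-01T00:00:00Z"   # CF-style units string
```

In this sketch the original ISO 8601 strings could still be kept in an auxiliary (non-coordinate) variable for readability.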
-
List stops at toa_brightness_...
- because they include methods (which is not allowed) and are too specific to satellite
- last time CF list went off topic and the terms did not make it in
Questions
-
(Tom Whittaker) – like standard names specific to “our” domain
- Issue – though easy to eliminate problem with band/channel
- Also – CF convention is coming back to life – encourage this community
-
around 2000 ECHO metadata differentiated between instrument & sensor
- Do we need to make distinction?
- OMPS example – instrument with several sensors, and several instruments on a platform
- SensorML = sensor & detector, platform carries sensor
-
Is there any consensus
- NO!
- Traditional (2000) – platform, instrument, and sensor are different
- Now – sensor series – ISO metadata – sensor then series * something in 8000s
- Need names at all 3 levels (Ted)
- May not need that many (Debora)
- Why not leverage existing work – that defines data model and catalogues – standards (Donald)
- Use “*” to indicate terms that will not be submitted/discussed for submission to CF
- The discussion has been moved
ACTION
- discussion will continue on wiki – need to separate out terms per discussion
- Alex will identify which terms are not being submitted
CF extensions for satellite data – Ed Armstrong
- Works on the Group for High Resolution SST (GHRSST) Project
- Can download from PO.DAAC or NODC
- Level 1 – band measurements or brightness measurements
-
Identified 5 issues
- Spectral bands
- Data quality
- Level 3 – specific – local/solar time
- Level 2 – specific – orbital
- Level 1 – specific – origin
- Identifying “spectral coordinate” – similar to time
-
New terms
- Center_wavelength
- Center_wavenumber
- CDL example – blue dimension, red coordinate, includes level 2, ni = across distance, nj = along distance
Questions
-
This may work for micro but what about polarization – in microwaves – just wave, is not applicable (Alex)
- Need another dimension or a string field
-
Central_wavenumber – because other wavelength
- This is not enough – need a range, but can add attribute to data
- In ISO have min, max, but no center
-
If band identifier – string or numeric
- Or dimension variable = string
-
2 issues
- What called
- Variable attribute (ex. Min, max, polarization)
-
This may be overkill to include band wavelength width (Alex)
- Want spectral response function
- Central more than enough
ACTION – need to continue to talk about these terms
Remember – if you add anything to the wiki – include [[Category: Documentation Cluster]]
Metadata-Centric Discovery Service – Thomas Huang, Ed Armstrong, Noja Ching (PO.DAAC/JPL)
- There is no one-size-fits-all standard → develop a service architecture
- A file with no name is useless; more information increases its use
- Metadata is data about data (or context)
- Need help finding relevant data – what needs to be packaged and how
-
If known/understand data – use FTP crawling
- Have metadata packing in various formats
- Need aggregator service and deliver metadata in form people understand
-
Data in Oracle – need SQL statements – need to understand metadata and data
- Export to enterprise search engine (thus no down time)
- No index on the Oracle table is needed
- Aggregate and metadata translation – can package data
- Template engine – define template – “what field map to” – very flexible
-
http://podaac.jpl.nasa.gov/
- Can pull remote and reposition – this is a hub for translation and discovery
- Now have subscription service
- Upload data daily – reindex ~ 15min
- Deployed in 2009 – focused on GHRSST – can now map to ISO
- ISO-19115-2 – model to leverage objects – pull from dataset descriptions
- Metadata for metadata and granule
-
Each standard has strength – can I pick and choose/customize metadata
- Can use OpenSearch to produce own metadata
-
Challenges – find missing info – need help from provider
- Ex ISO export -> python script
- Can use open source search
- Can’t use Excel for GCMD
- For FGDC – took a long time before – now only a few seconds
- Working on quality, missing attributes
- OCSI – Oceanographic common search interface
Questions
-
How much larger is the internal representation
- Do not have to change data model – look into different models earlier on
- Might have to next time
-
Wants attributes, units, OPeNDAP, CF, and server to help narrow search (Dave – USGS)
- This is the next phase of the research – ability to narrow search
- Great use case – also need to feed faceted semantic search, ex vol units – this is the sweet spot (Debora)
- Need consensus
GCMD: DIF to ISO translation and CSW Support – Scott Ritz (Wyle Information Systems)
- From metadata manager perspective & quality control
- DIF = Directory Interchange Format
- Convert with XSLT
- Do not hold granules
- Converted once a day
- ISO is indexed in IDN-CSW server
-
Have production, failover, and testing servers
- Testing is only updated once a week
- Email for account/access
- CSW patterns – GEOS/CI – why funded, also CEOS/CWIC, EURO-GEOSS, Climate.gov
- How support ISO
-
Replace DIF with ISO in GCMD – this is not wanted or a viable solution
- Includes language fields
-
DIF-ISO profile – lose some information with XSLT
- GCMD stores all fields
-
“key-value pair” – embed non-DIF or ISO (not yet started)
- Have to define properties
- Working on white paper (will put on wiki)
- In new version – every object in GCMD has uuid and keyword – now track keyword
-
GCMD tool – key-value
- Give field name and value
-
How implement the import into geonetwork
- ISO index in CSW software GEOSS – maybe wrote some java
-
Can you comment about arbitrary fields – “makes me feel uneasy” Ken
- Still have required fields – not unusual
- Can be a lazy step because people don’t check standard
- Problem – undifferential because they don’t know what they are
- ISO – record and record type – implement specific metadata
BREAK
Documentation from NCML to ISO (Ted Habermann)
- NcML is an XML representation of netCDF (part of netCDF-Java)
- ISO is standard with higher capabilities – broad
- Can extract from it to other standards
- Open Provenance and PROV – ISO can point to more detail
- ISO can extend itself – implementation and cite
- → want to improve discovery tools to read ISO
- These are all dialects with “similar ideas”
-
THREDDS adds metadata in non-invasive way – has precedence over file
- Usually have a number of HDF (netCDF) files
- OPeNDAP → described by catalogue
- Extract data in WMS, WFS, OPeNDAP
- Now extract metadata in multiple forms – NcML, ISO, Rubric – using ncISO
-
NcML – 2 data and 3 metadata services for TDS
- It is an attribute with name and value, convention = name
- if named the same, can convert to ISO
-
ACDD – mechanism for evaluating using “Ted rules” to evaluate
- Can make up own xsl
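The name-based translation described above can be sketched as follows. This is an illustration only and not how ncISO itself is implemented; the two-entry crosswalk is a small assumed subset of an ACDD-to-ISO mapping, the NcML document is made up, and the "missing" list is a simple stand-in for a completeness rubric rather than the actual rubric rules.

```python
import xml.etree.ElementTree as ET

NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Illustrative subset of an ACDD-to-ISO crosswalk (attribute name -> ISO path).
ACDD_TO_ISO = {
    "title": "gmd:identificationInfo/gmd:MD_DataIdentification/"
             "gmd:citation/gmd:CI_Citation/gmd:title",
    "summary": "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract",
}

# Made-up NcML with one conventional and one local global attribute.
ncml = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="title" value="Example SST analysis"/>
  <attribute name="my_local_note" value="not a convention name"/>
</netcdf>"""


def translate(ncml_text):
    """Map global attributes to ISO paths by name.

    Returns (mapped, unmapped, missing):
      mapped   - {ISO path: value} for attributes named per the crosswalk
      unmapped - attribute names with no crosswalk entry
      missing  - crosswalk names absent from the file
    """
    root = ET.fromstring(ncml_text)
    mapped, unmapped = {}, []
    for attr in root.findall("nc:attribute", NCML_NS):
        name, value = attr.get("name"), attr.get("value")
        if name in ACDD_TO_ISO:
            mapped[ACDD_TO_ISO[name]] = value
        else:
            unmapped.append(name)
    missing = [n for n in ACDD_TO_ISO if ACDD_TO_ISO[n] not in mapped]
    return mapped, unmapped, missing


mapped, unmapped, missing = translate(ncml)
```

Only attributes that use the convention's names are carried across; anything else stays in the file but is invisible to the translation.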
-
NOAA data management net wiki ncISO and ncISO google group for more information
- Email Ted if want access to google group
- Starting to provide access to granules
Questions
- How long has the rubric been in THREDDS – since the beginning
-
Q (Matt) – does it maintain a log of elements not translated
- No – know what elements are mapped
- Metadata can have 100+ elements – no standard or map
- Can have useful model attributes in model – not always in right spot – so the user has to go back to find it
-
Alex – you can modify “slightly” netCDF attribute
- Can do something like last example
-
(Ted) – if you have ACDD elements but no compliant names then make discovery consistent
- Not lost in translation → they are not compliant with conventions
- Rubric addresses some missing data
Geoportal XML Indexing – Christine White - ESRI
- http://gptogc.esri.com/geoportal/catalog/main/home.page
-
Uses Lucene, can use Solr → both from Apache
- you have concepts (data, title, …) determined by organization
- tell Lucene they are important
- Tell Lucene where to look
- Location varies with metadata – look @ mapping files
- 1) customizable but 2) follow standards
-
Portal → where everything is stored
- Upload XML – if not xml, it will be rejected
-
Calls schema.xml (choose what to support)
- Start at top – “do you match this”
- Until finds match
- Then validate by those rules
- If “no” = FAIL
- Definition.xml – validate
- Saved, approved,
- Indexables.xml – map XML XPath to concepts
- Then users can search results and can retrieve raw metadata
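The match, validate, and index sequence above can be sketched roughly as below. This is not Geoportal code: the schema list, the matching rule (root element name), the required paths, and the concept-to-path tables are simplified assumptions, and the Dublin Core entry ignores namespaces for brevity.

```python
import xml.etree.ElementTree as ET

# Ordered list of supported types, most specific first (hypothetical rules).
# Each entry: (name, identifying root tag, required paths, concept -> path).
SCHEMAS = [
    ("fgdc", "metadata", ["idinfo/citation/citeinfo/title"],
     {"title": "idinfo/citation/citeinfo/title",
      "abstract": "idinfo/descript/abstract"}),
    ("dc", "record", ["title"],
     {"title": "title", "abstract": "description"}),
]


def index_document(xml_text):
    """Return (schema name, {concept: text}) or raise ValueError on rejection."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        raise ValueError("rejected: not well-formed XML")
    for name, root_tag, required, indexables in SCHEMAS:
        if root.tag != root_tag:
            continue  # "do you match this?" -> no, try the next type
        for path in required:  # validate by the matched type's rules
            if root.find(path) is None:
                raise ValueError(f"{name}: missing {path}")
        fields = {}
        for concept, path in indexables.items():  # map paths to concepts
            node = root.find(path)
            if node is not None and node.text:
                fields[concept] = node.text.strip()
        return name, fields
    raise ValueError("rejected: no supported schema matched")


sample = (
    "<metadata><idinfo><citation><citeinfo><title>Sea ice extent</title>"
    "</citeinfo></citation><descript><abstract>Monthly grids</abstract>"
    "</descript></idinfo></metadata>"
)
schema_name, fields = index_document(sample)
```

The resulting concept dictionary is what a search index would receive, while the raw XML stays available for retrieval.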
-
Schema.xml – lists types supported (e.g. DC, FGDC)
- Want more specific standards listed first because it works through the list in order
- Each file is integrated – provide XSD and indexables file
-
Indexables
- Concept – keywords, subject – terms to search on
- Property_mean – defined concepts for Lucene
- Just because you support concept for a search does not mean anyone else is
-
Rest/index/stats/fields
- Show you word index and number of doc with that word
-
Q – Can you configure statistics/add parameters to the statistics
- Not planned to – but likely change items at top of page
-
Q (Mike) – it indexes several standards – can you search specifically by standard?
- must be implemented by the organization running the geoportal – put in name spaces
-
Q – if have synonyms can that be included
- Yes – can connect ontology service
- Just need to hook-up information in wiki
-
Q - This is 2D – what required for 3D spatial
- That is a Lucene question
-
Geoportal determine bounding box
- Need Z and what within intersect
-
Q (Mike) – attributes index – is there a percent of completeness within the database
- It will not find something if empty
- Provides information on gaps in metadata, misspellings, and inconsistencies
-
Q (Dave) – APISO – looks like ISO queryables; when were they put in
- Will connect with right person – but ~ 3 or 4 years ago
Developing and Sharing Community Metadata Crosswalk – Ted Habermann
- Other papers are all same thing… people translating metadata dialects
- Want ESIP consensus of how translate
-
Once you know the fields that map → guarantee 100% translation
- Want to know what fields do not translate
-
ISOland – standards for instrument come up for review this year (every 5 years)
- As revise standard – extensions become standards
- CF is a community standard – community drives ISO
-
Then how to express relations between these dialects
- ESRI has indexables
- NASA – spreadsheet
- Ted – xml
- Rubric = analysis of completeness based on spiral
-
Spiral = group of related fields – use case
- Made of item -> represented in multiple dialects
- Dialects have title (human) and code (computer)
-
Spiral – title, code, description and x-path
- Meaning code works as anchor in html
- Has multiple xsl to convert to different views (table, html, graph)
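A crosswalk document of this kind might look like the sketch below, which also shows how such a file could answer the "what fields do not translate" question. The element and attribute names are assumptions for illustration, not the schema Ted proposed, and the sample deliberately leaves the FGDC and DIF entries for lineage unfilled so the report has something to show.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: one spiral, items with title/code/description,
# and one <dialect> child per dialect giving its code and XPath.
CROSSWALK = """
<spiral title="Identification" code="identification">
  <item title="Title" code="title">
    <description>Name by which the resource is known</description>
    <dialect code="ISO" xpath="/gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title"/>
    <dialect code="FGDC" xpath="/metadata/idinfo/citation/citeinfo/title"/>
    <dialect code="DIF" xpath="/DIF/Entry_Title"/>
  </item>
  <item title="Lineage" code="lineage">
    <description>How the resource was produced</description>
    <dialect code="ISO" xpath="/gmd:MD_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage"/>
  </item>
</spiral>
"""


def load_crosswalk(xml_text):
    """Return {item code: {dialect code: xpath}} for one spiral."""
    spiral = ET.fromstring(xml_text)
    return {
        item.get("code"): {
            d.get("code"): d.get("xpath") for d in item.findall("dialect")
        }
        for item in spiral.findall("item")
    }


def untranslatable(crosswalk, source, target):
    """Item codes present in the source dialect but absent from the target."""
    return [
        code for code, paths in crosswalk.items()
        if source in paths and target not in paths
    ]


crosswalk = load_crosswalk(CROSSWALK)
iso_not_in_fgdc = untranslatable(crosswalk, "ISO", "FGDC")
```

Because the item code is the stable key, the same file could feed table, HTML, or graph views as well as translation tools.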
-
How work as community to agree on this
- some easy (ex. Title)
- but quality, lineage, spatial representation → differences
- some are easy – other will be impossible
-
Q (Christine) – what is here is document of documents to tell what concepts are available in each metadata standard
- Like Rosetta Stone for metadata
-
Q (Dave) – get enterprise architecture and make this in UML
- Foster international standard, e.g. UML
- Most standards do not have UML
-
Q – ECHO out of ECS, want to include → need XML of ECS, but it is included
- ISO short with CRS → point to CRS
-
Do we want implementation and create url
- Now making connect implement of same concept
- Not sure why url is helpful
-
This is an opportunity for ontology
- How map between ontology
- Annotate xml… or smls change to html
- Want xml because you can change between xml or index
- Called alignment in ontology – map standards with ontology
-
What next step
-
DataONE, EML, FGDC, Dryad (Dublin Core)
- Add these in
- Use wiki to come to consensus
-
Need help with CF community
- Tools based on netCDF-3 do not support groups (all tools will break; further development is probably needed)
- Telecons – might pick some topics per meeting
- Next meeting Wednesday, Aug 15 at noon EDT
Alex will update which names to be submitted to CF
Conversation will continue on wiki - link to Documentation Cluster if you add any information