Attribute Convention for Data Discovery: Present and Future
The Documentation Cluster has recently finished revising the Attribute Convention for Data Discovery (see the approved document). Future developments will focus on using groups to provide more discovery details and metadata for other important use cases (access, use, and understanding). An approach to encoding ISO metadata in NcML and HDF will be described for discussion.
ACDD
Ted – First business meeting of the Documentation Cluster – made a strategic plan (on the Commons and the wiki)
John Graybeal & Anna Milan – ACDD Progress and Lessons
· In 1.0: describing a dataset to discovery systems
· In 1.1: enabling dataset discovery and facilitating mapping between dataset metadata and ISO 19115
· Now also incorporates how users use the metadata
· History – started with Unidata UDDC (Unidata Dataset Discovery Conventions) in 2005. In 2010, NOAA created v1.1. In 2013, started a v1.2 that was never released. Now, in 2015, released v1.3
· The best part is what didn’t change – things like creator or publisher. Some changes might have eliminated issues, but we wanted everything to be backward compatible.
· Wanted to fix
o More usable and straightforward to work with
§ Lots of introductory material
§ Guidance
§ Definitions – longer, clearer, less ambiguous, more self-contained, fewer external references
§ Examples for most attributes
§ Simpler presentation – * marks issues or concerns
§ Alphabetical index of terms
o Usefulness
§ Clarify whether each attribute is a CF or NetCDF attribute
§ Add text to clarify how it should be used in ACDD
§ In the past, creator was implied to be a person, but sometimes the creator or publisher is not a person – can now use creator_type or publisher_type.
§ Added attributes – geospatial_bounds_*_crs, platform, instrument, program, creator/publisher_institution
§ Did not go as far as adding ISO-level roles – that would have been hard, and the 2.0 version will likely use the 2.0 approach
o Computability and interoperability
§ The ACDD convention URL will always point to the current version
§ The Conventions attribute makes clear which version you are using
§ Added structured concepts – can specify what you mean by attributes and their values
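The point about the Conventions attribute can be made concrete. ACDD 1.3 asks files that follow it to include the string “ACDD-1.3” in the comma-separated Conventions global attribute. Below is a minimal sketch, assuming the attribute value has already been read into a Python string; acdd_version is a hypothetical helper, not part of any existing checker. Reporting the declared version is the first thing a conformance checker would need before choosing 1.1 or 1.3 rules. Files that declare ACDD in some other form (for example, older versions) are not handled here.

```python
import re
from typing import Optional


def acdd_version(conventions: str) -> Optional[str]:
    """Return the ACDD version declared in a Conventions attribute, or None."""
    # Conventions is a comma-separated list of convention identifiers,
    # e.g. "CF-1.6, ACDD-1.3"; ACDD 1.3 files include the token "ACDD-1.3".
    for token in conventions.split(","):
        match = re.fullmatch(r"ACDD-(\d+\.\d+)", token.strip())
        if match:
            return match.group(1)
    return None


# A checker could use the result to pick which rule set to apply.
print(acdd_version("CF-1.6, ACDD-1.3"))
```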
· Some things lost
o Reference information to other standards, detailed examples, and “Additional Materials” – for the attributes that carry forward, the previous guide still applies
o Q from AJ – can you stress the conformance test? – if you use the 1.3 convention, not sure of anything that breaks the 1.1 conformance checking, but would not …
o Anna – we do not currently have checkers for 1.3; Ed’s tool uses 1.1
· Lessons
o Lots of attention to detail – keeping everyone up to date was very complicated
o Problem – people didn’t have the time
o Used Doodle polls to move forward – to see how many people were interested in an approach and look for commonality
o Had to force a stop
o Advice: have to have a core set of people stay reasonably focused
o Still learning ACDD history – may have reinvented a wheel or two
o Missed a trick or two in getting timely engagement – early work was reversed later
o Some people opted out because of certain decisions
o Significant split regarding “backward compatible” – Open Question: if a version redefines an attribute, how bad is that?
§ Is that terrible, or is it fine because they are different versions?
o As of September, more people started to engage
o Still some definition issues – Open Question: if a definition is circular, how bad is that? – back to the issue of changing the meaning of an attribute
§ Not sure it was terrible that we ended up leaving circular definitions
· In the past, things could change in real time… someone changed a definition online and then it was changed – don’t know about reversions
o V1.2 changes were “off-line” and didn’t affect 1.1 – really just the attribute definitions
o In 2014, built a full proposed document… then more people got involved
o Then reversion occurred … so it was called 1.3 – maybe because it was a “final” document
· Why use new ACDD
o Easier to read and learn
o Persistent URI – better interoperability
o Richer features
o Up-to-date recommendations – highest priorities
o Use all “highly recommended”, most “recommended”, and “suggested” if helpful
· Lots of people to thank, including the voters
· Ed – at the end of December. Product version in 1.3 – don’t see it on the Commons, but it is on the wiki
· Kelly – more momentum because we used the Documentation Cluster telecons and people were around
Ted
· 20150108 – this is a date
o But what happened? There are 2 ways to tell
o 1) give specific names = hard typing
o 2) give things types = soft typing
§ Hard – an attribute, date_created, with value “20150108”
§ Soft – date = “” and type = “creation” (about 10-12 lines in xml – with date and type of date)
§ Hard typing is hard – everyone has to agree on everything
§ Soft – still requires agreeing on names, but gives more flexibility – can add meaning without changing the standard
o Hard is attributes – soft is groups
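To make the hard/soft distinction concrete, here is a minimal Python sketch that uses plain dictionaries rather than real netCDF attributes or ISO XML. The data structure, the helper find_date, and every sample date other than 20150108 are illustrative assumptions; the type values (creation, revision, publication) come from the ISO date type code list.

```python
from typing import Dict, List, Optional

# Hard typing: the meaning is carried by the attribute name itself.
hard: Dict[str, str] = {"date_created": "20150108"}

# Soft typing: one generic "date" plus a "type" drawn from a code list.
soft: List[Dict[str, str]] = [
    {"date": "20150108", "type": "creation"},
    {"date": "20150110", "type": "revision"},
]


def find_date(records: List[Dict[str, str]], date_type: str) -> Optional[str]:
    """Return the first date whose type matches, or None if absent."""
    for record in records:
        if record["type"] == date_type:
            return record["date"]
    return None


# A new kind of date needs no new attribute name, only an agreed code value.
soft.append({"date": "20150201", "type": "publication"})

print(hard["date_created"], find_date(soft, "creation"))
```

The trade-off in the notes shows up directly: the soft form still depends on agreeing on the code-list values, but adding a new kind of date does not require a new attribute name in the standard.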
· CF 2.0 will not be backward compatible with CF 1.0 – it will include groups
· Grouping is more important in metadata than in data
· Standard is NcML for ACDD – no one is trying to change an attribute to a property
o What we are discussing is a convention – i.e., how we name things
· Soft typing in ISO – a much richer collection of standard names – elements have names
o The flexibility of naming in ISO is implemented using code lists – e.g., a date type code with a list of possible values
o Another example is the role code for the roles people in an organization can play
· CF users and implementations use file/directory structure to group things. Does grouping happen in the file system or in the file?
· HDF is an object organization system within a file – no need for a boundary between file and directory – like the file system everyone uses on their OS
· When thinking about ACDD and how it might work in a world with groups
o 1st attempt is on the ESIP wiki
o Includes what an “onlineResource” group looks like in NcML … date, citations … including problems with people and keywords
o These names are according to Ted … but it is like ACDD
· Alternative to Ted’s original
o ISO has roles and types – roles start with lowercase letters and have namespaces
o Then there are types of things – names start with two capital letters (bi-alphas, AA)
o Alternates between role and type – each hierarchy level is either a role level or a type level
o People don’t like that it makes the XML longer
o It is very powerful and allows you to reuse content
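A minimal sketch of the alternating role/type rule, assuming ISO 19115-3 style element names with a namespace prefix (e.g., mdb:dateInfo for a role and cit:CI_Date for a type) and a path that starts at a type. The regular expressions are a simplification of ISO naming, and alternates is a hypothetical helper, not part of any ISO tool.

```python
import re
from typing import List

# Simplified ISO naming: optional lowercase namespace prefix, then either
# a role (lowercase first letter) or a type (two capitals + underscore).
ROLE = re.compile(r"(?:[a-z]\w*:)?[a-z]\w*")
TYPE = re.compile(r"(?:[a-z]\w*:)?[A-Z]{2}_\w+")


def alternates(path: List[str]) -> bool:
    """True if the path goes type, role, type, role, ... from the root."""
    for depth, name in enumerate(path):
        pattern = TYPE if depth % 2 == 0 else ROLE
        if pattern.fullmatch(name) is None:
            return False
    return True


# Path to the type code of a metadata date, in 19115-3 style names.
path = ["mdb:MD_Metadata", "mdb:dateInfo", "cit:CI_Date",
        "cit:dateType", "cit:CI_DateTypeCode"]
print(alternates(path))
```

The same regularity is what makes the longer XML reusable: any type-level element can be lifted out and placed under a different role.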
· NcML – groups and attributes, each of which has XML attributes
· GOAL – take ISO roles and types and create rules for mapping them to NcML groups, attributes, and XML attributes
· ISO objects become groups
o ISO objects become nc:groups with @name = role
o <nc:group name="mdb:dateInfo">
o Then need role and type – two standard attributes
o One thing XML can do that a file directory cannot is have two elements with the same name
o For repeating roles – add “_” and something unique (the generate-id() function in XSL)
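The mapping rules above can be sketched with Python’s standard xml.etree.ElementTree. This is an assumption-laden illustration, not Ted’s XSL. Each ISO object playing a role becomes an NcML group named after the role, and repeated roles get “_” plus a counter in place of XSL generate-id(). The two standard attributes are recorded as NcML attributes named role and type; those attribute names, the helper add_role_groups, and the sample dates are hypothetical. The namespace URI is the NcML 2.2 namespace. The sketch does not check whether other NcML or netCDF tools accept such group names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# NcML 2.2 namespace; registered so output uses the nc: prefix.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("nc", NCML_NS)


def nc(tag: str) -> str:
    """Qualify a local tag name with the NcML namespace."""
    return f"{{{NCML_NS}}}{tag}"


def add_role_groups(parent: ET.Element, role: str,
                    objects: List[Tuple[str, Dict[str, str]]]) -> None:
    """Add one nc:group per ISO object that plays `role` under `parent`.

    Each object is a (type_name, {attribute_name: value}) pair.
    """
    for n, (type_name, attrs) in enumerate(objects, start=1):
        # Repeated roles get "_<n>" so sibling group names stay unique
        # (a stand-in for XSL generate-id()).
        name = role if len(objects) == 1 else f"{role}_{n}"
        group = ET.SubElement(parent, nc("group"), name=name)
        # Hypothetical standard attributes keeping the original role and ISO type.
        ET.SubElement(group, nc("attribute"), name="role", value=role)
        ET.SubElement(group, nc("attribute"), name="type", value=type_name)
        for key, value in attrs.items():
            ET.SubElement(group, nc("attribute"), name=key, value=value)


# Illustrative content: two dates playing the mdb:dateInfo role.
root = ET.Element(nc("netcdf"))
add_role_groups(root, "mdb:dateInfo", [
    ("cit:CI_Date", {"date": "20150108", "dateType": "creation"}),
    ("cit:CI_Date", {"date": "20150110", "dateType": "revision"}),
])
print(ET.tostring(root, encoding="unicode"))
```

Keeping the unsuffixed role in its own attribute means the suffix is only a uniqueness device; a reverse transform could recover the role without parsing group names.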
· Q: does this work only in the context of ISO? – only in ISO and maybe other things – rules (19139) – from UML or XML
· Codelists have two standard attributes in ISO – so they need to be a group – group name = role
· There may be some problems in GML
· This is Ted’s proposal as to where we might go
· Have an ISO representation of ACDD – can take an ACDD example, transform it into ISO with the NcISO transform, and then transform that into ISO-compliant NcML
· HDF Product Designer – can import NcML, create the structure Ted presented in its metadata, and create an HDF5 file
· What happens when someone wants to add a principal investigator and it doesn’t exist in ACDD? Or a description and title for a URL, or other keywords?
o If we agree on the encoding, it is no longer an issue
o Will put XSL on ESIP wiki… put in strategic plan for next year
· Q Alek – squishing roles and types from ISO XML into groups: hierarchy levels do not translate 1:1 to groups
o Create groups from objects
o Will it survive ISO objects? Yes, GML too – there are only 1 or 2 patterns you need to match
· Q Ed – the problem is using groups in the new data model – how do you describe the coordinate system?
o ISO has a spatial representation object for coordinate systems – use the series element in ISO
· Q Anna – have you looked at this in terms of NetCDF variables?
o Described as content info in ISO
· Q John – is there a process or relationship between the NcML-ISO specification and ISO itself?
o Ted thinks that it is true
o Able to go back and forth between the 2 versions
o Q – Relationship to ACDD? ACDD is a vocabulary and prioritization process – defining terms and prioritizing them. The terms it chooses to define almost entirely overlap with ISO
§ Need to normalize to ISO terms, prioritize
§ Then the challenge – ISO term definitions are recursive and uninformative – this would be the ACDD added value
§ Ted – start thinking of ACDD as a profile of 19115-1 that brings a set of community conventions to 19115-1 – hence avoiding having to agree about code lists
§ The ACDD profile of ISO already exists – maybe 1.2
§ John – ISO terms are a poor encapsulation of ACDD terms and concepts (Ted disagrees)
· Q AJ – with OCDD, will it just be discovery metadata, or will it expand?
o ACDD is a profile of ISO, and then you can expand that profile any way you want, e.g., content and NetCDF variables
o The wiki page approach is so limited
· OCDD is ACDD 2.0 – ACDD profile of ISO 19115-1 makes sense
· The large issue – a completed ACDD NcML file – a scientist says it is simple – then show them an ISO record… they don’t use it… need an interface
· Q John – with this tool chain, the vocabularies in ISO that cover creator and people overlap with the ACDD space – the first thing you do is recognize that as the term space to work in. Hopefully just grab the appropriate terms and concepts. The decision is the last 20% that isn’t addressed by ISO concepts. ACDD is just which concepts are important
o Generic mapping of ISO to ACDD – conceptual model/challenge
· Strategic plan for 2015 – add to the agenda for next Thursday at 2 pm EST
· BEDI metadata topic (Ted) – under documentation connections
· Ed – need to engage the NASA ECO folks – they do not seem to work with us.
· Hack-a-thon – there is room for improvement – metadata office hours…
Telecon agenda:
1) 2015 strategic plan
2) BEDI metadata (Ted)