Pathways for Discovering Earth Science Data and Services using NASA's Global Change Master Directory's (GCMD)

Abstract/Agenda: 

The Global Change Master Directory (GCMD) provides discovery/collection-level metadata of Earth science resources and offers scientists a comprehensive and high quality database to reduce overall expenditures for scientific data collection and dissemination. The NextGen website offers advanced search refinement capabilities. Users can navigate the GCMD keyword taxonomy through a "tree" structure and perform free text, spatial, and temporal searches within the same query. In addition to the "human" interfaces for search and discovery, the GCMD also offers standardized "machine" interfaces including a Metadata Web Service (MWS), Keyword Management Service (KMS), OpenSearch, and RSS.

 

The Metadata Web Service (MWS) is a RESTful service for retrieving and publishing Earth science resources, including data set descriptions, service descriptions, and ancillary descriptions. The Keyword Management Service (KMS) is a RESTful web service for maintaining keywords (science keywords, platforms, instruments, data centers, locations, projects, services, resolution, etc.) in the GCMD system. The KMS allows access to the keywords as SKOS Concepts (RDF) or as XML objects. OpenSearch is a web service for publishing results in an OpenSearch standard response suitable for syndication and aggregation. This feature allows aggregators to combine searches from multiple search engines. The RSS/ATOM feed allows users to subscribe to new and updated metadata collections in the GCMD. Users can also subscribe to a feed based on specific search criteria.
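
The feed subscription described above can also be consumed by a script. The sketch below is a minimal illustration, assuming the feed follows the standard Atom format; the sample feed, its identifiers, and the function names are hypothetical and are not taken from an actual GCMD response.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example metadata feed</title>
  <updated>2014-01-08T12:00:00Z</updated>
  <entry>
    <id>urn:example:record-1</id>
    <title>Example data set A</title>
    <updated>2014-01-07T09:30:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:record-2</id>
    <title>Example data set B</title>
    <updated>2013-11-15T00:00:00Z</updated>
  </entry>
</feed>
"""


def parse_timestamp(text):
    """Parse an Atom timestamp, handling the common trailing 'Z'."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def entries_updated_since(feed_xml, cutoff):
    """Return (id, title, updated) for entries updated at or after cutoff."""
    root = ET.fromstring(feed_xml)
    recent = []
    for entry in root.findall(ATOM + "entry"):
        updated = parse_timestamp(entry.findtext(ATOM + "updated"))
        if updated >= cutoff:
            recent.append(
                (entry.findtext(ATOM + "id"), entry.findtext(ATOM + "title"), updated)
            )
    return recent


if __name__ == "__main__":
    cutoff = datetime(2014, 1, 1, tzinfo=timezone.utc)
    for entry_id, title, updated in entries_updated_since(SAMPLE_FEED, cutoff):
        print(entry_id, title, updated.isoformat())
```

In practice the feed text would be fetched from the subscribed feed URL instead of a literal string, and the cutoff would be the time of the previous check.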

 

In this workshop, GCMD staff will provide an overview of the NextGen website search features and demonstrate how to use the various web services GCMD offers for enhancing discovery of Earth science data and services.

 

Presentation Outline

  1. Introduction to the GCMD
    • Mission
    • DIF, SERF, CD Metadata
    • Collaborations
    • docBUILDER
  2. Pathway 1: NextGen Website Search (Human Interfaces)
    • NextGen Keyword Search and Refinement
    • Free-Text Search
    • Portal Search
  3. Pathway 2: Web Service APIs (Machine Interfaces)
    • Keyword Management Service
    • Metadata Web Service
    • Open Search
    • RSS
  4. Demonstration
  5. Discussion
Notes: 

Ted Habermann – introduction

·         The Documentation cluster noticed that several sessions address documentation cluster issues, so there will be a thread in the New Hampshire room about documentation

·         Strategic plan session is after lunch on Friday

 

Tyler Stevens

·         Website – keyword search (controlled keywords), free-text search (open keywords), portals

·         Web API – Keyword Management Services (KMS), Metadata Web Service (MWS), Open Search, RSS/ATOM feed

o   Most of these come from meeting with different groups and their recommendations

o   If enough people ask for it (e.g., a service), the team gets permission to build it

·         Last year Thomas and Adam talked about the web services – they were new

·         Metadata formats

o   When data is imported into the system it is stored as DIF, but it can be exported in any format

·         Science and Service Keywords

o   Recognize the importance of key words and free text

o   5 level keyword hierarchy

o   http://gcmd.nasa.gov/learn/keywords.html

·         docBuilder

o   online metadata editing tool

o   recommend as many fields as appropriate

o   http://gcmd.nasa.gov/collaborate/docbuilder

o   Can see what is completed based on check marks – very useful

o   Each record is validated before it is incorporated – go back to provider to fix information – helpful with maintaining single records

Scott Ritz

·         NextGen user interface

o   Looked into the best way to provide search interface to the users

o   Website was released in Spring 2013

o   Choose – dataset, services/tool/ancillary description searches

o   Key word search interface section incorporates legacy search

·         Search refinement

o   Click on record number to get to this page

o   Dynamically updated search results based on left side menu bar

o   Now more dynamic/easier to use with this new interface

o   Can also be done for spatial and temporal search

·         Search Ancillary Descriptions

o   New search options

o   Descriptions of platforms, project, and service providers

·         Portals - Virtual snapshots of the services in the directory (ex. ESIP, EOS)

·         Q – if we have a suggestion of how to get something changed, do we submit to interoperability forum

o   Submit to the user forum

·         Q – is there a forum for non-IDN discussion

o   Thomas will show one

·         Q (Ken) – customizable portals – can you give more details

o   Takes 5 min to modify config file for basic search criteria

o   Q – are they JavaScript or JSON handling responses

§  Hacked how server uses JSPs – “portalization” function

o   Q - What is the most complicated?

§  Showed more data than GCMD portal does – created to show mpegs

o   Q – did that take a few months

§  Communication takes more time

·         Q – is there any use for an open-source reference client that someone could host on their own website, as a quick way to set something up and build their own portal

o   Portal development has to coincide with software development

o   Hosting outside of GCMD software would require more discussion, but is possible

·         Q (Ken) – are there different response formats that a user can request

o   Yes – Thomas will show – e.g., CSV, which returns the identifiers – most CSV users want to compare against a different database
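
The CSV use case mentioned in the answer above (comparing identifiers against another database) can be sketched as follows. This is only an illustration under stated assumptions: the column name `Entry_ID` and the sample rows are hypothetical, and the actual layout of a CSV response was not described in the session.

```python
import csv
import io


def read_ids(csv_text, column):
    """Collect the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def compare_ids(directory_csv, local_csv, column="Entry_ID"):
    """Compare identifiers from a directory export with a local list.

    The default column name is an assumption made for this sketch.
    """
    remote = read_ids(directory_csv, column)
    local = read_ids(local_csv, column)
    return {
        "only_in_directory": sorted(remote - local),
        "only_local": sorted(local - remote),
        "in_both": sorted(remote & local),
    }


if __name__ == "__main__":
    directory_export = "Entry_ID,Title\nA1,One\nB2,Two\n"  # hypothetical rows
    local_export = "Entry_ID\nB2\nC3\n"                    # hypothetical rows
    print(compare_ids(directory_export, local_export))
```

Set operations keep the comparison independent of row order, which matters when the two exports come from different systems.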

Thomas Cherry

·         MWS overview

o   This is a quick and dirty interface

o   Could build a client better than docBuilder using MWS interface

·         Start with connect page http://gcmd.gsfc.nasa.gov/Connect/

o   Need password – but can have short use account (ex. ESIP Winter)

·         Still using secret parameters… but some will be out next week – able to get pdf to describe each interface (http://gcmd.gsfc.nasa.gov/Connect/docs/mws/MetadataWebServiceAPI.pdf)

·         New features since last talk

o   Ticketing system – you can query where your ticket is in the process

o   POST Validator – demonstration

§  Called curl to get zzz415

§  Then feeds back to get validator

o   On website have cURL, Java, and Objective-C

·         Keyword Management Service (KMS)

o   Needed something in-house for managing science keywords

o   More than just key words – platforms, instruments, researchers, projects, data center names, chrono units, ….

o   Top object is scheme

§  Inside scheme is science keywords

§  Then branches out to what is being described

o   Developed like a typical SKOS (Simple Knowledge Organization System) vocabulary, with only one broader relationship per concept…

o   3 choices – scheme, concept, or searches

o   Can now just put an extension on the end of the URL and get it in specified format

o   MD5 and last modified date – are upcoming
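
Because each concept has only one broader relationship, the full keyword path can be recovered by following `skos:broader` links upward. The sketch below is a minimal illustration, assuming a plain SKOS/RDF document; the `urn:example:` URIs and the function names are hypothetical, and the snippet is not copied from a KMS response.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical SKOS document; the URIs are placeholders.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:skos="{SKOS_NS}">
  <skos:Concept rdf:about="urn:example:earth-science">
    <skos:prefLabel>EARTH SCIENCE</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="urn:example:atmosphere">
    <skos:prefLabel>ATMOSPHERE</skos:prefLabel>
    <skos:broader rdf:resource="urn:example:earth-science"/>
  </skos:Concept>
  <skos:Concept rdf:about="urn:example:aerosols">
    <skos:prefLabel>AEROSOLS</skos:prefLabel>
    <skos:broader rdf:resource="urn:example:atmosphere"/>
  </skos:Concept>
</rdf:RDF>
"""


def load_concepts(rdf_text):
    """Map each concept URI to (preferred label, broader URI or None)."""
    root = ET.fromstring(rdf_text)
    concepts = {}
    for node in root.findall(f"{{{SKOS_NS}}}Concept"):
        uri = node.get(f"{{{RDF_NS}}}about")
        label = node.findtext(f"{{{SKOS_NS}}}prefLabel")
        broader = node.find(f"{{{SKOS_NS}}}broader")
        parent = broader.get(f"{{{RDF_NS}}}resource") if broader is not None else None
        concepts[uri] = (label, parent)
    return concepts


def keyword_path(concepts, uri):
    """Follow the single broader link upward; return labels root-first."""
    labels = []
    seen = set()
    while uri is not None:
        if uri in seen:
            raise ValueError("cycle in broader links")
        seen.add(uri)
        label, uri = concepts[uri]
        labels.append(label)
    return list(reversed(labels))


if __name__ == "__main__":
    concepts = load_concepts(SAMPLE_RDF)
    print(" > ".join(keyword_path(concepts, "urn:example:aerosols")))
```

The single-parent rule is what makes the path unambiguous; with multiple broader links per concept the walk would have to return a set of paths instead.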

·         Static interface

o   Most people don’t want to log in – just want to search by keyword

o   Took a kitchen-sink approach – most formats (CSV, XML, RDF, …) – plus tar.gz archives with MD5 checksums, and one historical copy kept before a file is overwritten

o   Similar to the RESTful interface, but static

o   Compressed files are ready to go – PRO

o   CON – many of the files reference dynamic KMS – looking into how to have more self-sufficient exports (more human readable)

o   CON – no search functionality
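
The MD5 files published alongside the compressed exports allow a download to be checked before use. The sketch below shows one way to do that with Python's standard library; it assumes the checksum is distributed as a hex string (optionally followed by a file name), which was not specified in the session, and the function names are hypothetical.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=65536):
    """Compute the MD5 hex digest of a binary stream in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(stream, published):
    """Compare a stream's MD5 with a published checksum line.

    Accepts either a bare hex digest or the common 'digest  filename' form.
    """
    expected = published.split()[0].lower()
    return md5_hex(stream) == expected


if __name__ == "__main__":
    # In-memory stand-in for a downloaded archive; a real check would use
    # open("<downloaded file>", "rb") instead.
    payload = io.BytesIO(b"hello")
    print(matches_checksum(payload, "5d41402abc4b2a76b9719d911017c592  example.tar.gz"))
```

Reading in chunks keeps memory use flat for large archives. MD5 is adequate for detecting a corrupted or stale download, though it is not a defense against deliberate tampering.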

·         Keyword Search (web)

o   RDF/ATOM feeds are all over

o   Includes RSS feed that will update – also has old global RSS feeds

§  Global RSS feeds are being updated in ATOM and RDF

o   OpenSearch – how Firefox was able to add location as a search site

o   Q – do you have ATOM response

§  Yes… example
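
An OpenSearch description document advertises a URL template that clients such as browsers fill in with the user's query. The sketch below illustrates that substitution step using the `{name}` / `{name?}` placeholder syntax from the OpenSearch 1.1 specification; the template URL and its query parameter names are hypothetical and do not represent the GCMD endpoint.

```python
import re
from urllib.parse import quote

# Matches {name} (required) and {name?} (optional) placeholders.
_PARAM = re.compile(r"\{([^{}?]+)(\??)\}")


def fill_template(template, values):
    """Substitute OpenSearch template placeholders with URL-encoded values.

    Optional placeholders without a value become empty strings; a missing
    required placeholder raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    # Hypothetical template combining free-text, spatial, and temporal terms.
    template = (
        "https://example.org/opensearch"
        "?q={searchTerms}&bbox={geo:box?}&start={time:start?}&count={count?}"
    )
    print(fill_template(template, {"searchTerms": "sea ice", "geo:box": "-180,60,180,90"}))
```

A client would normally read the template from the description document's `Url` element instead of hard-coding it, so that changes on the server side do not break the query.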

·         Q – is there a way to support granule-level OpenSearch queries?

o   Can do this → would be interested in this

o   Want to chain open search query

§  Working on this in QUICK

·         gcmd.nasa.gov/r/report – goes directly into bug tracking ticketing system

·         Q (Doug) – controlled vocabulary vs. text search: in ECHO they have done a query on this. Have you gathered metrics along these lines?

o   But results are skewed because they promote type word

o   ACTION – send Doug report next month

o   Also have ancillary key words – feels like a free text

o   A lot of people navigate website using science key word – drill down through the tree

o   Words in keywords are added by user community – get new key words requests

·         Concerned more about why the key words are not being used

·         Q – curious about the location keywords: searching GCMD, I haven’t seen definitions for them, and in some datasets the researcher adds their own keywords (e.g., a drainage basin name)

o   15-18% use key words for search refinement

o   Possibly seeing location key words that are not in the taxonomy because they are in detailed location which are not in the formal key word list

o   Q – there are formal definitions for key words

§  Not for location key words

·         Q (Jeff) – searched on data provider for NOAA – there is stuff in there from NOAA that was registered a long time ago and isn’t being updated. Who should he talk to about cleaning up the NOAA entries?

o   Going to flush out all NOAA records and reload

o   Just wiped all NGDC records and are re-adding them

o   From a software perspective – need to add more code to actively notify people (metadata author); want a dashboard to see records that have future revision dates → this is in process

o   Jeff D. – would like NOAA data provider create metadata and then have a conduit that “you” would talk to

§  Do that… it is more automated 

Actions: 

Send Doug a report next month on metrics about keyword vs. text search

Attachments/Presentations: 
Citation:
Stevens, T.; Ritz, S.; Cherry, T.; Pathways for Discovering Earth Science Data and Services using NASA's Global Change Master Directory's (GCMD); Winter Meeting 2014. ESIP Commons, November 2013