NASA EOSDIS Evolving Technologies Discussion

Abstract/Agenda: 
Earth Observing System Data and Information System (EOSDIS) continues its work on a number of different projects, systems, and initiatives. This session will provide updates and facilitate discussion on a number of these activities: 
 
 

Earthdata Search and Metadata Quality (Dan Pilone)

 
A host of new services are revolutionizing discovery, visualization, and access of NASA's Earth science data holdings. At the same time, web browsers have become far more capable and open source libraries have grown to take advantage of these capabilities. Earthdata Search is a web application which combines modern browser features with the latest Earthdata services from NASA to produce a cutting-edge search and access client with features far beyond what was possible only a couple of years ago. Earthdata Search provides data discovery through the Common Metadata Repository (CMR), which provides a high-speed REST API for searching across hundreds of millions of data granules using temporal, spatial, and other constraints. It produces data visualizations by combining CMR data with Global Imagery Browse Services (GIBS) image tiles. Earthdata Search renders its visualizations using custom plugins built on Leaflet.js, a lightweight mobile-friendly open source web mapping library. The client further features an SVG-based interactive timeline view of search results. For data access, Earthdata Search provides easy temporal and spatial subsetting as well as format conversion by making use of OPeNDAP. While the client hopes to drive adoption of these services and standards, it provides fallback behavior for working with data that has not yet adopted them. This allows the client to remain on the cutting edge of service offerings while still boasting a catalog containing thousands of data collections. In this session, we will walk through Earthdata Search and explain how it incorporates these new technologies and service offerings.
 

Common Metadata Repository Sub-second Search (Jason Gilman)

 
The Common Metadata Repository (CMR) is the next generation Earth Science Metadata catalog for NASA’s Earth Observing data. It joins together the holdings from the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation.
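
As a rough illustration of the kind of request such a search API accepts, the sketch below builds a granule search URL with a temporal range and a bounding box. The endpoint, the parameter names (`short_name`, `temporal`, `bounding_box`, `page_size`), and the example product `MOD11A1` are not given in this abstract; they follow the public CMR search documentation as best understood here and should be checked against it before use. The code only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed public CMR search endpoint; verify against the current API docs.
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(short_name, start, end, bbox, page_size=10):
    """Return a granule search URL with temporal and spatial constraints.

    bbox is (west, south, east, north) in decimal degrees.
    """
    params = [
        ("short_name", short_name),
        ("temporal", f"{start},{end}"),
        ("bounding_box", ",".join(str(v) for v in bbox)),
        ("page_size", str(page_size)),
    ]
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


# Example product and area are illustrative only.
url = build_granule_query(
    "MOD11A1",
    "2015-01-01T00:00:00Z",
    "2015-01-31T23:59:59Z",
    (-110, 30, -100, 40),
)
print(url)
```

Keeping the constraints in a single query string like this is what lets a client re-issue the search cheaply whenever the user changes the time range or map area.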
 
The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals. This includes:
 
* Architectural features like immutability and backpressure.
* Data management techniques such as caching and parallel loading that give big performance gains.
* Open Source and COTS tools like the Elasticsearch search engine.
* Adoption of Clojure, a functional programming language for the Java Virtual Machine.
* Development of a custom spatial search plugin for Elasticsearch and why it was necessary.
* Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.
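
One way to picture how immutability and caching fit together: if every metadata update is stored as a new revision and old revisions are never modified, a cached result keyed by revision can be kept indefinitely. The sketch below is a minimal in-memory illustration of that idea, not the CMR implementation; the class `RevisionStore`, its methods, and the concept ID `C1` are all hypothetical.

```python
class RevisionStore:
    """Hypothetical store where each (concept_id, revision) is written once."""

    def __init__(self):
        self._records = {}   # (concept_id, revision) -> metadata, never overwritten
        self._latest = {}    # concept_id -> highest revision number so far
        self._cache = {}     # (concept_id, revision) -> rendered result
        self.render_calls = 0

    def save(self, concept_id, metadata):
        """An edit creates a new revision instead of changing the old one."""
        revision = self._latest.get(concept_id, 0) + 1
        self._records[(concept_id, revision)] = dict(metadata)
        self._latest[concept_id] = revision
        return revision

    def render(self, concept_id, revision):
        """Cache without invalidation: the record behind this key cannot change."""
        key = (concept_id, revision)
        if key not in self._cache:
            self.render_calls += 1
            record = self._records[key]
            self._cache[key] = ", ".join(
                f"{k}={v}" for k, v in sorted(record.items())
            )
        return self._cache[key]


store = RevisionStore()
r1 = store.save("C1", {"title": "Surface temperature"})
r2 = store.save("C1", {"title": "Surface temperature v2"})
first = store.render("C1", r1)
again = store.render("C1", r1)   # served from the cache, no second render
print(first, again == first, store.render_calls)
```

Because the key includes the revision number, asking for the newest data means asking for a new key, so nothing already cached ever has to be cleared.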
 
 

Unified Metadata Model Status and Discussion (Katie Baynes)

 
EOSDIS Common Metadata Repository (CMR) will bring a fast, centralized catalog to NASA's Earth Science metadata. In addition to cleaning up and consolidating the traditional collection and granule metadata holdings, the CMR will be developing unified metadata profiles for several additional holdings, enabling collaboration and expanding services across all of NASA's DAACs. This talk will focus on the status of current metadata models for collections and granules and the process for working on new profiles.
 

Earthdata Standards Office and the Lifecycle Process (Yonsook Enloe)

 
Learn about the Earthdata Standards Office and the managed evolution of the CMR and the UMM profiles in this presentation.
 
 

Earthdata Code Collaborative Status and Discussion (Brett McLaughlin)

 
Find out about current status and updates to the Earthdata Code Collaborative (ECC).
 
 

Next Generation Application Platform Introduction and Overview (Justin Molineaux)

 
Scientific applications often present difficult web-hosting needs. Their compute- and data-intensive nature, as well as an increasing need for high-availability and distribution, combine to create a challenging set of hosting requirements.
In the past year, advancements in container-based virtualization and related tooling have offered new lightweight and flexible ways to accommodate diverse applications with all the isolation and portability benefits of traditional virtualization. This presentation will introduce and demonstrate an open-source, single-interface, Platform-as-a-Service (PaaS) that empowers application developers to seamlessly leverage geographically distributed, public and private compute resources to achieve highly-available, performant hosting for scientific applications.

 

Notes: 

NASA EOSDIS

 

Earthdata Search client – Dan Pilone

·         Typical user searches for surface temperature – gets 982 datasets … gets one and finds out there is a cloud

·         Try again – better choices

·         Now visualized search – allows you to visually see what you are selecting – now can spatially subset

o   Does this work – Yes – as long as your browse imagery is in GIBS, you know that OPeNDAP is available for subsetting, and you have high-quality metadata

·         Metadata – fill in as much as possible, can augment metadata, but give us as much as possible

o   Provide visual metadata – can provide urls to your images

o   GIBS can provide a high performance tile service

o   Services – difference between 600 MB in archive format or 10s of KB in csv that you can plot

o   Accurately describe spatial areas – found some issues

·         Create button – report metadata problem

·         Q – Alan Doyle – When I get a search result, do I get any metadata, particularly for a file format like CSV?  Because I then want to publish something – a pre-populated bundle – yes –

o   GeoJSON for specifying boundaries – do support it – also shapefile, zipped shapefile, KML, KMZ…

·         Q – Walt –  there is a limit as to how much metadata can be provided – can you decrease the amount
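
The "Services" point above (a 600 MB archive file versus tens of KB of plottable text) can be illustrated with an OPeNDAP subset request. The sketch below builds a DAP2-style URL that asks for an index range of one variable as ASCII text. The server, file name, variable name, and index ranges are hypothetical; only the general `[start:stride:stop]` hyperslab form is taken from the DAP2 constraint syntax, and a given server may differ in the response suffixes it supports.

```python
def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request for a hyperslab of one variable.

    slices is a list of (start, stride, stop) index triples, one per dimension.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


url = opendap_subset_url(
    "https://opendap.example.gov/data/granule.nc",  # hypothetical server and file
    "surface_temperature",                           # hypothetical variable name
    [(0, 1, 0), (100, 1, 149), (200, 1, 249)],       # time, lat, lon index ranges
)
print(url)
```

Requesting a 50 by 50 window for one time step instead of the full granule is what shrinks the download from an archive file to a small table.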

 

Earthdata Code Collaborative (ECC)

·         Internal development effort to standardize tools, etc. – to support testing and code reviews

·         Available from website – need NASA log-in – have to get permission from someone at NASA (just have to show how your work is related)

·         Get wiki space…like GitHub…  - full suite of web interactions and full tracking – have complete traceability – get all of this in the ECC

·         Aimed at Agile Process

·         That all works – could do that in an afternoon

·         Deployment –

o   Now can you put it somewhere for me/ where can I put my code

o   Continuous integration – it is no longer a person that tests the code – a script will test the code.  This should be run every time code is checked into the repository.  Build agent will run test and run code.

·         (Semi) continuous deployment

o   With some confidence because of testing – can deploy

o   Have 3 different testing locations – with increased user access – controlled through Bamboo

·         If sign up for ECC – source code repository, some level of automated tests, accessible deployment target hosts

·         Recommendations – use everything that the ECC has to offer – use test harness

·         http://ecc.earthdata.nasa.gov
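
To make the continuous integration point above concrete, here is a minimal sketch of the kind of test script a build agent could run on every check-in. The function under test (`normalize_longitude`) and the test names are hypothetical and not part of the ECC; the only assumption about the build agent is that it acts on whether the run succeeded.

```python
import unittest


def normalize_longitude(lon):
    """Hypothetical function under test: wrap a longitude into [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


class NormalizeLongitudeTest(unittest.TestCase):
    def test_value_in_range_is_unchanged(self):
        self.assertEqual(normalize_longitude(45.0), 45.0)

    def test_value_past_dateline_wraps(self):
        self.assertEqual(normalize_longitude(190.0), -170.0)


def run_suite():
    """Run the tests and report success as a boolean."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeLongitudeTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


ok = run_suite()
# A real CI script would typically turn this into the process exit status
# (for example sys.exit(0 if ok else 1)) so the build agent can pass or fail.
print("PASS" if ok else "FAIL")
```

Because the check is a script rather than a person, it can run identically on every commit and gate the later deployment steps.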

 

Katie - UMM

·         Unified metadata model or mapping

·         GCMD and ECHO were different; now have to use ISO

·         UMM is a crosswalk that will have a life cycle

·         Why not just ISO 19115 – we are going towards this – there are a lot of legacy systems that will take time – make sure the providers can use UMM to look up and understand all the attributes/concepts

·         Crosswalk – DIF, extended DIF, ECHO, ISO 19115-2 (has and and with)

·         Want minimal impact on existing data – deploy to dash 2 and will be in-line with dash 1.  – can get any data currently (retrieved) in ISO

·         UMM – Development – surveying existing implementations – defining and crosswalking fields – UMM is being reviewed

·         UMM-G (granule) and UMM-C (collection) – being reviewed

·         Metametadata – tagging for events, …
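
The crosswalk idea in these notes can be sketched as a lookup table from each source format's field names to one common set of names. The field names and the two-format table below are illustrative assumptions only; they are not copied from the UMM documents, and the real crosswalk covers far more fields and formats.

```python
# Illustrative crosswalk only: these field names are assumptions,
# not taken from the actual UMM documents.
CROSSWALK = {
    "DIF": {"Entry_Title": "EntryTitle", "Summary": "Abstract"},
    "ECHO": {"DataSetId": "EntryTitle", "Description": "Abstract"},
}


def to_unified(source_format, record):
    """Map a provider record onto the common field names.

    Fields with no crosswalk entry are dropped in this sketch.
    """
    mapping = CROSSWALK[source_format]
    return {
        mapping[name]: value for name, value in record.items() if name in mapping
    }


dif_record = {"Entry_Title": "Sea Surface Temperature", "Summary": "Daily SST."}
echo_record = {"DataSetId": "Sea Surface Temperature", "Description": "Daily SST."}

unified_from_dif = to_unified("DIF", dif_record)
unified_from_echo = to_unified("ECHO", echo_record)
print(unified_from_dif == unified_from_echo)
```

Once both records are expressed in the same field names, search and retrieval code only has to understand one model, whatever format the provider submitted.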

 

ESO and the CMR Life Cycle Process – Yonsook Enloe, Allan Doyle Conover

·         ESDIS Standards Office (ESO)

·         Using stakeholder input on the CMR –

·         If you have ideas of changes – [email protected]

·         CMR life cycle document – explains how change requests will be evaluated.

·         UMM-C and UMM-G completed review yesterday – UMM-S (service) and UMM-V (visualization) soon

·         If have ideas for future potential standards & best practices – would like to hear from you

·         Ted – in OGC just starting – small group talking about code development for standards development (also the same idea in ISO) – lots of conversion in the tool space to share

 

Jason – CMR Sub-second Search

·         People want fast search (under a second), 100 million granules + …, 1.5 billion objects indexed in search,

·         Ken – how did the sub-second search requirement come about – Ken is OK with 5 seconds – where was it documented

o   Jennie – it was grassroots – Chris Lynnes and Giovanni – the community – more for the machine level

o   As started layering – then is there imagery in GIBS, are there services

·         Why – ends up being many searches.  As the user moves the mouse – how many granules are below the mouse

·         ECHO Search Flow – where was time being spent – fetching ACLs and respond things in a digital cache – large Oracle bounding box could be slow, … could take 10-30 seconds

·         CMR Component – micro service – simple

·         Architectural Traits – what are the qualities that we wanted – 1) performance, 2) scalability, 3) correctness

o   Immutability – data cannot change – metadata from provider – it can never change – if resent with edits – then new record – can be cached forever – never need to invalidate or clear cache to have most recent data

§  This can become a feature on the API

o   Functional programming

o   Idempotence – result of an action, applied one or more times, is always the same

·         CMR Search flow (in search application) – different paths have different performance benefits

o   Depends on if it is granule formats

o   Q how often do we refresh the cache with the data in database – it is continuously done

§  Takes 2 ½ days to reindex when they want to add a new field

·         Partitioned and sharded – can direct a query at a specific index

·         3 phase spatial search – plugin in Elasticsearch with 2 searches against the Elasticsearch data

o   Bounding rectangle – keep in memory in Elasticsearch – Hardware really helps

·         Q what Oracle version  - currently on 11 – do you pay extra – yes

·         Q optimizing low earth satellite data, any strategies between spatial temporal link – have not yet looked into it – not sure if we need to look into

·         Q right now it is an optional method used to feed CMR because there are several versions (geodetic, Cartesian, other) – if you have it in the metadata, recommend sending it up – think would get an amazing benefit from trying to optimize it

·         Q – SSD – that is what using – does it make a difference and did testing – if can keep everything in memory, but still need fast to upload new data
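
The bounding-rectangle step in the spatial search notes above follows a common pattern: run a cheap rectangle test over everything, then run the exact geometry test only on the survivors. The sketch below shows that pattern for the "how many granules are under the mouse" case. It is not the CMR's Elasticsearch plugin: it uses planar coordinates and a simple ray-casting test, whereas real granule footprints may be geodetic, and all names and shapes are hypothetical.

```python
def bounding_rectangle(polygon):
    """Return (west, south, east, north) for a list of (lon, lat) vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def in_rectangle(rect, point):
    """Cheap test: is the point inside the axis-aligned rectangle?"""
    west, south, east, north = rect
    return west <= point[0] <= east and south <= point[1] <= north


def in_polygon(polygon, point):
    """Exact planar test by ray casting (even-odd rule)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def granules_under_point(granules, point):
    """Rectangle filter first, exact polygon test only on the candidates."""
    candidates = [g for g in granules if in_rectangle(g["mbr"], point)]
    return [g["id"] for g in candidates if in_polygon(g["polygon"], point)]


shapes = {
    "G1": [(0, 0), (10, 0), (0, 10)],                 # triangle
    "G2": [(20, 20), (30, 20), (30, 30), (20, 30)],   # square far away
}
granules = [
    {"id": gid, "polygon": poly, "mbr": bounding_rectangle(poly)}
    for gid, poly in shapes.items()
]
hits_inside = granules_under_point(granules, (2, 2))
hits_corner = granules_under_point(granules, (9, 9))
print(hits_inside, hits_corner)
```

The point (9, 9) passes the rectangle test for the triangle but fails the exact test, which is why the second stage is still needed after the fast first one.
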

Justin – Shipping Science with Containers: a next-generation application platform

·         At EOSDIS host many applications – try to use the best tool for each job

·         Currently use VM very heavily – hypervisor exposes virtual hardware to the machine – start to use the resources of the host – looking for a way to reclaim resources of the host

·         Containers – repopularized by Docker – still have Host OS – but Docker Engine exposes container to virtual environments

·         Workflow – CoreOS might be an operating system to help cluster environment – allows placement of containers

o   On top of CoreOS is Docker

o   Then use Heroku buildpack to identify the code and determine the best environment

o   Already using Git for source code management – so deploy with git

o   Someone has already optimized – Deis is an open source program

§  If one container fails then the others can pick up the load

·         Can direct toward public or private nodes/clouds

·         Docker is an interesting technology – makes standard way of defining container

·         Q – what applications are you going to try first – the list is changing – Earthdata Search – have on a private AWS account – try different use cases – data album  - URS could be another application

Citation:
Baynes, K.; NASA EOSDIS Evolving Technologies Discussion; Winter Meeting 2015. ESIP Commons, October 2014