NASA EOSDIS Evolving Technologies Discussion
Earthdata Search and Metadata Quality (Dan Pilone)
Common Metadata Repository Sub-second Search (Jason Gilman)
Unified Metadata Model Status and Discussion (Katie Baynes)
Earthdata Standards Office and the Lifecycle Process (Yonsook Enloe)
Earthdata Code Collaborative Status and Discussion (Brett McLaughlin)
Next Generation Application Platform Introduction and Overview (Justin Molineaux)
NASA EOSDIS
Earthdata Search client – Dan Pilone
· Typical user searches for surface temperature – gets 982 datasets … gets one and finds out there is a cloud in it
· Try again – better choices
· Search is now visualized – lets you see visually what you are selecting – can now spatially subset
o Does this work? Yes – as long as your browse imagery is in GIBS, we know that OPeNDAP is available for subsetting, and you have high-quality metadata
· Metadata – fill in as much as possible, can augment metadata, but give us as much as possible
o Provide visual metadata – can provide URLs to your images
o GIBS can provide a high-performance tile service
o Services – the difference between 600 MB in archive format and tens of KB in CSV that you can plot
o Accurately describe spatial areas – found some issues
· Create button – report metadata problem
· Q – Alan Doyle – When I get a search result, do I get any metadata, particularly for a file format like CSV? Because I then want to publish something – a pre-populated bundle – yes –
o GeoJSON for specifying boundaries – do support – also shapefile, zipped shapefile, KML, KMZ…
· Q – Walt – there is a limit as to how much metadata can be provided – can you decrease the amount
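The GeoJSON boundary input mentioned above can be illustrated with a minimal sketch. It builds a GeoJSON polygon for a simple latitude/longitude box. The function name `bbox_to_geojson` and the sample coordinates are assumptions made for this illustration, not part of the Earthdata Search client, and the sketch assumes a box that does not cross the antimeridian.

```python
import json


def bbox_to_geojson(west, south, east, north):
    """Return a GeoJSON Feature whose geometry is the given bounding box.

    Hypothetical helper for illustration; assumes west < east
    (no antimeridian crossing) and south < north.
    """
    if not (-180 <= west < east <= 180):
        raise ValueError("expected -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("expected -90 <= south < north <= 90")
    # Exterior ring in counter-clockwise order; first and last points match
    # because GeoJSON linear rings must be closed.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


if __name__ == "__main__":
    # Sample box chosen arbitrarily for the example.
    print(json.dumps(bbox_to_geojson(-10.0, 35.0, 5.0, 45.0), indent=2))
```

A file containing this output is the kind of boundary description the notes say can be supplied alongside shapefile, KML, or KMZ input.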
Earthdata Code Collaborative (ECC)
· Internal development effort to standardize tools, etc. – to support testing and code reviews
· Available from website – need NASA log-in – have to get permission from someone at NASA (just have to show how your work is related)
· Get wiki space…like GitHub… – full suite of web interactions and full tracking – have complete traceability – get all of this in the ECC
· Aimed at Agile Process
· That all works – could do that in an afternoon
· Deployment –
o Now can you put it somewhere for me/ where can I put my code
o Continuous integration – it is no longer a person that tests the code – a script tests the code. This should run every time code is checked into the repository. A build agent runs the tests and the code.
· (Semi) continuous deployment
o With some confidence because of testing – can deploy
o Have 3 different testing locations – with increasing user access – controlled through Bamboo
· If you sign up for ECC, you get – a source code repository, some level of automated tests, accessible deployment target hosts
· Recommendations – use everything that the ECC has to offer – use the test harness
· http://ecc.earthdata.nasa.gov
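The continuous integration idea above (a script, not a person, tests the code on every check-in) can be sketched as follows. This is a generic illustration using Python's standard `unittest` module, not the ECC's actual Bamboo configuration. The function `celsius_to_kelvin` and the test names are hypothetical.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Hypothetical function under test."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class CelsiusToKelvinTest(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_rejects_impossible_temperature(self):
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)


def run_checks():
    """Run the suite and return a process-style exit code (0 means pass)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CelsiusToKelvinTest)
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return 0 if result.wasSuccessful() else 1
```

A build agent would typically call something like `sys.exit(run_checks())` after each check-in, so that a non-zero exit code blocks promotion to the next testing location.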
Katie - UMM
· Unified metadata model or mapping
· GCMD and ECHO were different; now have to use ISO
· UMM is a crosswalk that will have a life cycle
· Why not just ISO 19115? We are going towards this – there are a lot of legacy systems that will take time – make sure the providers can use UMM to look up and understand all the attributes/concepts
· Crosswalk – DIF, extended DIF, ECHO, ISO 19115-2 (has and and with)
· Want minimal impact on existing data – deploy to dash 2 (ISO 19115-2) and will be in line with dash 1 (ISO 19115-1) – can currently get (retrieve) any data in ISO
· UMM development – surveying existing implementations – defining and crosswalking fields – UMM is being reviewed
· UMM-G (granule) and UMM-C (collection) – being reviewed
· Metametadata – tagging for events, …
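The crosswalk idea above (one UMM concept, looked up across several legacy formats) can be sketched as a small lookup table. The field names below are illustrative assumptions chosen for this example. They are not taken from the notes and are not the reviewed UMM mapping.

```python
# Illustrative crosswalk: UMM concept -> field name in each source format.
# All field names here are assumptions for the sketch, not the official mapping.
CROSSWALK = {
    "EntryTitle": {"DIF": "Entry_Title", "ECHO": "DataSetId"},
    "ShortName": {"DIF": "Entry_ID", "ECHO": "ShortName"},
    "Abstract": {"DIF": "Summary", "ECHO": "Description"},
}


def lookup(concept):
    """Return how a UMM concept is named in each source format."""
    return dict(CROSSWALK[concept])


def to_umm(record, source_format):
    """Translate a flat legacy record into UMM concept names.

    Fields that the source record does not carry are simply left out.
    """
    umm = {}
    for concept, names in CROSSWALK.items():
        source_field = names.get(source_format)
        if source_field is not None and source_field in record:
            umm[concept] = record[source_field]
    return umm


if __name__ == "__main__":
    dif_record = {"Entry_ID": "SST_L3", "Entry_Title": "Sea Surface Temperature L3"}
    print(to_umm(dif_record, "DIF"))
    print(lookup("Abstract"))
```

The design point is that providers keep sending their existing format while clients read one set of concept names, which matches the goal of minimal impact on existing data.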
ESO and the CMR Life Cycle Process – Yonsook Enloe, Allan Doyle Conover
· ESDIS Standards Office (ESO)
· Using stakeholder input on the CMR –
· If you have ideas of changes – [email protected]
· CMR life cycle document – explains how change requests will be evaluated.
· UMM-C and UMM-G completed review yesterday – UMM-S (service) and UMM-V (visualization) coming soon
· If have ideas for future potential standards & best practices – would like to hear from you
· Ted – in OGC just starting – small group talking about code development for standards development (also the same idea in ISO) – lots of conversion in the tool space to share
Jason – CMR Sub-second Search
· People want fast search (under a second), 100 million granules + …, 1.5 billion objects indexed in search,
· Q – Ken – how did the sub-second search requirement come about? Ken is OK with 5 seconds – where was it documented?
o Jennie – it was grassroots – Chris Linnus and Giovanni – the community – more for the machine level
o As started layering – then is there imagery in GIBS, are there services
· Why – ends up being many searches. As the user moves the mouse – how many granules are below the mouse
· ECHO search flow – where was time being spent – fetching ACLs and returning things from a cache – large Oracle bounding-box searches could be slow, … could take 10-30 seconds
· CMR Component – micro service – simple
· Architectural Traits – what are the qualities that we wanted – 1) performance, 2) scalability, 3) correctness
o Immutability – data cannot change – metadata from a provider can never change – if it is resent with edits, it becomes a new record – can be cached forever – never need to invalidate or clear the cache to have the most recent data
§ This can become a feature on the API
o Functional programming
o Idempotence – result of an action, applied one or more times, is always the same
· CMR Search flow (in search application) – different paths have different performance benefits
o Depends on if it is granule formats
o Q how often do we refresh the cache with the data in database – it is continuously done
§ Takes 2 ½ days to reindex when they want to add a new field
· Partitioned and sharded – can direct a query at a specific index
· 3-phase spatial search – a plugin in Elasticsearch, with 2 searches against the Elasticsearch data
o Bounding rectangle – kept in memory in Elasticsearch – hardware really helps
· Q – what Oracle version? Currently on 11 – do you pay extra? Yes
· Q optimizing low earth satellite data, any strategies between spatial temporal link – have not yet looked into it – not sure if we need to look into
· Q – right now it is an optional method used to feed CMR because there are several versions (geodetic, Cartesian, other) – if you have it in the metadata, recommend sending it up – think you would get an amazing benefit from trying to optimize it
· Q – SSD? That is what we are using – does it make a difference? Did testing – can keep everything in memory, but it still needs to be fast to load new data
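The immutability and idempotence traits discussed in this session can be sketched together. In this minimal illustration, every edit from a provider becomes a new revision, resending identical metadata changes nothing, and a cache keyed by revision never needs invalidating. The class names and the `concept_id` / `revision_id` parameters are assumptions for the sketch, not the CMR's actual implementation.

```python
class RevisionStore:
    """Append-only store: a stored revision is never modified."""

    def __init__(self):
        self._revisions = {}  # (concept_id, revision_id) -> metadata
        self._latest = {}     # concept_id -> newest revision_id

    def ingest(self, concept_id, metadata):
        """Idempotent ingest: identical metadata returns the existing revision."""
        latest = self._latest.get(concept_id)
        if latest is not None and self._revisions[(concept_id, latest)] == metadata:
            return latest
        new_revision = (latest or 0) + 1
        self._revisions[(concept_id, new_revision)] = dict(metadata)
        self._latest[concept_id] = new_revision
        return new_revision

    def latest_revision(self, concept_id):
        return self._latest[concept_id]

    def get(self, concept_id, revision_id):
        return dict(self._revisions[(concept_id, revision_id)])


class RevisionCache:
    """Cache keyed by (concept_id, revision_id); entries can live forever."""

    def __init__(self, store):
        self._store = store
        self._cache = {}
        self.misses = 0

    def get(self, concept_id, revision_id):
        key = (concept_id, revision_id)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._store.get(concept_id, revision_id)
        return self._cache[key]


if __name__ == "__main__":
    store = RevisionStore()
    first = store.ingest("G1", {"title": "original"})
    same = store.ingest("G1", {"title": "original"})  # resend, no change
    second = store.ingest("G1", {"title": "edited"})  # edit, new revision
    print(first, same, second)
```

Because an edit produces a new key rather than changing an old value, a reader only needs to know the latest revision number to be sure the cached entry is current.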
Justin – Shipping Science with Containers: a next-generation application platform
· At EOSDIS host many applications – try to use the best tool for each job
· Currently use VMs very heavily – hypervisor exposes virtual hardware to the machine – start to use the resources of the host – looking for a way to reclaim resources of the host
· Containers – repopularized by Docker – still have Host OS – but Docker Engine exposes container to virtual environments
· Workflow – CoreOS might be an operating system to help cluster environment – allows placement of containers
o On top of CoreOS is Docker
o Then use a Heroku buildpack to identify the code and determine the best environment
o Already using Git for source code management – so deploy with Git
o Someone has already optimized this – Deis is an open source program
§ If one container fails then the others can pick up the load
· Can direct toward public or private nodes/clouds
· Docker is an interesting technology – makes standard way of defining container
· Q – what applications are you going to try first? The list is changing – Earthdata Search – have it on a private AWS account – try different use cases – data album – URS could be another application
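The failover point above (if one container fails, the others pick up the load) can be illustrated with a toy round-robin router that skips unhealthy containers. This is a conceptual sketch only. It does not reflect how Deis, CoreOS, or Docker schedule or route traffic, and the class name and container names are hypothetical.

```python
import itertools


class Cluster:
    """Toy round-robin router over a fixed set of named containers."""

    def __init__(self, names):
        names = list(names)
        self._healthy = {name: True for name in names}
        self._order = itertools.cycle(names)
        self._size = len(names)

    def mark_failed(self, name):
        self._healthy[name] = False

    def mark_recovered(self, name):
        self._healthy[name] = True

    def route(self):
        """Return the next healthy container, skipping failed ones."""
        for _ in range(self._size):
            name = next(self._order)
            if self._healthy[name]:
                return name
        raise RuntimeError("no healthy containers available")


if __name__ == "__main__":
    cluster = Cluster(["web.1", "web.2", "web.3"])
    cluster.mark_failed("web.2")
    # Requests keep flowing to the two remaining containers.
    print([cluster.route() for _ in range(4)])
```

The sketch only shows the routing side of failover. A real platform would also restart or reschedule the failed container.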