Scientific Data Analysis on the Cloud

Abstract/Agenda: 

We recognize that our biggest stakeholder is our science communities. This technical session focuses on using the Cloud to conduct scientific data analysis. Some of the focus areas include

1. Technologies and solutions reuse

2. Reference architectures

3. Integration with multi-cloud environment

4. Leverage social networks and contribute back

Notes: 

NASA Earth Exchange (NEX): Community Engagement in the Cloud through OpenNEX
Petr Votava, NASA Ames/CSUMB

Overview of NEX

Discussion on Expanding to the Cloud

The Setup:
-NASA-Amazon Space Act Agreement: 1-year experiment through Nov 2014

Testing and Feedback
-Series of contests around data and services
-NASA Prizes and Challenges Program
--Start w/Space Apps Challenge
-Improvement

Final Product
-Learned from past lessons
-OpenNEX Virtual Workshop and Challenge 2014
-https://nex.nasa.gov/OpenNEX

The OpenNEX Challenge
-Run in collaboration with InnoCentive
-Part 1: Ideation (Idea Generation)
-Part 2: Implementation on AWS

What's next: extending SAA and beyond
-Looking to engage other providers
-Work w/NASA on other ways to produce services beyond SAA

What's next: Elastic capacity
-Testing of expanding HPC architecture into the cloud

What's next: Visualization and Analytics

What's next: Easier access to HPC

Some Observations
-Technical Challenges
-Organizational Challenges
-Cultural Challenges

Discussion of a vision for an "Ideal (Open)NEX Platform"

-----
The Virtual Machine Scaler: Infrastructure Management Support for Scientific Modeling on IaaS Clouds
Wes Lloyd, Colorado State University

CSIP: Cloud Services Innovation Platform
-Provide scientific modeling as a service (MaaS)
-Facilitate science research and delivery

Supporting Science Discovery and Delivery
-USDA-AG Systems Model Research
-USDA-NRCS: AG Systems Production Models

CSIP Model Services Diagram

Scientific Modeling Cloud Challenges
-Model Services Deployment: deploying each component of an application to its own VM and running it in isolation is not cost effective. Components can be combined and VMs consolidated.
-Elasticity: Scale computational resources for each tier of the application
-Green computing can lead to resource contention
-Overprovisioning: too many VMs on same host
-Virtualization Overhead

Discussion of Amazon Spot Instances

The Virtual Machine Scaler (VM-Scaler)
-Organization diagram shown
-Web services application
-Discussion of Supporting features

VM Pools
-Supports work with many same-type VMs
-Addresses Launch Latency
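
A minimal sketch of the VM pool idea noted above, not the VM-Scaler implementation: same-type VMs are launched ahead of time so a request can be served without waiting on launch latency. The class name VMPool, the launch_fn callback, and the target_size parameter are all hypothetical and chosen only for illustration.

```python
from collections import deque


class VMPool:
    """Hypothetical pool of pre-launched VMs that all share one type."""

    def __init__(self, vm_type, launch_fn, target_size):
        self.vm_type = vm_type
        self._launch = launch_fn      # assumed callback: vm_type -> VM handle
        self._target = target_size    # number of idle VMs to keep ready
        self._idle = deque()
        self.refill()

    def refill(self):
        """Launch VMs until the idle pool is back at its target size."""
        while len(self._idle) < self._target:
            self._idle.append(self._launch(self.vm_type))

    def acquire(self):
        """Hand out an idle VM if one exists; otherwise launch on demand."""
        if self._idle:
            return self._idle.popleft()
        return self._launch(self.vm_type)

    def idle_count(self):
        return len(self._idle)
```

In this sketch the caller decides when to call refill(), for example after each acquire() or on a timer; a real system would also need to handle launch failures and VM release.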

Resource Utilization Data Collection
-Resource utilization sensors
--Sensor on each VM/PM
--Transmits data to VM-Scaler at configurable intervals
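
The sensor behavior described above (sample on each VM/PM, transmit at a configurable interval) could look roughly like the following. This is an illustrative sketch only: run_sensor, read_utilization, and send are hypothetical names, and the actual VM-Scaler sensor and its transport were not described in the session.

```python
import time


def run_sensor(read_utilization, send, interval_s, samples, sleep=time.sleep):
    """Sample utilization `samples` times, sending each reading as it is taken.

    read_utilization: assumed callable returning a dict such as {"cpu": 0.42}
    send:             assumed callable that delivers one reading to the collector
    interval_s:       configurable delay between consecutive readings, in seconds
    sleep:            injectable so the loop can be exercised without real delays
    """
    for seq in range(samples):
        reading = {"seq": seq, "timestamp": time.time()}
        reading.update(read_utilization())
        send(reading)
        if seq < samples - 1:   # no need to wait after the final reading
            sleep(interval_s)
```

Passing the interval as a parameter keeps the reporting frequency configurable per VM or per deployment, which is the property the notes call out.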

Resource Utilization Checkpointing
-Captures resource utilization at a point in time

Scaling Tasks
-Scaling service request
-Prelaunch VMs
-etc

Hot Spot Detection
-Resource utilization thresholds
-Performance model approach
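
As a rough illustration of the threshold approach only (the performance model approach is not covered here), hot spots can be flagged by comparing each VM's latest utilization against per-metric limits. The function name, the metric names, and the threshold values are assumptions made for this sketch, not values from the talk.

```python
def find_hot_spots(utilization, thresholds):
    """Return {vm_id: [metric, ...]} for VMs exceeding any threshold.

    utilization: assumed mapping of vm_id -> {metric_name: value in 0.0..1.0}
    thresholds:  assumed mapping of metric_name -> upper limit in 0.0..1.0
    """
    hot = {}
    for vm_id, metrics in utilization.items():
        exceeded = sorted(
            name
            for name, limit in thresholds.items()
            if metrics.get(name, 0.0) > limit
        )
        if exceeded:
            hot[vm_id] = exceeded
    return hot
```

A VM flagged here would be a candidate trigger for a scaling task such as launching or prelaunching additional VMs.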

Least-Busy VM Placement

Least-Busy Job Placement
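
Both placement strategies above reduce to the same selection step: pick the target with the lowest current load (a physical host for VM placement, a VM for job placement). The sketch below assumes load is already summarized as a single number per target; how VM-Scaler actually computes "busy" was not recorded in these notes, and least_busy is a hypothetical name.

```python
def least_busy(load_by_target):
    """Return the target (host or VM id) with the lowest reported load.

    load_by_target: assumed mapping of target id -> current load score,
                    where a smaller score means a less busy target.
    """
    if not load_by_target:
        raise ValueError("no placement targets available")
    return min(load_by_target, key=load_by_target.get)
```

For example, calling least_busy with the per-host loads selects where the next VM goes, and calling it with per-VM loads selects where the next job goes.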

-----
Eucalyptus 4.0 Upgrade
Brian Thomas

Gives IT and DevOps teams the power they need to easily deploy and manage large-scale clouds.

Discussion of Key Advancements in 4.0

Edge Networking

Discussion of AWS Compatibility

-----
Cloud Computing Cluster onward
Phil Yang

The Cloud Computing Cluster was formed three years ago

Discussion of cloud computing experiences among members in attendance

-----
Other Notes:
2014 AGU Meeting
-Session 3041 and Session 1832

NASA ESDWG (Earth Science Data Working Group)

Citation:
Yang, P.; Huang, T.; Scientific Data Analysis on the Cloud; Summer Meeting 2014. ESIP Commons, February 2014

Comments

NASA Earth Exchange (NEX): Early Observations on Community Engagement in the Cloud

Abstract: NASA Earth Exchange (NEX) is a collaborative platform that combines state-of-the-art supercomputing, Earth system modeling, remote sensing data from NASA and other agencies, and a scientific social network to provide an environment in which users can explore and analyze large Earth science data sets, run modeling and analysis codes, collaborate on new or existing projects, and share results within and/or among communities. Through the deployment of virtualization technologies, an opportunity exists to create complete modeling and analysis environments that are customizable, “archiveable” and transferable. Allowing users to instantiate such environments on large compute infrastructures that are directly connected to data archives may significantly reduce the costs and time associated with scientific efforts by relieving users of redundantly retrieving and integrating data sets and building codes, and by providing a mechanism for sharing their work with the community. NEX is pursuing this development through the OpenNEX partnership with Amazon, Inc. as well as locally through the NEX OpenSandbox, which provides a private cloud environment collocated with the NASA supercomputing center. This talk will focus on some of the reasons for pursuing the cloud as one of our platforms as well as our first observations on trying to generate interest and participation from a large community of geoscientists, software engineers, students and the general public.

Petr Votava is a senior software engineer at NASA Ames Research Center and University Corporation at Monterey Bay. He was a member of the NASA MODIS team for the Terra and Aqua missions, co-PI of the Terrestrial Observation and Prediction System (TOPS), and is currently the technical lead for the NASA Earth Exchange (NEX) science platform. His research interests include software architecture, knowledge management, data and text mining, and semantic web.