SciSpark 201: Searching for MCCs


We introduce a three-part course module on SciSpark, our AIST14-funded project for Highly Interactive and Scalable Climate Model Metrics and Analytics. The module comprises 101, 201, and 301 classes on how to use Spark for science.

SciSpark 201 is a 1.5-hour session in which we use the search for Mesoscale Convective Complexes (MCCs) in satellite infrared data as a real-world example of how SciSpark enables real-time response to both search queries and modifications to the underlying code. This task is representative of the motivation behind SciSpark: iterative data-reuse algorithms that share information between multiple stages.


Note for SciSpark 201
A two-pronged approach to Spark
1. Scientific RDD (sRDD)
The scientific Resilient Distributed Dataset (sRDD) exploits Apache Spark's concept of RDDs for multidimensional data representing a scientific measurement that can be subset by time or by space. The sRDD supports processing such data with scientific algorithms in the MapReduce paradigm within a distributed environment.
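
The idea of subsetting by time or space and then processing in the MapReduce paradigm can be illustrated with a minimal sketch in plain Python. It does not use Spark or the SciSpark API; the record layout, the function names (`subset_time`, `subset_space`, `coldest_value`), and the temperature values are all assumptions made up for illustration.

```python
from functools import reduce

# Hypothetical records: (time_step, 2-D grid of brightness temperatures in kelvin).
records = [
    (0, [[300.0, 230.0], [235.0, 310.0]]),
    (1, [[225.0, 228.0], [305.0, 300.0]]),
    (2, [[290.0, 295.0], [240.0, 220.0]]),
]

def subset_time(recs, start, end):
    """Keep records whose time step lies in [start, end]."""
    return [(t, grid) for t, grid in recs if start <= t <= end]

def subset_space(grid, rows, cols):
    """Keep the rectangular window selected by two slice objects."""
    return [row[cols] for row in grid[rows]]

def coldest_value(grid):
    """Return the minimum value in a 2-D grid."""
    return min(min(row) for row in grid)

# Subset by time (steps 0..1) and by space (first row, both columns).
selected = subset_time(records, 0, 1)
window = (slice(0, 1), slice(0, 2))

# Map stage: one (time_step, minimum) pair per record.
per_step_min = list(
    map(lambda rec: (rec[0], coldest_value(subset_space(rec[1], *window))), selected)
)

# Reduce stage: keep the pair with the lowest temperature.
coldest = reduce(lambda a, b: a if a[1] <= b[1] else b, per_step_min)

print(per_step_min)
print(coldest)
```

In a distributed setting the map stage would run per partition and the reduce stage would combine partial results; the sketch only shows the shape of the computation on a single machine.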

2. sciTensor
The sciTensor datatype is a self-documented array that keeps a list of variable arrays and maintains the associated metadata in a hashmap. The sciTensor is read into the sRDD, and the data within it is operated on via arithmetic and relational operations. sciTensor can load data from HDFS, OPeNDAP, and the local file system.
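
To make the description concrete, here is a rough Python sketch of a self-documented array: named variable arrays plus a metadata dictionary, with one arithmetic and one relational operation. The class `TensorSketch`, its methods, the variable name `brightness_temp`, and the 241 K threshold are hypothetical and chosen for illustration; this is not the sciTensor API.

```python
class TensorSketch:
    """Hypothetical stand-in for a self-documented array (not the sciTensor API)."""

    def __init__(self, variables, metadata=None):
        self.variables = dict(variables)       # variable name -> 2-D list of floats
        self.metadata = dict(metadata or {})   # free-form attributes, e.g. units

    def _apply(self, name, fn):
        """Return a new TensorSketch with fn applied to every cell of one variable."""
        new_vars = dict(self.variables)
        new_vars[name] = [[fn(v) for v in row] for row in self.variables[name]]
        return TensorSketch(new_vars, self.metadata)

    def add(self, name, scalar):
        """Arithmetic operation: add a scalar to every cell."""
        return self._apply(name, lambda v: v + scalar)

    def mask_le(self, name, threshold, fill=0.0):
        """Relational operation: keep cells <= threshold, replace the rest with fill."""
        return self._apply(name, lambda v: v if v <= threshold else fill)


tensor = TensorSketch(
    {"brightness_temp": [[300.0, 230.0], [235.0, 310.0]]},
    {"units": "K"},
)
shifted = tensor.add("brightness_temp", 5.0)
masked = tensor.mask_le("brightness_temp", 241.0)

print(shifted.variables["brightness_temp"])
print(masked.variables["brightness_temp"])
print(masked.metadata)
```

Each operation returns a new object and carries the metadata along, so the original data stays untouched and the result remains self-describing.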

3. Demo
Data visualization:
Scala RDD -> Python RDD -> python visualization
Use case: Mesoscale convective complexes

  • Data: brightness temperature data
  • Nodes: areas with a given brightness temperature value and a given size
  • Edges: determined by area overlaps between nodes within consecutive time periods
  • Identify nodes and edges
  • Find cloud elements and connect the cloud elements between frames.
  • Find the subgraphs of cloudy areas that evolve in time
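
The steps listed above can be sketched in plain Python as follows. This is a single-machine illustration under stated assumptions, not the SciSpark implementation: cloud elements are taken to be 4-connected regions of cells at or below a temperature threshold with a minimum cell count, two elements share an edge when they have at least one grid cell in common in consecutive frames, and the function names, the 241 K threshold, and the toy frames are all hypothetical.

```python
from collections import deque

def find_cloud_elements(grid, threshold, min_size):
    """Return 4-connected regions of cells <= threshold with at least min_size cells."""
    rows, cols = len(grid), len(grid[0])
    seen, elements = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or grid[r][c] > threshold:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] <= threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(region) >= min_size:
                elements.append(frozenset(region))
    return elements

def build_graph(frames, threshold, min_size):
    """Nodes are (frame_index, element_index); edges join overlapping elements in consecutive frames."""
    per_frame = [find_cloud_elements(g, threshold, min_size) for g in frames]
    nodes = {(t, i): cells
             for t, elems in enumerate(per_frame) for i, cells in enumerate(elems)}
    edges = []
    for t in range(len(per_frame) - 1):
        for i, a in enumerate(per_frame[t]):
            for j, b in enumerate(per_frame[t + 1]):
                if a & b:  # at least one shared grid cell
                    edges.append(((t, i), (t + 1, j)))
    return nodes, edges

def connected_subgraphs(nodes, edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            component.append(n)
            for m in neighbors[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(sorted(component))
    return components

# Three toy frames of brightness temperature (kelvin), consecutive in time.
frames = [
    [[230, 230, 300, 300], [300, 300, 300, 220]],
    [[300, 230, 230, 300], [300, 300, 300, 300]],
    [[300, 300, 300, 300], [225, 225, 300, 300]],
]
nodes, edges = build_graph(frames, threshold=241, min_size=2)
components = connected_subgraphs(nodes, edges)
print(edges)
print(components)
```

In the toy data, the single cold cell in the first frame is discarded by the size filter, the elements in the first two frames overlap and form one subgraph, and the element in the third frame stands alone.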
Attachment: SciSpark 201.pdf (847.87 KB)
SciSpark 201: Searching for MCCs; 2016 ESIP Summer Meeting. ESIP Commons, February 2016