A high performance parallel computing framework to support big climate data analysis

Submitted by Fei Hu on Mon, 2015-12-21 12:51

Event:

Winter Meeting 2016

Abstract:

Big Earth science data (e.g., more than 100 PB of climate data) are being produced by climate observations and model simulations to support the Earth sciences. Ideally, such data would be delivered to scientists together with on-demand analytical and simulation capabilities, relieving them of time-consuming computational tasks. This is challenging, however, because processing data at this scale requires efficient data management strategies, complex parallel computing algorithms, and scalable computing resources. Building on a spatiotemporal index for array-based data, big climate data analytics, Spark, and HDFS, we develop a high-performance computing framework that uses big climate data efficiently to analyze climate models. This paper is a pilot study on how to support on-demand, real-time big Earth data analytics for climatologists.
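To make the idea of a spatiotemporal index for array-based data more concrete, the following is a minimal sketch in Python, not the framework's actual implementation. It assumes that each index entry links a chunk's logical extent (variable, time, latitude, and longitude index ranges) to a physical location (file path, byte offset, byte length). The names `ChunkEntry`, `overlaps`, and `query`, the record layout, and the file paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) indices along one axis


@dataclass(frozen=True)
class ChunkEntry:
    """One index record (hypothetical layout): logical extent plus physical location."""
    variable: str
    time: Range
    lat: Range
    lon: Range
    path: str          # illustrative file path, not a real dataset
    byte_offset: int
    byte_length: int


def overlaps(a: Range, b: Range) -> bool:
    """Return True when two inclusive index ranges intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def query(index: List[ChunkEntry], variable: str,
          time: Range, lat: Range, lon: Range) -> List[ChunkEntry]:
    """Select only the chunks that intersect the requested spatiotemporal box."""
    return [
        e for e in index
        if e.variable == variable
        and overlaps(e.time, time)
        and overlaps(e.lat, lat)
        and overlaps(e.lon, lon)
    ]


# A toy index with three chunks spread over two files (made-up values).
index = [
    ChunkEntry("T", (0, 23), (0, 179), (0, 359), "/data/day1.nc", 0, 4096),
    ChunkEntry("U", (0, 23), (0, 179), (0, 359), "/data/day1.nc", 4096, 4096),
    ChunkEntry("T", (24, 47), (0, 179), (0, 359), "/data/day2.nc", 0, 4096),
]

# Ask for variable "T" over time steps 10..30 and a small spatial window.
hits = query(index, "T", time=(10, 30), lat=(10, 20), lon=(100, 120))
for e in hits:
    print(e.path, e.byte_offset, e.byte_length)
```

In a setting like the one described here, the selected entries could presumably be handed to parallel tasks so that each task reads only its own byte range instead of scanning whole files; that distribution step is an assumption and is left out of the sketch. A linear scan is used for brevity, and a production index would more likely use a tree or a sorted structure.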

Collaboration Area:

Cloud Computing

Reference:

Li, Z., Hu, F., Schnase, J., Duffy, D., Lee, T., Yang, C., Bowen, M. (2016), A Spatiotemporal Indexing Approach for Efficient Processing of Big Array-based Climate Data with MapReduce, International Journal of Geographical Information Science (in press).

Buck, J. B., Watkins, N., LeFevre, J., et al. (2011), SciHadoop: Array-based Query Processing in Hadoop, in Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, p. 66.

Author(s):

Name: Fei Hu
Organization(s): George Mason University
Email: fhu@gmu.edu

Name: Chaowei Yang
Organization(s): George Mason University
Email: cyang3@gmu.edu

Name: John Schnase
Organization(s): Goddard Space Flight Center, NASA
Email: John.L.Schnase@nasa.gov

Name: Daniel Q. Duffy
Organization(s): Goddard Space Flight Center, NASA
Email: daniel.q.duffy@nasa.gov