A high-performance parallel computing framework to support big climate data analysis


Big Earth science data (e.g., more than 100 PB of climate data) are being produced by climate observations and model simulations to support the Earth sciences. Ideally, such data would be provided to scientists together with on-demand analytical and simulation capabilities, relieving them of time-consuming computational tasks. Realizing this is challenging, however, because processing data at this scale requires efficient data management strategies, complex parallel computing algorithms, and scalable computing resources. Building on a spatiotemporal index for array-based data, big climate data analytics, Spark, and HDFS, we develop a high-performance computing framework that uses big climate data efficiently to analyze climate models. This paper is a pilot study on how to support on-demand, real-time big Earth data analytics for climatologists.

Collaboration Area: 
Li, Z., Hu, F., Schnase, J., Duffy, D., Lee, T., Yang, C., Bowen, M., 2016. A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce. International Journal of Geographical Information Science (in press).
Buck, J. B., Watkins, N., LeFevre, J., et al., 2011. SciHadoop: Array-based query processing in Hadoop. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, p. 66.