SciSpark: Applying in-memory distributed computing to weather event detection and tracking
- Resource Type
- Conference
- Authors
- Palamuttam, Rahul; Mogrovejo, Renato Marroquin; Mattmann, Chris; Wilson, Brian; Whitehall, Kim; Verma, Rishi; McGibbney, Lewis; Ramirez, Paul
- Source
- 2015 IEEE International Conference on Big Data (Big Data), pp. 2020-2026, Oct. 2015
- Subject
- Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Arrays
Sparks
Libraries
Clouds
Meteorology
File systems
Apache Spark
in-memory distributed computing
large scientific datasets
mesoscale convective complexes
- Language
In this paper, we present SciSpark, a Big Data framework that extends Apache™ Spark for scaling scientific computations. The paper details the initial architecture and design of SciSpark. We demonstrate how SciSpark achieves parallel ingestion and partitioning of Earth science satellite and model datasets. We also illustrate the usability and extensibility of SciSpark by implementing aspects of the Grab 'em Tag 'em Graph 'em (GTG) algorithm using SciSpark and its MapReduce capabilities. GTG is a topical automated method for identifying and tracking Mesoscale Convective Complexes in satellite infrared datasets.
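As a rough illustration of the map/reduce pattern the abstract mentions, the sketch below counts "cold" pixels per time frame in a toy infrared grid. It is not SciSpark's API and not the GTG algorithm: it uses plain Python rather than Spark, and the names `THRESHOLD_K`, `count_cold_pixels`, `merge_counts`, and `cold_pixel_counts`, the threshold value, and the sample data are all assumptions made for this example.

```python
from functools import reduce

# Hypothetical brightness-temperature cutoff in kelvin (an assumption, not from the paper).
THRESHOLD_K = 241.0


def count_cold_pixels(frame):
    """Map step: turn one (time_index, grid) frame into (time_index, cold pixel count)."""
    time_index, grid = frame
    cold = sum(1 for row in grid for value in row if value <= THRESHOLD_K)
    return (time_index, cold)


def merge_counts(acc, item):
    """Reduce step: fold one (time_index, count) pair into the running dict."""
    time_index, cold = item
    merged = dict(acc)
    merged[time_index] = merged.get(time_index, 0) + cold
    return merged


def cold_pixel_counts(frames):
    """Apply the map step to every frame, then reduce the results into one dict."""
    return reduce(merge_counts, map(count_cold_pixels, frames), {})


# Made-up sample data: two 2x2 frames of brightness temperatures in kelvin.
frames = [
    (0, [[250.0, 230.0], [240.0, 260.0]]),
    (1, [[220.0, 225.0], [300.0, 241.0]]),
]
counts = cold_pixel_counts(frames)
print(counts)
```

In a distributed setting the frames would be split across workers before the map step; here everything runs in one process so the example stays self-contained.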