High Performance Computing
The scale of climate data, both observed and model-simulated, is growing rapidly. For example, we already face scalability challenges with the IPCC AR4 datasets (multiple model ensembles, initial-condition ensembles, emissions scenarios, etc.) together with sensor-based observations and reanalysis data, and these volumes will increase substantially with the fifth IPCC assessment report, due in 2014. Beyond scalability, there are additional challenges: data-related challenges, including the complexity, high dimensionality, and spatio-temporal nature of the data; and hardware-related challenges arising from the increasing complexity and capabilities of new HPC systems. We are developing computational methods and tools that leverage the latest technologies, such as multicore architectures and GPGPU accelerators, to enable the analysis of large-scale datasets. In particular, we are investigating a “co-design” approach for scalable analytics and data mining algorithms, which takes into account the spatio-temporal and potentially multi-scale nature of the data, I/O requirements and optimizations, and the presence of multiple hardware configurations and programming models.