Lab Home | Phone | Search
Center for Nonlinear Studies  Center for Nonlinear Studies
 Home 
 People 
 Current 
 Affiliates 
 Visitors 
 Students 
 Research 
 ICAM-LANL 
 Publications 
 Conferences 
 Workshops 
 Sponsorship 
 Talks 
 Colloquia 
 Colloquia Archive 
 Seminars 
 Postdoc Seminars Archive 
 Quantum Lunch 
 Quantum Lunch Archive 
 CMS Colloquia 
 Q-Mat Seminars 
 Q-Mat Seminars Archive 
 P/T Colloquia 
 Archive 
 Kac Lectures 
 Kac Fellows 
 Dist. Quant. Lecture 
 Ulam Scholar 
 Colloquia 
 
 Jobs 
 Postdocs 
 CNLS Fellowship Application 
 Students 
 Student Program 
 Visitors 
 Description 
 Past Visitors 
 Services 
 General 
 
 History of CNLS 
 
 Maps, Directions 
 CNLS Office 
 T-Division 
 LANL 
 
Wednesday, July 29, 2015
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

​Programmability, Performance, and Dealing with Memory Constraints in In-Situ Analytics

Gagan Agrawal
Ohio State University

Neither the memory capacity, memory access speeds, nor disk bandwidths are increasing at the same rate as the computing power in current and upcoming parallel machines. This has led to considerable recent research on in-situ data analytics. However, many open questions remain on how to perform such analytics, especially in view of memory constraints in systems. This talk will describe two different directions of research in this area. Developing an efficient in-situ implementation involves many challenges, including parallelization, data movement or sharing, and resource allocation. Based on the premise that MapReduce can be an appropriate API for specifying scientific analytics applications, we present a novel MapReduce-like framework that supports efficient in-situ scientific analytics, and address several challenges that arise in applying the MapReduce idea for in-situ processing. Specifically, our implementation can load simulated data directly from distributed memory, and it uses a modified API that helps meet the strict memory constraints of in-situ analytics. The framework is designed so that analytics can be launched from the parallel code region of a simulation program. We have developed both time sharing and space sharing modes for maximizing the performance in different scenarios, with the former even avoiding any copying of data from simulation to the analytics program. Second, we propose an approach that utilizes bitmap index (bitmaps) as the summary structure, performs further data reduction (such as time-steps selection) using just them, and subsequently, stores only the selected bitmaps for post-analysis. We construct compressed bitmaps on the fly, show that many kinds of in-situ analyses can be supported by bitmaps without requiring the original data (and thus reducing memory requirements for in-situ analysis), and instead of writing the original simulation output, we only write the selected bitmaps to the disks (reducing the I/O requirements). We also demonstrate that we are able to use bitmaps for key offline analysis steps.

Host: Curt Canada & James Ahrens