Center for Nonlinear Studies
Tuesday, December 17, 2019
09:30 AM - 10:30 AM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Selecting Compression Error Tolerances for Lossy Compressed Checkpoint Restart

Jon Calhoun
Clemson University

Checkpoint restart is a vital component of long-running HPC applications. As systems grow in size and complexity, applications compute on larger data sets. Due to data movement bottlenecks, traditional checkpointing approaches that save a subset of the application’s state are becoming prohibitive. Reducing the checkpoint size via lossy compression can improve checkpoint restart performance. In this talk, we investigate a methodology for selecting lossy compression error tolerances for checkpointing HPC applications based on numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error. We explore the methodology on 1D model problems and two production-level applications: PlasComCM and Nek5000. We highlight that this methodology allows the error introduced into application variables by restarting from a lossy compressed checkpoint to be masked by the numerical error of the discretization. This increases the efficiency of checkpoint restart without affecting the overall accuracy of the simulation. Furthermore, the results show that this methodology is robust to the choice of lossy compressor.
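
As a rough illustration of the idea in the abstract, the following Python sketch picks an absolute error tolerance below an estimate of the truncation error and checks that a lossy round trip stays within it. Everything in it is an assumption made for illustration and not the speaker's method: the 1D field u(x) = sin(pi x), the second-order central-difference truncation estimate (dx^2 / 12) * max|u_xxxx|, the safety factor of 0.1, and the uniform quantizer standing in for a real error-bounded compressor. The function names are hypothetical.

```python
import math


def truncation_error_bound(dx, fourth_derivative_max):
    """Leading truncation term of the 2nd-order central difference for u_xx:
    (dx^2 / 12) * max|u_xxxx|. Assumed here as the discretization error scale."""
    return dx * dx / 12.0 * fourth_derivative_max


def lossy_compress(values, abs_tol):
    """Stand-in for an error-bounded lossy compressor (hypothetical):
    uniform quantization with bin width 2 * abs_tol."""
    step = 2.0 * abs_tol
    return [round(v / step) for v in values]


def decompress(codes, abs_tol):
    """Map integer codes back to bin centers."""
    step = 2.0 * abs_tol
    return [c * step for c in codes]


# Assumed 1D model state: u(x) = sin(pi x) on a uniform grid over [0, 1].
n = 101
dx = 1.0 / (n - 1)
u = [math.sin(math.pi * i * dx) for i in range(n)]

# For this u, u_xxxx = pi^4 sin(pi x), so max|u_xxxx| = pi^4.
trunc = truncation_error_bound(dx, math.pi ** 4)

# Assumed safety factor: keep compression error well under the truncation estimate.
safety = 0.1
abs_tol = safety * trunc

# Simulate writing a checkpoint and restarting from it.
codes = lossy_compress(u, abs_tol)
restored = decompress(codes, abs_tol)
max_err = max(abs(a - b) for a, b in zip(u, restored))

print(f"truncation estimate: {trunc:.3e}")
print(f"chosen tolerance:    {abs_tol:.3e}")
print(f"max restart error:   {max_err:.3e}")
```

In a real application the truncation bound would come from the solver's own discretization analysis, and the quantizer would be replaced by an actual error-bounded compressor run in its absolute-error mode.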

Host: Information Science and Technology Institute (ISTI)