Data Science Summer School Lecture

The ISTI Data Science Summer School lecture is open to all badge holders. If space becomes limited, seating preference will be given to (i) students participating in the summer school, (ii) their mentors, and then (iii) other badge holders.

The next generation of supercomputers will be exascale high-performance computing (HPC) systems, capable of at least 10^18 floating-point operations per second, roughly ten times faster than the nation's most powerful supercomputers in use today. These systems will help researchers tackle increasingly complex problems by modeling large-scale systems, such as nuclear reactors or the global climate, and by simulating complex phenomena. To succeed, they must reliably store enormous amounts of high-precision data and perform I/O at extremely high rates. Building a parallel file system that delivers a tenfold performance improvement, however, poses serious challenges. To close the gap between computation speed and the file system's I/O speed and capacity, HPC researchers must develop smarter, more effective ways to reduce data size without losing important information.

Data compression offers a good solution for reducing data size. Although lossless compression retains all of the information, it suffers from very limited compression ratios. To this end, our team and scientists from Argonne National Laboratory have developed a novel lossy scientific data compression framework, along with a series of optimization techniques for different scientific applications. These techniques can significantly reduce data size while, through accurate error-control schemes, preserving the information needed for post-analysis. On the other hand, lossy-compressor developers and users have lacked a tool for exploring the features of HPC data and understanding how compression alters the data in a systematic and reliable way. To address this gap, we have designed and implemented a generic framework called Z-checker, which can analyze and visualize the characteristics of scientific data, offline or online, and systematically evaluate the impact of the compressed data on applications and post-analysis.

Host: Information, Science, and Technology Institute
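
For context on the topic, the sketch below illustrates the general idea behind error-bounded lossy compression and the kind of distortion checks a tool like Z-checker performs. It is illustrative only, not the team's framework or Z-checker itself: it quantizes a synthetic field under a user-specified absolute error bound, reconstructs it, and verifies the bound along with standard quality metrics (maximum pointwise error, RMSE, PSNR). All names and data here are hypothetical.

    import numpy as np

    def compress_abs(data, err_bound):
        # Uniform scalar quantization: map each value to the index of a
        # bin of width 2*err_bound, so the pointwise reconstruction
        # error can never exceed err_bound.
        return np.round(data / (2.0 * err_bound)).astype(np.int64)

    def decompress_abs(codes, err_bound):
        # Reconstruct each value as the center of its quantization bin.
        return codes * (2.0 * err_bound)

    # Hypothetical scalar field standing in for one simulation timestep.
    rng = np.random.default_rng(0)
    field = rng.normal(size=(256, 256))

    bound = 1e-3
    recon = decompress_abs(compress_abs(field, bound), bound)

    # Z-checker-style distortion metrics on the decompressed data.
    max_err = np.max(np.abs(field - recon))
    rmse = np.sqrt(np.mean((field - recon) ** 2))
    psnr = 20 * np.log10((field.max() - field.min()) / rmse)

    assert max_err <= bound  # the absolute error bound holds pointwise
    print(f"max error {max_err:.2e}, RMSE {rmse:.2e}, PSNR {psnr:.1f} dB")

In practice, production compressors add prediction and entropy coding on top of such quantization to achieve high compression ratios, but the error-control contract, meaning that no reconstructed value deviates from the original by more than the requested bound, is the property that makes the compressed data trustworthy for post-analysis.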