Center for Nonlinear Studies

Thursday, September 11, 2025
2:00 PM - 3:00 PM
CNLS Conference Room (TA-3, Bldg 1690)
Colloquium
Adaptive Applications on GPU Supercomputers: Lessons and Solutions on Porting an Astrophysics Application
Gregor Daiss
University of Stuttgart
Tree-based data structures form the foundation of some of the most efficient algorithms in computational science. For example, many hydrodynamics problems are modeled by partial differential equations that require a high-resolution discretization only in localized regions. A common approach is to use grids with adaptive mesh refinement, which is implemented with such tree-based data structures. As another example, in long-range molecular dynamics, tree-based structures enable the use of algorithms such as the Barnes-Hut method or the Fast Multipole Method. These significantly reduce the runtime complexity compared with more naive approaches to calculating the particle interactions. These and similar algorithms help us tackle larger and more difficult problems across various domains of scientific computing.

Despite the effectiveness of such algorithms, we still frequently require the computational power offered by GPUs and GPU supercomputers. Yet the very tree-based structures that enable these efficient algorithms also pose substantial challenges when it comes to implementing scalable simulation codes with efficient GPU compute kernels. This makes it difficult for the developers of these codes to target modern GPU supercomputers such as Perlmutter and Frontier.

Our research focuses on how to bridge this gap! In this presentation, we describe these challenges and provide solutions to resolve them. These solutions can help the developers of simulation codes overcome similar challenges and scale more easily on GPU supercomputers.

While our solutions could work for other applications and simulation codes, we will showcase them using a specific one as an example: Octo-Tiger. Octo-Tiger is an adaptive, massively parallel application for the simulation of binary star systems and their outcomes. It is being used to study the contact binary V1309 Sco and the origin of R Coronae Borealis type stars.
For this purpose, Octo-Tiger requires adaptive mesh refinement to achieve the required grid resolution, particularly for convergence studies and to model the flows of mass between the stars accurately. Hence, it is based on a tree data structure. Simulations with Octo-Tiger can easily exceed hundreds of millions of grid cells, necessitating the use of modern supercomputers. These attributes make Octo-Tiger an ideal candidate for our work.

Running Octo-Tiger on GPU supercomputers is desirable given the size of the simulations, but its tree data structure and interleaved solvers make it challenging to achieve an efficient and scalable implementation for these machines. Here, we show how we resolved these challenges for Octo-Tiger. While we use Octo-Tiger as an example, our solutions are available independently and may be used in other simulation codes as well, paving the way for them to run larger and faster simulations.