As Numerical Weather Prediction (NWP) enters the petascale-to-exascale performance era, models need to scale to systems with up to a thousand times more floating-point capability but proportionately less of everything else that matters for performance: memory bandwidth, cache capacity, and I/O bandwidth. Memory-latency- and bandwidth-bound applications will benefit from current- and future-generation processors only to the extent that large amounts of parallelism can be exposed and efficiently exploited. Exploiting parallelism efficiently, both thread-level and fine-grained (e.g., vector), depends on keeping data on hand and available for reuse in the processor's registers and caches. Some aspects of an application's memory locality and operand reuse are basic properties of the algorithm; other aspects, however, can improve with attention to data layouts, loop nesting order, and other restructurings. Understanding the interactions between the program and the hardware both informs restructuring and bounds expectations for improvement. In this talk, I will present work done in collaboration with model developers at NCAR and technical staff at Intel Corp. to characterize and improve the performance of the Weather Research and Forecasting (WRF) model on the Intel Xeon Phi Many Integrated Core (MIC) architecture processor.

Host: Hai Ah Nam
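The abstract's point about loop nesting order and data locality can be made concrete with a small sketch. The C fragment below is illustrative only and is not taken from WRF or from the talk; the array sizes and function names are hypothetical. It contrasts a simple neighbor-averaging loop nest whose inner loop strides through memory with the same computation restructured so the inner loop walks memory with unit stride, the kind of restructuring that lets caches and vector units be used effectively.

    #include <stddef.h>

    #define NX 512
    #define NY 512

    /* Poor locality: for a row-major C array, the inner loop over i
     * strides by NY floats through memory, so each cache line is
     * touched once per element and vectorization is ineffective. */
    void smooth_strided(const float in[NX][NY], float out[NX][NY])
    {
        for (size_t j = 1; j < NY - 1; j++)
            for (size_t i = 1; i < NX - 1; i++)
                out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j]
                                   + in[i][j - 1] + in[i][j + 1]);
    }

    /* Better locality: interchanging the loops makes the inner loop
     * unit-stride, so cache lines are fully used and the compiler
     * can vectorize the j loop. */
    void smooth_unit_stride(const float in[NX][NY], float out[NX][NY])
    {
        for (size_t i = 1; i < NX - 1; i++)
            for (size_t j = 1; j < NY - 1; j++)
                out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j]
                                   + in[i][j - 1] + in[i][j + 1]);
    }

On a wide-vector processor such as the Xeon Phi, the unit-stride form is far friendlier to the hardware prefetcher and to compiler auto-vectorization, which is why loop order and array layout figure so prominently in this kind of optimization work.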