Center for Nonlinear Studies
Thursday, August 12, 2010
10:00 AM - 10:30 AM
CNLS Conference Room (TA-3, Bldg 1690)

Student Seminar

Automatic Fusion of OpenCL Kernels

Ian Karlin
CNLS / CCS-2 / University of Colorado at Boulder

Recent trends in computing capabilities have resulted in accelerators (e.g., GPUs and Cell processors) having more computational power and memory bandwidth than CPUs. Using accelerators often reduces program runtime, but requires architecture-specific code. The OpenCL programming language and its associated programming model solve this problem by enabling a single source code to run on both accelerators and CPUs. Computationally intensive tasks are written as kernels and run on accelerators, while control logic is handled by a CPU. However, for small kernels the invocation cost is significant, and kernels that use the same data require repeated data transfer operations. To reduce both invocation costs and the amount of data transferred, kernels can be fused. Too much fusion, however, can cause capacity misses in local stores and registers. Manually creating efficient fused kernels is time-consuming: dependence analysis between kernels is tedious and error-prone, and the optimal amount of fusion is machine-dependent.
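To make the fusion trade-off concrete, here is a minimal toy model in plain Python, not OpenCL API calls. It assumes each "kernel" is a per-work-item function launched over a global index range, and it counts launches and global-memory element reads and writes. The names `Device`, `scale`, `offset`, and `scale_offset_fused` are hypothetical illustrations, not part of the talk's tool. Fusing the two kernels is legal here only because work-item `i` of the second kernel reads just `tmp[i]`, which work-item `i` of the first kernel wrote.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical stand-in for an OpenCL device that counts launches and global-memory traffic."""
    launches: int = 0
    global_reads: int = 0
    global_writes: int = 0

    def launch(self, kernel, n, *buffers):
        # One "kernel invocation": run kernel(self, i, ...) for each work-item i.
        self.launches += 1
        for i in range(n):
            kernel(self, i, *buffers)


def scale(dev, i, x, tmp):
    # Kernel 1: tmp[i] = 2 * x[i]; the result goes to a global buffer.
    tmp[i] = 2.0 * x[i]
    dev.global_reads += 1
    dev.global_writes += 1


def offset(dev, i, tmp, y):
    # Kernel 2: y[i] = tmp[i] + 1; it must re-read what kernel 1 wrote.
    y[i] = tmp[i] + 1.0
    dev.global_reads += 1
    dev.global_writes += 1


def scale_offset_fused(dev, i, x, y):
    # Fused kernel: the intermediate lives in a private variable
    # (a register on real hardware) and never touches global memory.
    t = 2.0 * x[i]
    y[i] = t + 1.0
    dev.global_reads += 1
    dev.global_writes += 1


n = 4
x = [0.0, 1.0, 2.0, 3.0]

unfused = Device()
tmp, y_unfused = [0.0] * n, [0.0] * n
unfused.launch(scale, n, x, tmp)
unfused.launch(offset, n, tmp, y_unfused)

fused = Device()
y_fused = [0.0] * n
fused.launch(scale_offset_fused, n, x, y_fused)

print(y_unfused, unfused)  # [1.0, 3.0, 5.0, 7.0] Device(launches=2, global_reads=8, global_writes=8)
print(y_fused, fused)      # [1.0, 3.0, 5.0, 7.0] Device(launches=1, global_reads=4, global_writes=4)
```

Both versions compute the same result, but the fused one halves the launches and the global-memory traffic. The cost is that the fused kernel's per-work-item state grows as more kernels are merged, which is where the capacity misses in registers and local stores come from.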

In this talk, we present a tool that automates the fusion of OpenCL kernels. We describe how it eliminates the tedious, error-prone manual analysis by automatically creating fused OpenCL kernels. We explain how search can be added to the tool to find the amount of fusion that yields the smallest kernel runtimes. Throughout the talk, an elementary multi-physics simulation is used as a motivating example.
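The abstract does not say which search strategy the tool uses. The sketch below swaps in plain exhaustive search. It assumes a linear chain of kernels in which every contiguous group may legally be fused, with the dependence analysis already done. Each plan splits the chain into contiguous groups, and each group becomes one fused kernel. The cost model (`LAUNCH_COST`, `SPILL_PENALTY`, `CAPACITY`, and the per-kernel register demands) is entirely hypothetical. A real autotuner would instead time each fused variant on the target machine.

```python
from itertools import product

# Hypothetical toy cost model; a real autotuner would time each fused kernel on the target device.
LAUNCH_COST = 10       # fixed overhead per kernel invocation
SPILL_PENALTY = 50     # extra cost when a fused group overflows registers/local store
CAPACITY = 8           # register budget per work-item (made-up units)


def fusion_plans(kernels):
    """Yield every split of a kernel chain into contiguous groups; each group becomes one fused kernel."""
    for cuts in product([False, True], repeat=len(kernels) - 1):
        plan, group = [], [kernels[0]]
        for kernel, cut in zip(kernels[1:], cuts):
            if cut:          # end the current fused group before this kernel
                plan.append(group)
                group = [kernel]
            else:            # fuse this kernel into the current group
                group.append(kernel)
        plan.append(group)
        yield plan


def plan_cost(plan):
    # One launch per fused group, plus a penalty if the group exceeds capacity.
    cost = 0
    for group in plan:
        demand = sum(regs for _, regs in group)
        cost += LAUNCH_COST + (SPILL_PENALTY if demand > CAPACITY else 0)
    return cost


def best_plan(kernels, cost=plan_cost):
    # Exhaustive search: evaluate all 2**(k-1) plans and keep the cheapest.
    return min(fusion_plans(kernels), key=cost)


# Four chained kernels, each with a made-up register demand of 3.
kernels = [("A", 3), ("B", 3), ("C", 3), ("D", 3)]
plan = best_plan(kernels)
print([[name for name, _ in group] for group in plan], plan_cost(plan))
# [['A', 'B'], ['C', 'D']] 20
```

Under this toy model, fusing nothing pays four launch overheads and fusing everything spills. The search settles on pairs, mirroring the trade-off the abstract describes. Exhaustive enumeration grows as 2^(k-1), so longer kernel chains would call for a pruned or heuristic search.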