GPU Optimization Topics
• Threading Details
  – Wavefronts and warps
  – Thread scheduling for both AMD and NVIDIA GPUs
  – Predication
• Optimization
  – Thread mapping
  – Device occupancy
  – Vectorization

Work Groups to HW Threads
• OpenCL kernels are structured into work groups that map to device compute units
• Compute units on GPUs consist of SIMT processing elements
• Work groups automatically get b