Take GPU processing beyond graphics

Roberto Mijat, Director of Compute, Arm

Modern GPUs are becoming increasingly programmable and can be used for general-purpose processing. Frameworks such as OpenCL and Android™ Renderscript enable this. To achieve uncompromised feature support and performance, you need a processor specifically designed for general-purpose computation. After an introduction to the technology and how it is enabled, this presentation will explore the design considerations of the Arm Mali-T6xx series of GPUs that make them the perfect fit for GPU computing.


Measuring the whole system: holistic profiling of CPU and GPU for optimal vision applications on Arm platforms

Tim Hartley, Senior Product Manager, Arm

Developers of sophisticated vision applications need all the processing power they can lay their hands on, and using OpenCL on a GPU can provide a vital additional compute resource. However, spreading the workload amongst processors and processor types brings its own problems and difficulties. In this brave new heterogeneous world, traditional application optimization techniques are not always effective. The key to achieving performance is twofold: getting access to hardware counters for all the processors in your system, and then understanding what those numbers are telling you. In this presentation, I will examine the tools and techniques available to profile these sorts of applications, drawing on real case studies from vision applications. Using tools like DS-5 Streamline, I will show how to extract meaningful performance numbers and how to interpret them.


Unleashing the benefits of GPU Compute

Roberto Mijat, Director of Compute, Arm

GPU compute on the Arm Mali-T6xx series of GPUs offers a host of benefits: it accelerates data-parallel computation while reducing system workload; reduces platform energy consumption while increasing system throughput; and enhances your system's value by consolidating functionality while reducing programmer effort. In this presentation, we show how Arm Mali-T6xx processors deliver these benefits on shipping devices. By analyzing ecosystem partners' use cases, we highlight trends in GPU computing: computational photography, computer vision, and image processing.


GPU Compute for Mobile Devices workshop

Tim Hartley, Senior Product Manager, Arm
Johan Gronqvist, Senior Software Engineer, Arm

Desktop and HPC systems have enjoyed the benefits of GPU Compute for several years now. Developers have become accustomed to optimization techniques for GPUs designed for those markets. To fully exploit the compute capabilities of GPUs in mobile and embedded systems, developers need to learn different optimization techniques due to differences in hardware organization.

This workshop describes some optimization techniques for the Arm Mali-T600 GPU series. It starts with a naive implementation of an image processing filter and progressively transforms it to improve hardware utilization on the Arm Mali-T604. It also discusses using the Renderscript and OpenCL APIs to enable GPU Compute.
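The abstract does not say which filter the workshop starts from. As a purely hypothetical illustration of what a "naive" starting point can look like, here is a scalar C sketch of a 3x3 box blur; the filter choice, the function name `box3x3_naive`, single-channel float pixels and clamp-to-edge border handling are all assumptions, not the workshop's code. Each output pixel is computed independently, which is the property a GPU port (for example, one OpenCL work-item per pixel) would build on.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a coordinate into [0, n-1] (clamp-to-edge border handling, an assumption). */
static int clampi(int v, int n)
{
    return v < 0 ? 0 : (v >= n ? n - 1 : v);
}

/* Hypothetical naive baseline: 3x3 box blur on a row-major, single-channel
 * float image. Every output pixel reads its nine neighbours directly and
 * shares no work with adjacent pixels. */
void box3x3_naive(const float *src, float *dst, int width, int height)
{
    assert(src != NULL && dst != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[clampi(y + dy, height) * width + clampi(x + dx, width)];
            dst[y * width + x] = sum / 9.0f;
        }
    }
}
```

Each pixel performs nine scalar loads and recomputes sums that overlap with its neighbours'; in a baseline like this, that redundancy and the purely scalar arithmetic are the usual first targets for optimization.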

GPU Compute Optimization

Johan Gronqvist, Senior Software Engineer, Arm

This video presents general guidelines for optimizing compute kernels for Arm Mali. It briefly describes the architecture and some potential bottlenecks, then discusses the hardware counters available in the GPU. It introduces the tools used to obtain them and focuses on how to use some important counters in the optimization effort.


GPU Compute example: SGEMM

Johan Gronqvist, Senior Software Engineer, Arm

This video discusses an implementation of matrix multiplication on Mali GPUs. It focuses on understanding the performance characteristics, describes some potential optimizations, and explains how to interpret the results of those optimizations.
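SGEMM is the BLAS single-precision general matrix multiply, computing C = alpha * A * B + beta * C. The sketch below is a plain C reference version, not the Mali implementation from the video; the function name `sgemm_ref`, contiguous row-major storage and the absence of transpose options are simplifying assumptions. A slow but obviously correct reference like this is handy for validating the output of an optimized GPU kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Reference SGEMM sketch: C = alpha * A * B + beta * C.
 * Assumed layout: row-major, no transposes; A is M x K, B is K x N, C is M x N. */
void sgemm_ref(size_t M, size_t N, size_t K, float alpha,
               const float *A, const float *B, float beta, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            /* As in BLAS, C is not read when beta == 0, so it may be uninitialized. */
            C[i * N + j] = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * C[i * N + j];
        }
    }
}
```

The inner loop walks B column-wise with stride N, which is cache-unfriendly; this access pattern is one reason optimized GPU versions restructure the loops and block the matrices.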