Measuring the whole system: holistic profiling of CPU and GPU for optimal vision applications on Arm platforms

Tim Hartley, Senior Product Manager, Arm

Spreading the workload across processors and processor types brings its own difficulties. The key to achieving good performance is twofold: getting access to hardware counters for all the processors in your system, and then understanding what those numbers are telling you. This talk uses tools like DS-5 Streamline to show how to extract meaningful performance numbers and how to interpret them.

GPU Compute for Mobile Devices workshop

Tim Hartley, Senior Product Manager, and Johan Gronqvist, Senior Software Engineer, Arm

This workshop describes some optimization techniques for the Arm Mali-T600 GPU series. It starts with a naive implementation of an image processing filter and progressively transforms it to improve hardware utilization on the Arm Mali-T604. It also discusses using the RenderScript and OpenCL APIs to enable GPU Compute.

GPU Compute Optimization with Hardware Counters

Johan Gronqvist, Senior Software Engineer, Arm

This video presents general guidelines for optimizing compute kernels for Arm Mali. It briefly describes the architecture and some potential bottlenecks before discussing the hardware counters available in the GPU. It then introduces the tools used to obtain them and focuses on how to use some important counters in the optimization effort.

GPU Compute example: SGEMM

Johan Gronqvist, Senior Software Engineer, Arm

This video discusses an implementation of matrix multiplication on Mali GPUs. It focuses on understanding the performance characteristics, describing some potential optimizations, and talking about how to interpret the results of those optimizations.
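
For readers unfamiliar with the operation, SGEMM computes C = alpha * A * B + beta * C on single-precision matrices. The sketch below is a plain reference version for orientation only; it is not the implementation discussed in the video. The function name `sgemm_naive` and its argument order are assumptions made for illustration, and Python floats are double precision, so this shows the arithmetic rather than the numeric format.

```python
def sgemm_naive(alpha, a, b, beta, c):
    """Reference GEMM on row-major nested lists.

    Returns alpha * (a x b) + beta * c as a new matrix.
    Hypothetical helper for illustration; not taken from the video.
    """
    m = len(a)        # rows of A and C
    k = len(b)        # columns of A, rows of B
    n = len(b[0])     # columns of B and C
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Each (i, j) output element is independent of the others,
            # which is what makes the operation a natural fit for a GPU.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    print(sgemm_naive(2.0, a, b, 3.0, c))
```

Because every output element can be computed independently, a straightforward GPU port would typically assign one work-item per element of C; optimized versions usually restructure the loops to improve data reuse.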