Analyzing GPU load

Analyze the overall usage of the GPU by observing the activity on the GPU processing queues, and the workload split between non-fragment and fragment processing. GPU workloads run asynchronously to the CPU, and the fragment and non-fragment queues can run in parallel with each other, provided that sufficient work is available to process. Use the charts in Streamline to determine whether an application is GPU bound: they show whether the GPU is being kept busy, and how the workload is distributed across the two main processing queues.

Mali GPU usage

It is useful to check what frequency the GPU is running at. Highlight a 1 second region of your capture, and look at the Mali GPU usage chart. In this example, the GPU is active for 946 mega-cycles in the selected second, that is, 946 million cycles per second (946 MHz), which is the maximum frequency for the device we tested here. On a higher-end device this number might be much lower, as the GPU may have more shader cores, each running at a lower frequency to achieve the same output more efficiently.

Streamline GPU usage
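The arithmetic behind the example above can be sketched in a few lines of Python. This is a minimal illustration, not part of Streamline: the function name and its parameters are hypothetical, and it assumes you have read the GPU active cycle count and the length of the highlighted region from the chart yourself.

```python
def effective_frequency_mhz(active_cycles: float, region_seconds: float) -> float:
    """Average GPU active cycles per second over the selected region, in MHz.

    Hypothetical helper: both inputs are values read manually from the
    highlighted region of the capture.
    """
    if region_seconds <= 0:
        raise ValueError("region_seconds must be positive")
    return active_cycles / region_seconds / 1e6


# Example from the text: 946 million active cycles in a 1 second region.
print(effective_frequency_mhz(946e6, 1.0))  # prints 946.0
```

If the result matches the maximum GPU frequency of the device, the GPU had no idle time in that region. If you highlight a region that is not exactly 1 second long, dividing by the region length keeps the result comparable.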

Look for areas where the GPU is active at the maximum frequency. This indicates that the application is GPU bound. These areas look like flat lines, where there is no idle time. The GPU active cycle count will be approximately equal to the active cycle count of the dominant work queue (non-fragment or fragment). Fragment work includes all fragment shading work, and non-fragment work includes all vertex shading, tessellation shading, geometry shading, fixed function tiling, and compute shading. For most graphics content there are significantly more fragments than vertices, so the fragment queue will normally have the highest processing load.
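As a rough sketch of the check described above, the following Python function compares the GPU active cycles in a region against the cycles available at maximum frequency, and reports which queue dominates. The function name, the dictionary keys, and the 0.95 threshold are assumptions chosen for illustration; the queue values in the example are invented, and only the 946 million figure comes from the text.

```python
def gpu_load_summary(gpu_active, non_fragment_active, fragment_active,
                     max_cycles, busy_threshold=0.95):
    """Summarize GPU load for one region of a capture.

    All arguments are cycle counts for the same region. max_cycles is the
    maximum GPU frequency multiplied by the region length in seconds.
    busy_threshold is an arbitrary cut-off, not a value defined by Streamline.
    """
    if max_cycles <= 0:
        raise ValueError("max_cycles must be positive")
    utilization = gpu_active / max_cycles
    if fragment_active >= non_fragment_active:
        dominant = "fragment"
    else:
        dominant = "non-fragment"
    return {
        "utilization": utilization,
        "dominant_queue": dominant,
        "gpu_bound": utilization >= busy_threshold,
    }


# Illustrative values: the GPU is active for every available cycle.
summary = gpu_load_summary(gpu_active=946e6, non_fragment_active=300e6,
                           fragment_active=900e6, max_cycles=946e6)
print(summary["dominant_queue"], summary["gpu_bound"])  # prints fragment True
```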

Check whether drops in GPU activity correlate with spikes in CPU load, and whether those spikes are caused by a particular application thread. Use the calipers in Streamline to select the region where the CPU spike occurs, then look at the Call Paths and Functions views to see which threads are active during the spike. If you don’t have debug symbols, you will only see library names in these views, but this can be enough to work out what is going on, because you can see which libraries are being accessed by each thread. If you see a lot of time spent in the graphics driver library, this is driver overhead, often caused by high draw call counts, bulk data upload, or shader compilation and linking.

Filtering with the calipers
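The correlation between GPU drops and CPU spikes is normally judged by eye in the timeline, but the idea can be sketched in code. The example below assumes you already have two equally sampled series of values between 0 and 1, one for GPU utilization and one for CPU load; how you obtain them is outside the sketch. The function name and both thresholds are hypothetical.

```python
def find_cpu_limited_samples(gpu_util, cpu_load, gpu_low=0.8, cpu_high=0.9):
    """Return indices of samples where GPU activity drops while CPU load spikes.

    gpu_util and cpu_load are equally sampled sequences of values in [0, 1].
    The thresholds are illustrative and should be tuned per application.
    """
    if len(gpu_util) != len(cpu_load):
        raise ValueError("series must have the same number of samples")
    return [
        i
        for i, (gpu, cpu) in enumerate(zip(gpu_util, cpu_load))
        if gpu < gpu_low and cpu > cpu_high
    ]


# Invented sample data: the GPU goes partly idle while the CPU is saturated.
gpu = [1.00, 0.98, 0.55, 0.60, 0.99]
cpu = [0.40, 0.45, 0.95, 0.97, 0.50]
print(find_cpu_limited_samples(gpu, cpu))  # prints [2, 3]
```

The returned indices mark the region to select with the calipers before inspecting the Call Paths and Functions views.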

Workload scheduling

Fragment and non-fragment workloads should overlap. If you see areas where one queue goes idle while the other is active, you could have a serialization problem. This can happen where there are data dependencies, and Vulkan applications can suffer from this. Refer to Workload pipelining and Pipeline bottlenecks for more information.
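One way to put a number on how well the two queues overlap is sketched below. It rests on an assumption that is not stated in the text: that the GPU counts as active whenever at least one queue is active, so the overlapping cycles are roughly the sum of the two queue counts minus the GPU active count. The function name is hypothetical and the cycle counts are invented.

```python
def queue_overlap_ratio(gpu_active, non_fragment_active, fragment_active):
    """Estimate the fraction of the smaller queue's work that ran in parallel.

    Assumes GPU active cycles cover the union of both queues' active cycles.
    Returns a value close to 1.0 when the queues are well pipelined and a
    value close to 0.0 when they run one after the other.
    """
    smaller = min(non_fragment_active, fragment_active)
    if smaller <= 0:
        return 0.0
    overlap = non_fragment_active + fragment_active - gpu_active
    # Clamp to [0, 1] to absorb counter sampling noise.
    return max(0.0, min(1.0, overlap / smaller))


# Well pipelined: GPU active matches the dominant (fragment) queue.
print(queue_overlap_ratio(946e6, 300e6, 946e6))  # prints 1.0
# Serialized: GPU active equals the sum of both queues.
print(queue_overlap_ratio(946e6, 300e6, 646e6))  # prints 0.0
```

A low ratio over a region is only a hint that the queues are serialized; confirm it by looking for idle gaps in the queue charts for that region.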
