Optimization advice
Improving GPU queue scheduling bound content
Content that is GPU queue scheduling bound fails to hit its target frame rate even though the GPU is busy all of the time. Neither GPU queue is kept fully busy, because the workload is serialized across the two queues.
The Mali tile-based rendering process is designed to run work in parallel across the two GPU queues. While one render pass is being vertex shaded in one queue, an earlier pass is being fragment shaded in the other. This parallel processing ensures the most efficient use of the available processing resources.
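The benefit of this overlap can be illustrated with a toy timing model. This is a minimal sketch, not a description of the real Mali scheduler: the function name `frame_time`, the `serialize` flag, and the per-pass costs in milliseconds are all assumptions made for illustration.

```python
def frame_time(passes, serialize=False):
    """Return the time at which the last fragment workload finishes.

    passes: list of (vertex_ms, fragment_ms) tuples in submission order.
    serialize: if True, the vertex work of each pass waits for the
               fragment work of the previous pass (no queue overlap).
    """
    vertex_free = 0.0    # time at which the vertex queue becomes idle
    fragment_free = 0.0  # time at which the fragment queue becomes idle
    for vertex_ms, fragment_ms in passes:
        if serialize:
            vertex_start = max(vertex_free, fragment_free)
        else:
            vertex_start = vertex_free
        vertex_free = vertex_start + vertex_ms
        # Fragment shading needs this pass's vertex output and a free queue.
        fragment_start = max(vertex_free, fragment_free)
        fragment_free = fragment_start + fragment_ms
    return fragment_free


# Hypothetical frame: three passes, each 2 ms of vertex and 4 ms of fragment work.
passes = [(2, 4)] * 3
print(frame_time(passes))                  # 14.0, queues overlap
print(frame_time(passes, serialize=True))  # 18.0, queues take turns
```

In this model the overlapped schedule hides all vertex work after the first pass behind fragment work, while the serialized schedule pays for every stage back to back.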
OpenGL ES
For OpenGL ES, serialization across queues commonly occurs because of data dependencies between render passes. For example, if a vertex shader in render pass N reads a texture written by the fragment stage in render pass N-1, then it cannot start until render pass N-1 has completed.
Aim to minimize these dependencies by inserting non-dependent work between the two dependent processing stages, allowing other work to fill the bubble.
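The effect of reordering can be sketched with a small timing model. This is an illustrative simplification and not how the driver actually schedules work: the function `frame_time`, the pass names, and the costs in milliseconds are hypothetical. Each pass may name one earlier pass whose fragment output its vertex stage reads.

```python
def frame_time(passes):
    """Return the finish time of a frame on a two-queue toy model.

    passes: list of (name, vertex_ms, fragment_ms, reads_from) tuples in
            submission order. reads_from is the name of an earlier pass
            whose fragment output this pass's vertex stage reads, or None.
    """
    vertex_free = 0.0
    fragment_free = 0.0
    fragment_done = {}  # pass name -> time its fragment work completed
    for name, vertex_ms, fragment_ms, reads_from in passes:
        vertex_start = vertex_free
        if reads_from is not None:
            # Data dependency: wait for the producer's fragment stage.
            vertex_start = max(vertex_start, fragment_done[reads_from])
        vertex_free = vertex_start + vertex_ms
        fragment_start = max(vertex_free, fragment_free)
        fragment_free = fragment_start + fragment_ms
        fragment_done[name] = fragment_free
    return fragment_free


# Hypothetical passes: B's vertex shader reads a texture rendered by A,
# and C is independent of both.
a = ("A", 2, 4, None)
b = ("B", 2, 4, "A")
c = ("C", 2, 4, None)

print(frame_time([a, b, c]))  # 16.0, vertex queue idles while A is fragment shaded
print(frame_time([a, c, b]))  # 14.0, C's vertex work fills the bubble
```

Submitting the independent pass C between A and B gives the vertex queue something to do while B waits for A's texture, so the same work completes sooner.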
Vulkan
For Vulkan, workload dependencies are explicitly stated by the application, so the most common cause for queue-to-queue scheduling issues is where the application specifies overly conservative dependencies that force serialization when it is not required. For example, specifying srcStage=BOTTOM_OF_PIPE and dstStage=TOP_OF_PIPE makes all render passes run serially with no parallel processing.
Your application should specify the most relaxed dependencies possible while still maintaining correctness. This means that srcStage should be as early in the pipeline as possible, and dstStage should be as late in the pipeline as possible. For cases where the minimal valid dependencies still cause slot serialization, follow the advice stated for OpenGL ES above, and insert non-dependent work between the dependent passes to fill the bubble.
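The difference between a conservative and a relaxed dependency can be sketched with a toy model in which every render pass depends on the one before it. This is a simplified illustration under stated assumptions, not Vulkan API code: the stage strings only mirror the names of Vulkan pipeline stages, and `frame_time`, the two stage sets, and the millisecond costs are hypothetical.

```python
# Simplified stage names that mirror, but are not, the Vulkan enum values.
# srcStage values that are only reached once fragment work has completed:
WAITS_FOR_FRAGMENT = {"FRAGMENT_SHADER", "COLOR_ATTACHMENT_OUTPUT", "BOTTOM_OF_PIPE"}
# dstStage values that hold back the vertex work of the dependent pass:
BLOCKS_VERTEX = {"TOP_OF_PIPE", "VERTEX_SHADER"}


def frame_time(pass_count, vertex_ms, fragment_ms, src_stage, dst_stage):
    """Finish time for a chain of passes, each depending on the previous one."""
    vertex_free = 0.0
    fragment_free = 0.0
    prev_vertex_done = 0.0
    prev_fragment_done = 0.0
    for _ in range(pass_count):
        # Time at which src_stage of the previous pass has completed.
        if src_stage in WAITS_FOR_FRAGMENT:
            signal = prev_fragment_done
        else:
            signal = prev_vertex_done
        if dst_stage in BLOCKS_VERTEX:
            vertex_start = max(vertex_free, signal)
        else:
            vertex_start = vertex_free
        vertex_free = vertex_start + vertex_ms
        # Fragment work always waits for the dependency to be satisfied.
        fragment_start = max(vertex_free, fragment_free, signal)
        fragment_free = fragment_start + fragment_ms
        prev_vertex_done = vertex_free
        prev_fragment_done = fragment_free
    return fragment_free


# Three hypothetical passes with 2 ms of vertex and 4 ms of fragment work each.
print(frame_time(3, 2, 4, "BOTTOM_OF_PIPE", "TOP_OF_PIPE"))              # 18.0
print(frame_time(3, 2, 4, "COLOR_ATTACHMENT_OUTPUT", "FRAGMENT_SHADER"))  # 14.0
```

The second call models the case where a fragment shader samples a color attachment written by the previous pass. Only the fragment stage of the consumer waits, so its vertex work still overlaps with the producer's fragment work.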