Improving GPU queue scheduling bound content
Content that is GPU queue scheduling bound fails to hit its target frame rate even though the GPU as a whole is busy all of the time. Neither GPU queue is kept busy on its own, because workloads are serialized across the two queues.
The Mali tile-based rendering process is designed to parallelize across the two GPU queues. While one render pass is being vertex shaded in one queue, an earlier pass is being fragment shaded in the other. This parallel processing ensures the most efficient use of the available processing resources.
For OpenGL ES, serialization across queues commonly occurs because of data dependencies between render passes. For example, if a vertex shader in render pass N reads a texture written by the fragment stage in render pass N-1, then it cannot start until render pass N-1 has completed.
Aim to minimize these dependencies. Where a dependency is unavoidable, insert non-dependent work between the two dependent processing stages, so that other work can fill the bubble.
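The effect of filling the bubble can be sketched with a toy model of the two queues. This is not how a Mali scheduler is implemented: the RenderPass struct, the frame_time function, and the timings are hypothetical, and the model assumes that each queue runs passes strictly in submission order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PASSES 16

/* Hypothetical description of one render pass, in arbitrary time units. */
typedef struct {
    int vertex_time;       /* time spent in the vertex queue */
    int fragment_time;     /* time spent in the fragment queue */
    int vertex_reads_pass; /* index of an earlier pass whose fragment output
                              the vertex stage reads, or -1 for none */
} RenderPass;

/* Returns the time at which the last fragment workload finishes. */
int frame_time(const RenderPass *passes, size_t count)
{
    int fragment_done[MAX_PASSES];
    int vertex_free = 0;   /* time at which the vertex queue is next idle */
    int fragment_free = 0; /* time at which the fragment queue is next idle */

    assert(count <= MAX_PASSES);

    for (size_t i = 0; i < count; ++i) {
        int dep = passes[i].vertex_reads_pass;
        int vertex_start = vertex_free;

        /* A vertex stage that reads fragment output must wait for it. */
        if (dep >= 0) {
            assert((size_t)dep < i);
            if (fragment_done[dep] > vertex_start)
                vertex_start = fragment_done[dep];
        }
        vertex_free = vertex_start + passes[i].vertex_time;

        /* Fragment work needs its own vertex work and a free queue. */
        int fragment_start =
            vertex_free > fragment_free ? vertex_free : fragment_free;
        fragment_done[i] = fragment_start + passes[i].fragment_time;
        fragment_free = fragment_done[i];
    }
    return fragment_free;
}
```

Take three passes A, B, and C that each need 2 units of vertex time and 6 units of fragment time, where the vertex stage of B reads the fragment output of A. In this model, submitting A, B, C gives a frame time of 22, while submitting the independent pass C between A and B gives 20, because the work for C fills the bubble.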
For Vulkan, workload dependencies are explicitly stated by the application, so the most common cause for queue-to-queue scheduling issues is where the application specifies overly conservative dependencies that force serialization when it is not required. For example, specifying dstStage=TOP_OF_PIPE makes all render passes run serially with no parallel processing.
Your application should specify the most relaxed dependencies possible while still maintaining correctness. This means that srcStage should be as early in the pipeline as possible, and dstStage should be as late in the pipeline as possible. For cases where the minimal valid dependencies still cause queue serialization, follow the advice stated for OpenGL ES above, and insert non-dependent work between the dependent passes to fill the bubble.