Improving CPU-to-GPU scheduling bound content

Content that is CPU-to-GPU scheduling bound fails to hit its target frame rate even though neither the CPU nor the GPU is kept fully busy, because the workload is serialized across the CPU-GPU interface.

The rendering process is designed to be asynchronously pipelined. The CPU puts new rendering work into a queue, to be processed by the GPU some time later. For content that is not hitting target performance, we want the CPU to always keep some work in this queue. If the queue ever empties, the GPU goes idle and its processing capacity is wasted.
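The cost of a drained queue can be illustrated with a small timing model. This is only a sketch under simplifying assumptions that are not from this guide: every frame costs a fixed CPU time and a fixed GPU time, and the queue is unbounded. The function names and parameters are hypothetical.

```c
#include <assert.h>

/* Total time for `frames` frames when the CPU waits for the GPU to finish
   each frame before building the next one, so the queue drains every frame. */
static double serialized_time_ms(int frames, double cpu_ms, double gpu_ms)
{
    assert(frames >= 0);
    return frames * (cpu_ms + gpu_ms);
}

/* Total time when the CPU keeps queuing frames without waiting.
   The GPU starts a frame as soon as it has been submitted and the
   previous frame has finished rendering. */
static double pipelined_time_ms(int frames, double cpu_ms, double gpu_ms)
{
    assert(frames >= 0);
    double cpu_done = 0.0; /* time at which the CPU finished submitting the frame */
    double gpu_done = 0.0; /* time at which the GPU finished the previous frame   */
    for (int i = 0; i < frames; ++i) {
        cpu_done += cpu_ms;
        double start = (cpu_done > gpu_done) ? cpu_done : gpu_done;
        gpu_done = start + gpu_ms;
    }
    return gpu_done;
}
```

In this model, 100 frames at 6 ms of CPU work and 10 ms of GPU work take 1600 ms when serialized and 1006 ms when pipelined. The pipelined case approaches the cost of the slower processor alone, which is the behavior the queue is meant to provide.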

The main reason this queue empties is that the CPU is blocked, waiting for some of the queued work to complete. While it is blocked, the CPU cannot add more rendering work to the queue, so the queue can drain. Here are some recommendations to avoid this issue:

  1. Avoid using API calls that force a pipeline drain, such as glFinish() or a synchronous glReadPixels().
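  2. Avoid using glMapBuffer() or glMapBufferRange() on a buffer that is still referenced by an in-flight draw call or compute dispatch, unless you map it using glMapBufferRange() with the GL_MAP_UNSYNCHRONIZED_BIT flag. With an unsynchronized mapping, the application must itself ensure that it does not overwrite data the GPU is still using.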
  3. Use query objects and client-side fences in a pipelined way, waiting for the result at least one, and ideally two, frames after the query or fence was submitted to the command stream.
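The pipelined use of queries and fences in the last recommendation can be sketched as a small ring of slots, where each frame issues into one slot and reads back the slot that was issued a fixed number of frames earlier. The helper names and the `latency` parameter below are hypothetical. The OpenGL ES calls appear only in comments because they need a live context, and the `query` array in those comments is an assumed application-owned array of query object names.

```c
#include <assert.h>

/* Number of slots needed: one being written this frame, plus `latency`
   slots still in flight. */
static unsigned slot_count(unsigned latency)
{
    assert(latency >= 1); /* reading back in the same frame would stall */
    return latency + 1;
}

/* Slot that frame number `frame` issues its query or fence into. */
static int slot_to_issue(unsigned frame, unsigned latency)
{
    return (int)(frame % slot_count(latency));
}

/* Slot that frame number `frame` may read back: the one issued `latency`
   frames ago. Returns -1 for the first frames, when no slot is old enough. */
static int slot_to_read(unsigned frame, unsigned latency)
{
    if (frame < latency)
        return -1;
    return (int)((frame - latency) % slot_count(latency));
}

/* Per-frame usage with an occlusion query (sketch, not compiled here):
 *
 *   int r = slot_to_read(frame, 2);
 *   if (r >= 0) {
 *       GLuint ready = 0, result = 0;
 *       glGetQueryObjectuiv(query[r], GL_QUERY_RESULT_AVAILABLE, &ready);
 *       if (ready)
 *           glGetQueryObjectuiv(query[r], GL_QUERY_RESULT, &result);
 *   }
 *   int w = slot_to_issue(frame, 2);
 *   glBeginQuery(GL_ANY_SAMPLES_PASSED, query[w]);
 *   // ... draw calls ...
 *   glEndQuery(GL_ANY_SAMPLES_PASSED);
 */
```

With a latency of two frames, three slots are enough, and the slot read in a given frame is never the slot written in that frame. Checking GL_QUERY_RESULT_AVAILABLE before fetching the result avoids blocking if the GPU is running further behind than expected. The same indexing applies to fences created with glFenceSync() and polled with glClientWaitSync() using a zero timeout.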