Efficient Render Passes

To get the best performance for each render pass, it is important to follow these basic steps to remove any redundant memory accesses:

First, make sure that each logical render pass in the application only turns into a single physical render pass when submitted to the hardware. Therefore, you should bind each framebuffer object only once and then make all required draw calls before switching to the next framebuffer. This step is important for OpenGL ES, where render passes are inferred.

Secondly, minimize the number of render passes and then merge adjacent render passes where possible. For example, one common inefficiency is a render pass for a 3D render, followed immediately by a render pass that applies a 2D UI overlay over the top of it.

In most cases, the UI drawing can be applied directly on to the 3D render, merging two passes into a single render pass. This trick avoids one round-trip via memory.
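As a rough illustration of what one round trip costs, the following sketch estimates the memory traffic that merging the two passes could avoid. The function name, resolution, pixel size, and frame rate are assumptions chosen for illustration, not figures from this guide, and the estimate ignores framebuffer compression and caching.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved per second by one framebuffer round trip: the color
 * attachment is stored at the end of the first pass and loaded again
 * at the start of the second. Uncompressed estimate only. */
static uint64_t round_trip_bytes_per_second(uint32_t width, uint32_t height,
                                            uint32_t bytes_per_pixel,
                                            uint32_t frames_per_second)
{
    assert(bytes_per_pixel > 0);
    uint64_t surface_bytes = (uint64_t)width * height * bytes_per_pixel;
    return surface_bytes * 2u * frames_per_second; /* one store + one load */
}
```

Under those assumptions, a 1920 x 1080 target at 4 bytes per pixel and 60 frames per second gives 1920 x 1080 x 4 x 2 x 60 = 995,328,000 bytes, or roughly 1 GB of avoidable traffic every second.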

Minimizing Start of Tile Loads

Mali GPUs can initialize the tile memory to a clear color value at the start of a render pass without having to read back the old framebuffer content from memory. Before making any draw calls, ensure that you clear or invalidate all attachments at the start of each render pass, unless you are deliberately drawing on top of what was rendered in a previous frame.

For OpenGL ES, you can use any of these calls to prevent a start of tile read from memory:

  • glClear()
  • glClearBuffer*()
  • glInvalidateFramebuffer()

These calls must clear or invalidate the entire framebuffer, not just a sub-region of it.

Caution: Only the start of tile clear is free. Calling glClear() or glClearBuffer*() after the first draw call in a render pass is not free, and this results in a per-fragment clear shader.
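A minimal sketch of the start-of-pass clear for OpenGL ES 3.0, assuming a current context and a complete framebuffer object with color, depth, and stencil attachments. The function name begin_pass_with_clear and the clear values are hypothetical placeholders.

```c
#include <GLES3/gl3.h>

/* Bind the framebuffer once and clear every attachment before the first
 * draw call, so the tile memory can be initialized without a read-back. */
static void begin_pass_with_clear(GLuint fbo)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    /* The scissor test and write masks also apply to glClear(), so reset
     * them here to make sure the whole framebuffer is cleared. */
    glDisable(GL_SCISSOR_TEST);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    glStencilMask(0xFFu);

    glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
    glClearDepthf(1.0f);
    glClearStencil(0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    /* Issue all draw calls for this pass after this point. */
}
```

If the pass relies on a scissor rectangle or restricted write masks for its draw calls, restore that state after the clear.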

For Vulkan, set the loadOp for each attachment to either of:

  • VK_ATTACHMENT_LOAD_OP_CLEAR
  • VK_ATTACHMENT_LOAD_OP_DONT_CARE

Caution: If you call vkCmdClear*() commands to clear an attachment, or manually use a shader to write a constant color, the result is a per-fragment clear shader. It is much more efficient to use the render pass loadOp operations, which benefit from the fast fixed-function tile initialization.

On a Mali GPU, there is no performance difference between a start-of-pass clear and a start-of-pass invalidate. However, clear operations can have a hardware performance cost on GPUs from other vendors.

If you plan to completely cover the screen in opaque primitives, and have no dependency on the starting value, we recommend that you use an invalidate operation instead of a clear operation.

Minimizing End of Tile Stores

After a tile has been completed, it is written back to main memory. For many applications, some of the framebuffer attachments are transient and do not need to be kept beyond the duration of the render pass. It is important to notify the driver which attachments can be safely discarded.

For OpenGL ES, you can notify the driver that an attachment is transient by marking its content as invalid with a call to glInvalidateFramebuffer() after the last draw call in the render pass.

Note: If you write applications using OpenGL ES 2.0, you must use glDiscardFramebufferEXT() from the EXT_discard_framebuffer extension.
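A minimal sketch of the end-of-pass invalidate for OpenGL ES 3.0, assuming a current context and a bound framebuffer object whose depth and stencil attachments are transient. The function name is hypothetical.

```c
#include <GLES3/gl3.h>

/* Call after the last draw call of the pass and before binding another
 * framebuffer, so depth and stencil do not need to be written back. */
static void end_pass_discard_depth_stencil(void)
{
    static const GLenum transient[] = {
        GL_DEPTH_ATTACHMENT,
        GL_STENCIL_ATTACHMENT,
    };
    glInvalidateFramebuffer(GL_FRAMEBUFFER,
                            (GLsizei)(sizeof transient / sizeof transient[0]),
                            transient);
}
```

For the default (window) framebuffer, the attachment names are GL_COLOR, GL_DEPTH, and GL_STENCIL rather than the GL_*_ATTACHMENT enums used here.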

For Vulkan, set the storeOp for each transient attachment to VK_ATTACHMENT_STORE_OP_DONT_CARE. For more efficiency, the application can even avoid allocating physical backing memory for transient attachments by allocating the backing memory using VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT and constructing the VkImage with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT.

Handling Packed Depth-Stencil

GPUs commonly allocate depth and stencil attachments together in memory using a packed pixel format, such as D24S8. Because the two attachments share the same memory, you only get bandwidth savings if neither attachment is read during load and neither attachment is written during store.

To reliably get the best performance, we recommend:

  • If you only need a depth buffer, allocate a depth-only format such as D24 or D24X8, and never attach a stencil attachment.
  • If you only need a stencil buffer, allocate a stencil-only format such as S8, and never attach a depth attachment.
  • If you use a packed depth-stencil attachment, always attach both attachments, clear both attachments on load, and invalidate both attachments on store.
  • If you use a packed depth-stencil attachment and need to continue using one of the attachments in a later render pass, invalidate the other at the end of the render pass. This may allow some bandwidth savings when using framebuffer compression.