Improving non-fragment bound content

If you have established that your content is not fragment bound, it might be failing to hit its target performance because of high vertex processing or compute demands, such as:

  • High vertex count
  • Large vertex data size
  • Vertex shader complexity
  • Expensive compute shaders

Vertices are one of the most expensive inputs into a render, so it is important to use them efficiently. Typically, each vertex requires between 32 and 64 bytes of input attribute data, plus high-precision shader processing to compute its position accurately. Aim to keep triangles as large as possible, so that the high per-vertex shader processing and memory bandwidth cost is amortized over more pixels.
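
To put the 32 to 64 byte figure in context, the following minimal sketch estimates the attribute bandwidth of a scene. The vertex count and frame rate are hypothetical, and the model assumes each vertex is fetched exactly once per frame, which ignores caching and vertex reuse.

```python
def vertex_bandwidth_bytes_per_second(vertex_count, bytes_per_vertex, frames_per_second):
    """Input attribute bytes read per second.

    Assumption: every vertex is fetched exactly once per frame.
    """
    return vertex_count * bytes_per_vertex * frames_per_second


# Hypothetical scene: 500,000 vertices per frame at 60 frames per second.
VERTEX_COUNT = 500_000
FPS = 60

low = vertex_bandwidth_bytes_per_second(VERTEX_COUNT, 32, FPS)
high = vertex_bandwidth_bytes_per_second(VERTEX_COUNT, 64, FPS)
print(f"{low / 1e9:.2f} to {high / 1e9:.2f} GB/s of attribute reads")
```

Under these assumptions, the scene reads between 0.96 and 1.92 GB/s of attribute data before any position or varying outputs are written.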

Here are some recommended best practices for reducing the geometry processing cost of rendering operations:

  1. Use mesh level-of-detail to dynamically select a suitable triangle density based on the distance between the object and the camera.
  2. Use pseudo-geometry techniques, such as normal mapping, to replace mesh geometry with textures and shader computation.
  3. Use higher triangle densities only to enhance areas that normal maps cannot, such as the silhouette edges of objects.
  4. Use smaller data types such as half-float, and minimize padding, to reduce the number of bytes of data per vertex.
  5. Separate position-related attributes from those used only for non-position calculations, and store each group tightly packed in its own buffer region. This maximizes the bandwidth savings from the Mali index-driven vertex shading scheme.
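
Items 4 and 5 can be illustrated with a minimal sketch that compares per-vertex byte counts for two layouts. The attribute set (position, normal, tangent, texture coordinate) is hypothetical, and the fetch model is an assumption: it supposes that position data is read for every vertex while non-position data is read only for vertices that survive culling. Real graphics APIs might also require alignment padding that this sketch omits.

```python
import struct

# Hypothetical attribute set, not taken from any particular engine:
# position (xyz), normal (xyz), tangent (xyzw), texture coordinate (uv).

# Layout A: one interleaved buffer, every component a 32-bit float ("f").
INTERLEAVED_F32 = struct.Struct("<3f3f4f2f")

# Layout B: two tightly packed streams. "<" disables padding in struct.
POSITION_STREAM = struct.Struct("<3f")          # position keeps full precision
NON_POSITION_STREAM = struct.Struct("<3e4e2e")  # 16-bit half-floats ("e")


def bytes_fetched_interleaved(vertex_count):
    """Layout A: the whole vertex is read for every vertex."""
    return vertex_count * INTERLEAVED_F32.size


def bytes_fetched_split(vertex_count, visible_fraction):
    """Layout B under a simplified model (an assumption, see lead-in):
    positions are read for all vertices, the rest only for visible ones."""
    visible = round(vertex_count * visible_fraction)
    return vertex_count * POSITION_STREAM.size + visible * NON_POSITION_STREAM.size


# Pack one vertex's non-position data: normal, tangent, uv.
packed = NON_POSITION_STREAM.pack(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75)

print("interleaved float32 bytes per vertex:", INTERLEAVED_F32.size)
print("split bytes per vertex:", POSITION_STREAM.size, "+", NON_POSITION_STREAM.size)
print("interleaved fetch:", bytes_fetched_interleaved(100_000))
print("split fetch, half visible:", bytes_fetched_split(100_000, 0.5))
```

Half-floats suit attributes such as normals and texture coordinates in this sketch, while position keeps 32-bit precision. Whether 16 bits is enough for a given attribute depends on the content and should be checked visually.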


Beyond the direct cost of vertex shading, complex meshes can also reduce fragment shading efficiency. The per-triangle costs, such as rasterization and vertex data fetch, are amortized over fewer fragments. Small triangles are also more likely to only partially cover each 2x2 pixel quad used by fragment shading, so more quads must be shaded to achieve the same screen coverage.
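
The quad coverage effect can be approximated with a minimal sketch that samples pixel centers against a triangle and counts how many 2x2 quads are touched. This is an illustrative model, not a description of the hardware rasterizer: it has no fill rule, so pixel centers lying exactly on an edge count as covered, and the function names are hypothetical.

```python
import math


def edge(a, b, p):
    """Signed area term: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def quad_coverage(v0, v1, v2):
    """Return (covered_pixels, quads_touched) for a triangle in pixel units."""
    min_x = math.floor(min(v0[0], v1[0], v2[0]))
    max_x = math.ceil(max(v0[0], v1[0], v2[0]))
    min_y = math.floor(min(v0[1], v1[1], v2[1]))
    max_y = math.ceil(max(v0[1], v1[1], v2[1]))

    covered = 0
    quads = set()
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Accept either winding order; edge-exact samples count as inside.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered += 1
                quads.add((x // 2, y // 2))  # the 2x2 quad that owns this pixel
    return covered, len(quads)


def quad_efficiency(v0, v1, v2):
    """Covered pixels divided by pixels in all touched quads (4 per quad)."""
    covered, quads = quad_coverage(v0, v1, v2)
    return covered / (4 * quads) if quads else 0.0


large = quad_efficiency((0, 0), (16, 0), (0, 16))
small = quad_efficiency((1, 1), (3, 1), (1, 3))
print(f"large triangle: {large:.0%}, small triangle: {small:.0%}")
```

In this model the large triangle covers 136 pixels across 36 quads, about 94% efficiency, while the small triangle covers 3 pixels spread across 3 quads, only 25%.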
