Improving fragment bound content
Fragment-bound content fails to hit its performance target because of high fragment processing demands. There are three main causes of slow fragment performance:
- Too many fragments to shade
- Too many shader cycles per fragment
- Content that causes poor fragment processing efficiency
One of the most common problems is with content that tries to shade too many fragments, or fragments that are too expensive, given the performance capabilities of the target GPU. This is a particularly common problem in mass-market devices, which have smaller GPU configurations than high-end smartphones, but have a similar screen resolution. For these devices it is a useful exercise to set a per-pixel performance budget to help guide design choices.
Set a performance budget
Consider a device with a Mali-G72 MP2, a two-core, two-pixel-per-clock GPU, running at 600MHz. The best-case cycle budget for this device when targeting 1080p 60FPS is:
pixelsPerSecond = 1920 * 1080 * 60 = 124,416,000
cyclesPerSecond = 2 * 600,000,000 = 1,200,000,000
cyclesPerPixel = cyclesPerSecond / pixelsPerSecond = 9.6
This budget assumes 100% shader core utilization and must include all frame costs, including vertex shading. This is a usable budget for a 2D game or a simple 3D game, but it’s impossible to run a high-end rendering pipeline inside this budget. The first choices to review for mass-market devices are therefore the target resolution and frame rate, as these are easy to change and have the biggest impact on the overall pipeline cost. Dropping the target configuration to 720p 30FPS reduces the workload to 1280 * 720 * 30 = 27,648,000 pixels per second, which increases the cycle budget to over 40 cycles per pixel (about 43.4).
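The budget arithmetic above can be reproduced for other configurations with a short script. This is a minimal sketch in Python; the function name `cycles_per_pixel` and its parameters are illustrative and are not part of any Mali tool or API. Like the worked example, it assumes 100% shader core utilization and one cycle per core per clock.

```python
def cycles_per_pixel(shader_cores, clock_hz, width, height, fps):
    """Best-case shader cycles available per output pixel.

    Assumes 100% shader core utilization, as in the worked example.
    """
    pixels_per_second = width * height * fps
    cycles_per_second = shader_cores * clock_hz
    return cycles_per_second / pixels_per_second


# Two-core GPU at 600MHz, matching the example device.
budget_1080p60 = cycles_per_pixel(2, 600_000_000, 1920, 1080, 60)
budget_720p30 = cycles_per_pixel(2, 600_000_000, 1280, 720, 30)

print(f"1080p 60FPS: {budget_1080p60:.1f} cycles per pixel")  # 9.6
print(f"720p 30FPS: {budget_720p30:.1f} cycles per pixel")    # 43.4
```

Because the result is a best case, a real frame should aim to use noticeably fewer cycles than the figure this returns.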
Minimize the number of fragments
Once those coarse settings have been decided, it is important to minimize the number of fragments that must be shaded for each frame, as rendering multiple layers of fragments per pixel can rapidly consume valuable cycles.
- Render opaque objects from front-to-back. Objects closest to the camera should be rendered first, with depth testing enabled. This will maximize the number of fragments killed by early depth testing.
- Minimize the number of transparent objects in the scene:
  - Disable blending
  - Disable alpha-to-coverage
  - Reduce the number of shaders that use discard statements.
  This will maximize the number of fragments killed by early depth testing and hidden surface removal.
- Review menus and user interfaces for efficient use of transparent layers; layers of 2D interface components can quickly accumulate into a high layer count, which is expensive to process even if the layers themselves are simple.
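The effect of front-to-back ordering on opaque geometry can be illustrated with a small model. This is a hedged sketch in Python rather than engine code: the `Draw` type, its `view_depth` field, and both functions are hypothetical names introduced for illustration. The fragment count models a single pixel with a LESS depth test and deliberately ignores any hardware hidden surface removal.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    view_depth: float  # distance from the camera (hypothetical field)
    opaque: bool = True


def order_opaque_front_to_back(draws):
    """Return only the opaque draws, nearest to the camera first."""
    opaque = [d for d in draws if d.opaque]
    return sorted(opaque, key=lambda d: d.view_depth)


def fragments_shaded(depths_in_submit_order):
    """Count fragments at one pixel that pass a LESS depth test."""
    nearest = float("inf")
    shaded = 0
    for depth in depths_in_submit_order:
        if depth < nearest:
            nearest = depth
            shaded += 1
    return shaded


scene = [
    Draw("skybox", 1000.0),
    Draw("player", 2.5),
    Draw("terrain", 40.0),
    Draw("smoke", 10.0, opaque=False),  # transparent, handled separately
]

ordered = order_opaque_front_to_back(scene)
front_to_back = fragments_shaded([d.view_depth for d in ordered])
back_to_front = fragments_shaded([d.view_depth for d in reversed(ordered)])

print([d.name for d in ordered])     # ['player', 'terrain', 'skybox']
print(front_to_back, back_to_front)  # 1 3
```

In this model, every opaque layer covering the pixel is shaded when submitted back to front, but only the nearest one is shaded when submitted front to back.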
Minimize processing cost per fragment
For fragments that are required, you should reduce the processing cost per fragment. Exactly what is required depends on the dominant shader pipeline, but here are some best practices:
- Reduce the precision of computation – mediump arithmetic is faster than highp arithmetic.
- Reduce the precision of per-vertex inputs – mediump varying values use less memory and interpolate faster than highp varying values.
- Reduce texture filtering complexity – bilinear (LINEAR_MIP_NEAREST) filtering is faster than trilinear (LINEAR_MIP_LINEAR) filtering, and anisotropic filtering should be used sparingly.
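One way to reason about the relative cost of the filtering modes is to count the texels each mode may read per texture sample. The sketch below is a simplified model in Python, not a measurement of any particular GPU: bilinear filtering blends 4 texels from one mipmap level, trilinear blends 8 texels from two levels, and anisotropic filtering is assumed to take up to `max_anisotropy` such probes. The names are illustrative, and real hardware cost also depends on texture caching and format.

```python
# Texels blended per filtering probe (simplified model).
TEXELS_PER_PROBE = {
    "bilinear": 4,   # 2x2 footprint in one mipmap level
    "trilinear": 8,  # 2x2 footprint in each of two mipmap levels
}


def max_texels_per_sample(filter_mode, max_anisotropy=1):
    """Upper bound on texels read for one texture sample."""
    return TEXELS_PER_PROBE[filter_mode] * max_anisotropy


print(max_texels_per_sample("bilinear"))      # 4
print(max_texels_per_sample("trilinear"))     # 8
print(max_texels_per_sample("trilinear", 4))  # 32
```

The figures are worst-case texel counts, not cycle counts, so they are best used to compare options rather than to predict frame time.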