Inefficiency in 2D Content
Most of the content displayed on mobile screens today still uses 2D sprite layers, or UI elements to build what is shown on-screen. While most of these applications use OpenGL ES for rendering, only a few applications use the 3D functionality that is found within the GPU.
In 2D games there is not much to optimize. This is because OpenGL ES fragment shader processes are usually trivial, for example interpolate a texture coordinate, load a texture sample, and blend to the framebuffer.
Any performance optimizations for 2D content consist of reducing work redundancy by, for example, removing the need for the shader to run for some of the fragments.
This following example shows a typical blit of a square sprite that is drawn over a background layer.
The outer parts of the shield sprite are transparent. However, the border region is partially transparent, so that it fades cleanly into the background without any aliasing artifacts. Finally, the body of the sprite is opaque.
The following image shows a typical 2D render comprising of a single sprite (with alpha channels) in the foreground, and a 2D texture in the background:
There are two main sources of inefficiency here:
- The outer region around this sprite is totally transparent and does not impact the final render at all. However, it still needs time to process.
- Because the middle part of the sprite is totally opaque, it obscures the background pixels directly beneath it. Because the alpha comes from the texture sample, the graphics driver cannot know ahead of time that the background will be obscured. However, these background fragments are still rendered by the GPU. Therefore, this approach wastes valuable processing cycles rendering for something that is not visible in the final on-screen frame.
This is a relatively basic example with only a single layer of overdraw. However, Arm regularly sees real applications in which over half of all rendered fragments on a 1080p screen are redundant.
If applications use OpenGL ES in a different way to remove this redundancy, then the GPU could render the applications faster, or use the performance headroom created to reduce the clock rate and operating voltage. This saves a substantial amount of energy.
In this guide, we will describe how you can achieve this.