Additional techniques for reducing pipeline cycles
There are a number of additional techniques you can use to reduce the cycles used in each pipeline.
Avoid register spilling
The Mali™ Offline Shader Compiler indicates if your shader spills registers. Register spilling is typically caused in a thread by a high number of variables that cannot fit entirely in the register set. Register spilling can also occur if variables are high precision, because each high precision value occupies more register space.
Register spilling forces the Mali GPU to read some uniforms from memory. This increases the load on the Load/Store unit and reduces performance. To solve this issue, try to reduce the number and the precision of the uniforms you supply to the shader.
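As a rough illustration of why both the number and the precision of uniforms matter, the following Python sketch counts 32-bit register slots for a set of uniforms. It is a toy model, not how the Mali compiler allocates registers: it assumes values pack densely (two 16-bit halves per 32-bit register), and both the register budget and the uniform layout are hypothetical. `register_slots` and `would_spill` are illustrative names.

```python
import math

REGISTER_BITS = 32  # assumed width of one register slot


def register_slots(components: int, bits_per_component: int) -> int:
    """32-bit slots needed for one uniform, assuming dense packing."""
    return math.ceil(components * bits_per_component / REGISTER_BITS)


def would_spill(uniform_components: list[int], bits: int, budget: int) -> bool:
    """Toy check: True if the uniforms need more slots than the budget."""
    needed = sum(register_slots(c, bits) for c in uniform_components)
    return needed > budget


# Hypothetical shader: two mat4, two vec4 and one vec3 uniform.
uniforms = [16, 16, 4, 4, 3]
# Hypothetical budget; real limits depend on the GPU and are reported by the compiler.
BUDGET = 32

print(would_spill(uniforms, 32, BUDGET))      # True: 43 slots at full precision
print(would_spill(uniforms, 16, BUDGET))      # False: 22 slots at half precision
print(would_spill(uniforms[1:], 32, BUDGET))  # False: 27 slots with one mat4 removed
```

In this model, halving the precision and removing a uniform both bring the total under the budget, which mirrors the two remedies described above.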
In the Ice Cave demo, some of the shaders suffered from register spilling, for example:
Figure 4-24 Shader with register spilling.
Reducing the number of uniforms solves this problem and increases performance, for example:
Figure 4-25 Shader with no register spilling.
Reduce the precision of varyings and uniforms
When you write custom shaders, you can specify the floating-point precision of uniforms and varyings as either 32-bit floats or 16-bit half-floats. The precision determines the range of values that the variable can represent and the granularity between adjacent representable values.
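To show what range and granularity mean in practice, the following Python sketch rounds values to the 32-bit and 16-bit IEEE 754 formats using the standard `struct` module (the `'f'` and `'e'` format codes). It runs on the CPU, so it only approximates shader behavior; GPUs can differ in rounding and in how they handle out-of-range values.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest 16-bit half-float and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest 32-bit float and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Largest finite half-float.
print(to_half(65504.0))   # 65504.0

# Near 1.0 the half-float spacing is 2**-10, so small offsets are lost.
print(to_half(1.0004))    # 1.0

# Near 1000 the half-float spacing is 0.5: acceptable for colors,
# poor for large world-space positions.
print(to_half(1000.3))    # 1000.5
print(to_single(1000.3))  # close to 1000.3 (32-bit spacing here is 2**-14)

try:
    to_half(1e6)
except OverflowError:
    print("1e6 is outside the half-float range")
```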
Using half-floats has several advantages:
Bandwidth usage is reduced.
Fewer cycles are used in the arithmetic pipeline, because the shader compiler can optimize your code to process more values in parallel.
The number of uniform registers required is reduced and this in turn reduces the risk of register spilling.
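The bandwidth saving for varyings can be estimated with simple arithmetic. The following Python sketch uses a hypothetical varying layout (a `vec2` texture coordinate, a `vec3` normal and a `vec4` color, not taken from the Ice Cave demo) and a hypothetical vertex count, and assumes each component is stored at its declared size with no padding. Real hardware may align or pad varyings, so treat the numbers as a lower bound on the difference.

```python
# Hypothetical varying layout: name -> number of components.
VARYINGS = {"uv": 2, "normal": 3, "color": 4}

FP32_BYTES = 4
FP16_BYTES = 2


def bytes_per_vertex(varyings: dict[str, int], bytes_per_component: int) -> int:
    """Varying storage per vertex, assuming tight packing with no padding."""
    return sum(varyings.values()) * bytes_per_component


def bytes_per_frame(num_vertices: int, varyings: dict[str, int],
                    bytes_per_component: int) -> int:
    """Varying data stored for num_vertices vertices in one frame."""
    return num_vertices * bytes_per_vertex(varyings, bytes_per_component)


vertices = 100_000  # hypothetical scene size
print(bytes_per_frame(vertices, VARYINGS, FP32_BYTES))  # 3600000
print(bytes_per_frame(vertices, VARYINGS, FP16_BYTES))  # 1800000
```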
The following figures show a simple fragment shader variant from the Ice Cave demo, compiled twice with the Mali Offline Shader Compiler.
The first code example is compiled with floats:
Figure 4-26 Shader compiled with floats
The second code example is compiled with half-floats:
Figure 4-27 Shader compiled with half-floats
The number of Load/Store instructions is reduced in the half-float version. The number of work and uniform registers used is reduced and there is no register spilling.
The code generated with half-floats is also smaller than the code generated with floats. This improves the instruction cache hit rate on the Mali GPU, which increases performance.
Use world space normal maps for static objects
You can use tangent space normal maps to increase the surface detail of a model without increasing its geometric detail. Because tangent space normals are defined relative to each triangle of the mesh, you can use these maps on animated objects without modifying them.
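Unfortunately, tangent space normal maps require more arithmetic operations in the shaders to achieve the correct result, because each sampled normal must be transformed by the interpolated tangent, bitangent, and normal vectors. For static objects, these calculations are typically unnecessary.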
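The extra work for tangent space normal maps comes from transforming each sampled normal into the space used for lighting. The following Python sketch shows that per-fragment arithmetic, assuming a texture that stores normals in the [0, 1] range and a tangent, bitangent and normal (TBN) basis that has already been transformed to world space. In a real shader the basis is also computed in the vertex shader and interpolated, which adds further cost. The function names and the example basis are illustrative.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def unpack_normal(texel: Vec3) -> Vec3:
    """Map a texel from [0, 1] to a normal in [-1, 1]."""
    return (texel[0] * 2.0 - 1.0, texel[1] * 2.0 - 1.0, texel[2] * 2.0 - 1.0)


def tangent_to_world(texel: Vec3, t: Vec3, b: Vec3, n: Vec3) -> Vec3:
    """Per-fragment work for a tangent space normal map: unpack,
    multiply by the TBN matrix (9 multiplies, 6 adds), then renormalize."""
    ts = unpack_normal(texel)
    world = (
        t[0] * ts[0] + b[0] * ts[1] + n[0] * ts[2],
        t[1] * ts[0] + b[1] * ts[1] + n[1] * ts[2],
        t[2] * ts[0] + b[2] * ts[1] + n[2] * ts[2],
    )
    # Interpolated TBN vectors are not exactly unit length, so renormalize.
    return normalize(world)


# Hypothetical surface facing +x.
T, B, N = (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)
print(tangent_to_world((0.5, 0.5, 1.0), T, B, N))  # (1.0, 0.0, 0.0)
```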
Alternatively, you can use local space normal maps or world space normal maps. Local space normal maps reduce the number of calculations performed in the shaders, but any transformation applied to the model must also be applied to the sampled normal. World space normal maps do not require any transformation, but they only work for static objects because the stored normals are fixed in world space. In the Ice Cave demo, the cave and other high quality objects are static, so using world space normal maps considerably reduces the number of ALU operations that the shaders require. Most common 3D modeling tools can create world space normal maps, or you can generate them in code as an offline process.
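An offline generation step for a static object could look like the following Python sketch. It bakes one tangent space texel into a world space texel using the mesh's tangent frame at that texel, then contrasts the runtime paths: a world space texel only needs to be unpacked, while a local space texel also needs the model's rotation. This is a sketch under stated assumptions, not the Ice Cave tooling: normals are assumed to be stored in the [0, 1] range, the tangent frame and rotation are hypothetical inputs, and the model matrix is assumed to be rotation-only (models with non-uniform scale need the inverse transpose instead). All function names are illustrative.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]  # rows of a 3x3 matrix


def unpack(texel: Vec3) -> Vec3:
    """Map a stored texel from [0, 1] to a normal in [-1, 1]."""
    return (texel[0] * 2.0 - 1.0, texel[1] * 2.0 - 1.0, texel[2] * 2.0 - 1.0)


def pack(v: Vec3) -> Vec3:
    """Map a normal from [-1, 1] back to [0, 1] for storage."""
    return (v[0] * 0.5 + 0.5, v[1] * 0.5 + 0.5, v[2] * 0.5 + 0.5)


def bake_world_space_texel(texel: Vec3, t: Vec3, b: Vec3, n: Vec3) -> Vec3:
    """Offline: convert a tangent space texel to a world space texel.
    Valid only while the object's world transform never changes."""
    ts = unpack(texel)
    w = tuple(t[i] * ts[0] + b[i] * ts[1] + n[i] * ts[2] for i in range(3))
    length = math.sqrt(sum(c * c for c in w))
    return pack((w[0] / length, w[1] / length, w[2] / length))


def sample_world_space(texel: Vec3) -> Vec3:
    """Runtime, world space map: no transformation, just unpack."""
    return unpack(texel)


def sample_local_space(texel: Vec3, rotation: Mat3) -> Vec3:
    """Runtime, local space map: unpack, then apply the model's rotation."""
    v = unpack(texel)
    return tuple(row[0] * v[0] + row[1] * v[1] + row[2] * v[2] for row in rotation)


# Static wall facing +x (hypothetical tangent frame from the mesh).
baked = bake_world_space_texel((0.5, 0.5, 1.0), (0.0, 0.0, -1.0),
                               (0.0, 1.0, 0.0), (1.0, 0.0, 0.0))
print(baked)                      # (1.0, 0.5, 0.5)
print(sample_world_space(baked))  # (1.0, 0.0, 0.0)

# 90-degree rotation about z applied to a local space normal along +x.
rot_z = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(sample_local_space((1.0, 0.5, 0.5), rot_z))  # (0.0, 1.0, 0.0)
```

The per-fragment cost of the world space path is a single multiply-add per component, which is why moving the transformation into an offline step pays off for objects that never move.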