Profile an OpenGL ES shader program
Optimize your shader program
Now that you have identified the critical path, speed up the tone mapping to improve the performance of the shader.
- The first change you can make is to reduce precision. Currently the tone mapping uses a highp (fp32) matrix operation, which has more precision than is needed to generate an 8-bit per channel color output. Switch the default float and sampler precision to mediump (fp16) by modifying these two lines at the top of the shader:
precision mediump float;
precision mediump sampler2D;
Just these two simple changes significantly reduce the cost of the longest path, as Mali GPUs can process twice as many fp16 operations per clock as fp32 operations.
                          A     LS    V     T     Bound
Longest Path Cycles:      2.7   0.0   0.2   2.5   A
- After changing the precision, arithmetic is still the longest path. Move the tone mapping out of the accumulation loop, and apply it to the final color instead of the individual samples. This gives the final shader structure:
// For each gaussian sample
for (int i = 0; i < WINDOW_SIZE; i++) {
    vec2 offsetTexCoord = texCoord + vec2(gaussOffsets[i], 0.0);
    vec4 data = texture(texUnit, offsetTexCoord);
    fragColor += data * gaussWeights[i];
}
// Tone map the final color
if (toneMap) {
    fragColor *= colorModulation;
}
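For context, the fragment above might sit in a complete shader along the following lines. This is a minimal sketch, not the tutorial's actual source: the `#version 300 es` directive, the value of `WINDOW_SIZE`, the declarations, and the choice of `mat4` for `colorModulation` are all assumptions inferred from how the names are used in the loop.

```glsl
#version 300 es
// Default precisions lowered to mediump (fp16), as described in the first step.
precision mediump float;
precision mediump sampler2D;

// ASSUMPTION: the window size is not given in the text; 5 is a placeholder.
#define WINDOW_SIZE 5

// ASSUMPTION: declarations inferred from usage in the loop below.
uniform sampler2D texUnit;
uniform bool toneMap;
uniform mat4 colorModulation;            // assumed mat4; the text only says "matrix"
uniform float gaussOffsets[WINDOW_SIZE];
uniform float gaussWeights[WINDOW_SIZE];

in vec2 texCoord;
out vec4 fragColor;

void main() {
    // Start from zero so the accumulation below is well defined.
    fragColor = vec4(0.0);

    // For each gaussian sample: accumulate the weighted texel only.
    for (int i = 0; i < WINDOW_SIZE; i++) {
        vec2 offsetTexCoord = texCoord + vec2(gaussOffsets[i], 0.0);
        vec4 data = texture(texUnit, offsetTexCoord);
        fragColor += data * gaussWeights[i];
    }

    // Tone map the final color once, instead of once per sample.
    if (toneMap) {
        fragColor *= colorModulation;
    }
}
```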
                          A     LS    V     T     Bound
Total Instruction Cycles: 1.0   0.0   0.2   2.5   T
Shortest Path Cycles:     0.5   0.0   0.2   2.5   T
Longest Path Cycles:      1.0   0.0   0.2   2.5   T
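Moving the tone mapping out of the loop preserves the result because a matrix multiply is linear. The identity below is a sketch of that reasoning, assuming the per-sample version applied the same `colorModulation` matrix to every sample. The symbols are introduced here for illustration: $d_i$ is the sampled texel, $w_i$ is `gaussWeights[i]`, $M$ is `colorModulation`, and $N$ is `WINDOW_SIZE`.

```latex
\[
\sum_{i=0}^{N-1} w_i \,(d_i M)
  \;=\; \Bigl(\sum_{i=0}^{N-1} w_i \, d_i\Bigr) M
\]
```

The left side costs $N$ matrix multiplies per fragment and the right side costs one. With mediump arithmetic the two forms can differ slightly through rounding, which is unlikely to matter for an 8-bit per channel output.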
Although the last optimization reduced the arithmetic cost from 2.7 cycles to 1.0 cycles, the shader throughput only improved from 2.7 cycles to 2.5 cycles per fragment because the bottleneck changed from arithmetic (A) to texture (T). However, reducing the load on any pipeline will improve energy efficiency and prolong battery life, so these types of optimizations are still worth making, even if they do not improve the headline performance.
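The numbers above can be checked with a simplified model in which the pipelines run in parallel, so the cost per fragment is set by the busiest one. This is an illustrative approximation rather than a statement of how the tool computes its figures. Here $A$, $LS$, $V$ and $T$ are the arithmetic, load/store, varying and texture cycle counts from the Longest Path rows.

```latex
\[
C = \max(A,\ LS,\ V,\ T)
\]
\[
C_{\text{before}} = \max(2.7,\ 0.0,\ 0.2,\ 2.5) = 2.7 \quad (\text{bound: } A)
\]
\[
C_{\text{after}} = \max(1.0,\ 0.0,\ 0.2,\ 2.5) = 2.5 \quad (\text{bound: } T)
\]
\[
\frac{A_{\text{before}} - A_{\text{after}}}{A_{\text{before}}} = \frac{2.7 - 1.0}{2.7} \approx 63\%,
\qquad
\frac{C_{\text{before}} - C_{\text{after}}}{C_{\text{before}}} = \frac{2.7 - 2.5}{2.7} \approx 7\%
\]
```

Under this model the arithmetic work drops by roughly 63%, but the cycles per fragment drop by only about 7%, because the texture pipeline at 2.5 cycles now sets the limit.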