Profile an OpenGL ES shader program
Compile your shader
The following OpenGL ES fragment shader implements the horizontal pass of a 5-tap separable Gaussian blur, with an optional tone mapping stage implemented as a matrix multiply:
#version 310 es
#define WINDOW_SIZE 5

precision highp float;
precision highp sampler2D;

uniform bool toneMap;
uniform sampler2D texUnit;
uniform mat4 colorModulation;
uniform float gaussOffsets[WINDOW_SIZE];
uniform float gaussWeights[WINDOW_SIZE];

in vec2 texCoord;
out vec4 fragColor;

void main() {
    fragColor = vec4(0.0);

    // For each gaussian sample
    for (int i = 0; i < WINDOW_SIZE; i++) {
        // Create sample texture coord
        vec2 offsetTexCoord = texCoord + vec2(gaussOffsets[i], 0.0);

        // Load data and perform tone mapping
        vec4 data = texture(texUnit, offsetTexCoord);
        if (toneMap) {
            data *= colorModulation;
        }

        // Accumulate result
        fragColor += data * gaussWeights[i];
    }
}
- In a terminal window, enter the following command to instruct Mali Offline Compiler to compile the shader for a device with a Mali-G76 GPU:
malioc -c Mali-G76 gauss_blur.frag
This returns the following performance report:
Mali Offline Compiler v7.0.0 (Build bc7a3e)
Copyright 2007-2019 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-G76 r0p0
Driver: Bifrost r19p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 32
Uniform registers: 34
Stack spilling: False

                                A      LS       V       T    Bound
Total Instruction Cycles:     4.5     0.0     0.2     2.5        A
Shortest Path Cycles:         1.0     0.0     0.2     2.5        T
Longest Path Cycles:          4.5     0.0     0.2     2.5        A

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

Shader properties
=================

Uniform computation: False
- Analyze the report. The hardware units run in parallel, so the unit with the highest cycle count is the critical path, and that is the part of your shader code to optimize first. The performance table for the Main shader provides an approximate cycle cost breakdown for the major functional units in the design. For this shader you can see that:
- The shader is texture bound when not using tone mapping. T is the highest value for the shortest path: 2.5 cycles for this 5-sample blur, or 0.5 cycles per sample. This is as fast as the hardware texture filtering unit in a Mali-G76 can go.
- The shader is arithmetic bound when using matrix-based tone mapping. A is the highest value for the longest path when the conditional tone mapping block is executed.
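The reading of the table above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the cycle values are copied by hand from the report, and the names `REPORT`, `bound_unit`, and `texture_cycles_per_sample` are hypothetical helpers, not part of Mali Offline Compiler. It assumes that the bound unit for a path is the unit with the highest cycle count.

```python
# Cycle counts copied by hand from the "Main shader" table in the report.
# Keys: A = Arithmetic, LS = Load/Store, V = Varying, T = Texture.
REPORT = {
    "Total Instruction Cycles": {"A": 4.5, "LS": 0.0, "V": 0.2, "T": 2.5},
    "Shortest Path Cycles": {"A": 1.0, "LS": 0.0, "V": 0.2, "T": 2.5},
    "Longest Path Cycles": {"A": 4.5, "LS": 0.0, "V": 0.2, "T": 2.5},
}

# Texture samples per fragment, matching WINDOW_SIZE in the shader.
WINDOW_SIZE = 5


def bound_unit(cycles):
    """Return the unit with the highest cycle cost (assumed critical path)."""
    return max(cycles, key=cycles.get)


# Bound unit for each row of the table.
bounds = {row: bound_unit(cycles) for row, cycles in REPORT.items()}

# Texture cost per blur sample on the shortest path (no tone mapping).
texture_cycles_per_sample = REPORT["Shortest Path Cycles"]["T"] / WINDOW_SIZE

for row, unit in bounds.items():
    print(f"{row}: bound by {unit}")
print(f"Texture cycles per sample: {texture_cycles_per_sample}")
```

The computed bound units should match the Bound column of the report, which is a quick sanity check when you copy values from a longer report by hand.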
For full details of all of the reported sections and fields, refer to the Mali Offline Compiler User Guide.