Compile your shader

The following OpenGL ES fragment shader implements the horizontal pass of a 5-tap separable Gaussian blur, with an optional tone mapping stage implemented using a matrix multiply:

#version 310 es  
#define WINDOW_SIZE 5  
precision highp float;  
precision highp sampler2D;  
uniform bool toneMap;  
uniform sampler2D texUnit;  
uniform mat4 colorModulation;  
uniform float gaussOffsets[WINDOW_SIZE];  
uniform float gaussWeights[WINDOW_SIZE];  
in vec2 texCoord;  
out vec4 fragColor;  
void main() {  
   fragColor = vec4(0.0);  
   // For each gaussian sample  
   for (int i = 0; i < WINDOW_SIZE; i++) {  
       // Create sample texture coord  
       vec2 offsetTexCoord = texCoord + vec2(gaussOffsets[i], 0.0);  
       // Load data and perform tone mapping  
       vec4 data = texture(texUnit, offsetTexCoord);  
       if (toneMap) {  
           data *= colorModulation;  
       }  
       // Accumulate result  
       fragColor += data * gaussWeights[i];  
   }  
}  
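The shader reads its kernel from the gaussOffsets and gaussWeights uniforms, but the values uploaded to them are not shown here. As a rough sketch of how an application might generate them, the following Python computes a normalized 5-tap Gaussian kernel and the matching horizontal offsets in normalized texture coordinates. The standard deviation (SIGMA), the texture width (TEXTURE_WIDTH), and the helper names are assumptions made for illustration; they do not come from the original application.

```python
import math

WINDOW_SIZE = 5        # matches the shader's #define WINDOW_SIZE
SIGMA = 1.0            # assumed standard deviation, in texels
TEXTURE_WIDTH = 1920   # assumed width of the source texture, in texels


def gaussian_kernel(window_size, sigma):
    """Return window_size Gaussian weights, normalized so they sum to 1."""
    half = window_size // 2
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def horizontal_offsets(window_size, texture_width):
    """Return per-tap offsets of whole texels, in normalized texture coordinates."""
    half = window_size // 2
    return [k / texture_width for k in range(-half, half + 1)]


# Candidate values for the gaussWeights and gaussOffsets uniforms.
gauss_weights = gaussian_kernel(WINDOW_SIZE, SIGMA)
gauss_offsets = horizontal_offsets(WINDOW_SIZE, TEXTURE_WIDTH)

print(gauss_weights)
print(gauss_offsets)
```

Normalizing the weights keeps the overall image brightness unchanged by the blur. The offsets are divided by the texture width because the shader adds them directly to texCoord, which texture() interprets as a normalized coordinate.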
  1. In a terminal window, enter the following command to instruct Mali Offline Compiler to compile the shader for a device with a Mali-G76 GPU:
    malioc -c Mali-G76 gauss_blur.frag

    This returns the following performance report:

    Mali Offline Compiler v7.0.0 (Build bc7a3e) 
    Copyright 2007-2019 Arm Limited, all rights reserved 
    Hardware: Mali-G76 r0p0 
    Driver: Bifrost r19p0-00rel0 
    Shader type: OpenGL ES Fragment 
    Main shader 
    Work registers: 32 
    Uniform registers: 34 
    Stack spilling: False 
                                 A   LS    V    T  Bound 
    Total Instruction Cycles:  4.5  0.0  0.2  2.5      A 
    Shortest Path Cycles:      1.0  0.0  0.2  2.5      T 
    Longest Path Cycles:       4.5  0.0  0.2  2.5      A 
    A = Arithmetic, LS = Load/Store, V = Varying, T = Texture 
    Shader properties 
    Uniform computation: False
  2. Analyze the report. The hardware units run in parallel, so to decide which part of your shader code to optimize, identify the unit on the critical path: the one with the highest cycle count. The performance table for the Main shader provides an approximate cycle cost breakdown for the major functional units in the design. For this shader you can see that:

    1. The shader is texture bound when not using tone mapping. T is the highest value for the shortest path, at 2.5 cycles for this 5-sample blur, or 0.5 cycles per sample. This is as fast as the hardware texture filtering unit in a Mali-G76 can go.
    2. The shader is arithmetic bound when using matrix-based tone mapping. A is the highest value for the longest path when the conditional tone mapping block is executed.
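The reasoning in this step can be sketched in a few lines of Python. The cycle counts below are copied by hand from the report above; the dictionary layout and the function name bound_unit are illustrative assumptions and are not part of Mali Offline Compiler.

```python
# Cycle counts copied from the Mali-G76 report above.
# A = Arithmetic, LS = Load/Store, V = Varying, T = Texture
REPORT = {
    "Total Instruction Cycles": {"A": 4.5, "LS": 0.0, "V": 0.2, "T": 2.5},
    "Shortest Path Cycles":     {"A": 1.0, "LS": 0.0, "V": 0.2, "T": 2.5},
    "Longest Path Cycles":      {"A": 4.5, "LS": 0.0, "V": 0.2, "T": 2.5},
}
WINDOW_SIZE = 5  # number of texture samples taken by the blur


def bound_unit(cycles):
    """Return the unit with the highest cycle count for one report row."""
    return max(cycles, key=cycles.get)


# The units run in parallel, so the slowest unit bounds each path.
bounds = {row: bound_unit(cycles) for row, cycles in REPORT.items()}

# Texture cost per sample on the shortest path (tone mapping disabled).
texture_cycles_per_sample = REPORT["Shortest Path Cycles"]["T"] / WINDOW_SIZE

for row, unit in bounds.items():
    print(f"{row}: bound by {unit}")
print(f"Texture cycles per sample: {texture_cycles_per_sample}")
```

Taking the maximum of each row reproduces the Bound column of the report: T for the shortest path and A for the longest path.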

For full details of all of the reported sections and fields, refer to the Mali Offline Compiler User Guide.
