Optimization advice
High arithmetic load
For applications that are GPU-limited and whose shader load is dominated by arithmetic processing, the solution is to reduce the arithmetic complexity of the shaders.
To reduce arithmetic load:
- Reduce precision – mediump computation can be twice as fast as highp computation.
- Avoid branch divergence – divergent branches within the threads of a warp reduce arithmetic efficiency, because not all threads are active while each divergent code path executes.
- Vectorize operations – Mali Midgard GPUs use SIMD arithmetic logic, so matrix and vector operations written in the source code are more likely to map well onto SIMD operations than the equivalent scalar code.
- Move processing from per-fragment to per-vertex, to lower evaluation frequency.
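
The cost of branch divergence described above can be estimated with a small CPU-side model. This is a minimal sketch, not GPU code and not a description of any specific Mali scheduler. It assumes that a warp which disagrees on an if/else issues both paths across every lane with the inactive lanes masked off. The function name warp_lane_utilization, the 16-thread warp width, and the instruction costs are hypothetical values chosen for illustration.

```python
from typing import Sequence


def warp_lane_utilization(takes_branch: Sequence[bool],
                          then_cost: int,
                          else_cost: int) -> float:
    """Fraction of issued lane-slots that do useful work for one if/else.

    Simplified model (an assumption, not a hardware specification):
    if every thread in the warp agrees, only the taken path is issued.
    If threads disagree, each taken path is issued across the whole warp
    and the lanes that did not take it are masked off.
    """
    width = len(takes_branch)
    then_lanes = sum(takes_branch)
    else_lanes = width - then_lanes

    # Useful work: each thread only needs the instructions of its own path.
    useful = then_lanes * then_cost + else_lanes * else_cost

    # Issued work: every path taken by at least one thread costs a full warp.
    issued = 0
    if then_lanes:
        issued += width * then_cost
    if else_lanes:
        issued += width * else_cost

    return useful / issued if issued else 1.0


# Hypothetical 16-thread warp, with 10 instructions on each side of the branch.
uniform = [True] * 16
split = [True] * 8 + [False] * 8

print(warp_lane_utilization(uniform, 10, 10))  # all threads agree
print(warp_lane_utilization(split, 10, 10))    # threads diverge
```

Under this model, a warp whose threads all take the same path keeps every lane busy, while an even split over two equally expensive paths leaves half of the issued lane-slots idle. This is why conditions that are uniform across neighbouring threads are generally cheaper than conditions that vary per thread.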