The pros and cons of lower precision
Lower precision data types provide a variety of efficiency advantages:
- The hardware needed for narrower arithmetic units is smaller, and fewer transistors need to be toggled. This means that each operation uses less energy.
- Overall performance can be improved by packing vectors of narrower operations together. For example, it is possible to issue a pair of FP16 operations instead of a single FP32 operation.
- Narrow data in memory requires less storage space and less traffic to expensive external DDR memory. It also allows more data to fit into the data caches and register storage at the same time, which improves performance.
However, the trade-off is that narrower data types can represent a smaller range of numbers, and can represent them less precisely:
- Both floating-point types and integer types suffer from a reduced dynamic range. For example, an FP32 float can represent magnitudes up to about 2^128, whereas an FP16 float can only represent magnitudes up to 65504, which is just under 2^16.
- Floating-point types also suffer from reduced precision inside any given dynamic range. FP32 values provide 24 bits of significand precision, whereas FP16 values provide only 11 bits.
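The range and precision limits above can be checked numerically. The following is a minimal sketch, assuming Python's standard struct module, whose "e" and "f" format codes pack IEEE 754 half-precision and single-precision values. The helper names to_fp16, to_fp32, and fits_in_fp16 are hypothetical and exist only for this illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as a finite FP16 value."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


# Precision: the gap between 1.0 and the next representable value.
fp16_step = to_fp16(1.0 + 2.0 ** -10) - 1.0  # FP16 stores 10 fraction bits
fp32_step = to_fp32(1.0 + 2.0 ** -23) - 1.0  # FP32 stores 23 fraction bits

# An increment smaller than half the FP16 step is rounded away in FP16,
# but it survives in FP32.
small = 1.0 + 2.0 ** -12
lost_in_fp16 = to_fp16(small) == 1.0
kept_in_fp32 = to_fp32(small) != 1.0

# Range: 65504 is the largest finite FP16 value, so 100000 does not fit.
largest_ok = fits_in_fp16(65504.0)
too_big_ok = fits_in_fp16(100000.0)

if __name__ == "__main__":
    print(fp16_step, fp32_step, lost_in_fp16, kept_in_fp32, largest_ok, too_big_ok)
```

The same rounding happens implicitly whenever a shader value is held in a mediump variable on hardware that implements mediump as FP16.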
Therefore, we recommend that you use narrow types, except when they provide insufficient precision and would result in a rendering error.
In general, any graphics application should use a mixture of floating-point types. FP16 is sufficient in many cases, but where it is not, use FP32 instead.
Because FP16 values can offer up to twice the energy efficiency and performance of FP32 values, use FP16 values in applications when possible. In practice, it is easier to identify the cases in which using mediump is insufficient.
To ensure output position accuracy and stability of the vertex position in the vertex shader, we recommend that you use highp. Always use highp for input positions, transform matrices, and any distance-based computation for lighting.
Textures are addressed with a UV coordinate between 0 and 1. Using an FP16 coordinate gives 11 fractional bits, with an accuracy of 1 part in 2048. This means it is unable to accurately address common texture sizes such as 1440p (2560x1440 pixel) renders, even when using GL_NEAREST filtering. Many games use smaller textures than this, such as 512x512 pixels or 1024x1024 pixels. However, many games also use GL_LINEAR for filtering.
For smooth linear interpolation during filtering and stable addressing at a sub-texel accuracy, we recommend that you use at least 16 sub-texel indices. This means that even for some smaller textures, FP16 is insufficient.
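The addressing limits described here can be illustrated with a small one-dimensional model. This is a rough sketch, not a model of any particular GPU: it rounds each texel-center coordinate to FP16 with Python's struct module and checks which texel a nearest-style lookup would then hit. The function names are hypothetical, and the default of 2048 coordinate steps simply reuses the "1 part in 2048" figure from the text.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def misaddressed_texels(width: int) -> int:
    """Count texels whose center coordinate selects a different texel
    after the normalized coordinate is rounded to FP16 (1D, nearest)."""
    misses = 0
    for texel in range(width):
        u = (texel + 0.5) / width          # texel center in [0, 1)
        sampled = int(to_fp16(u) * width)  # texel selected after rounding
        if sampled != texel:
            misses += 1
    return misses


def max_width_for_sub_texel_steps(steps: int, coordinate_steps: int = 2048) -> int:
    """Largest texture width that still leaves `steps` coordinate values
    per texel, assuming `coordinate_steps` uniform steps across [0, 1]."""
    return coordinate_steps // steps


if __name__ == "__main__":
    for width in (512, 1024, 2560):
        print(width, misaddressed_texels(width))
    print(max_width_for_sub_texel_steps(16))
```

Under this simplified uniform-step model, keeping 16 sub-texel positions per texel leaves room for a texture only 128 texels wide, which is why FP16 coordinates can be insufficient even for small textures when linear filtering is used.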
For both reasons, we recommend using highp varying input variables in the fragment shaders for texture coordinates, to ensure FP32 interpolation precision. However, if the texture is less than, or equal to, 2048 texels wide in each dimension, it is not normally required to store texture coordinate inputs or outputs for the vertex shader at a higher precision.
Storing data in input attribute buffers as GL_HALF_FLOAT and writing mediump outputs from vertex shaders minimizes the memory bandwidth needed to store coordinates in memory. Loading them as highp inputs in fragment shaders then provides high-precision interpolation.
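To make the storage saving concrete, the sketch below packs the same UV coordinates as 16-bit and 32-bit floats and compares the buffer sizes. It assumes Python's standard struct module, where "e" is a 16-bit float and "f" is a 32-bit float. The pack_uvs helper and the sample coordinates are hypothetical, and uploading the bytes to the GPU is outside the scope of this illustration.

```python
import struct

# Illustrative per-vertex texture coordinates (hypothetical data).
uvs = [(0.0, 0.0), (1.0, 0.0), (0.25, 0.75), (1.0, 1.0)]


def pack_uvs(coords, fmt_char: str) -> bytes:
    """Pack (u, v) pairs as tightly packed little-endian floats.

    fmt_char is "e" for 16-bit half floats or "f" for 32-bit floats.
    """
    flat = [component for uv in coords for component in uv]
    return struct.pack(f"<{len(flat)}{fmt_char}", *flat)


half_buffer = pack_uvs(uvs, "e")   # layout matching a half-float attribute
float_buffer = pack_uvs(uvs, "f")  # layout matching a 32-bit float attribute

if __name__ == "__main__":
    print(len(half_buffer), len(float_buffer))
```

The half-float buffer is half the size of the 32-bit buffer for the same vertices, so fetching the attribute needs half the memory bandwidth.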
Note: The Mali-400 series of GPUs does not support highp operations in the fragment shader math units. However, the 400-series GPUs do include a higher precision path between the varying interpolator and the texture sampling unit. To avoid losing this additional precision for texture coordinates, the interpolated varying value must pass directly into the texture sampling call. Any arithmetic computation on the coordinate before use results in a drop to FP16 precision, and a corresponding drop in sample position accuracy.
Most modern content uses 24-bit unsigned normalized integer or 32-bit float depth buffers. To sample data from these textures without losing data precision, the texture sampler must be a highp sampler.
32-bit per channel texture formats
OpenGL ES 3.0 introduces 32-bit per channel textures, for both floating-point and integer data types. Given their wider data width, using anything other than a highp sampler results in data truncation.
While 32-bit texture channels are available, we do not recommend using them because of their high memory bandwidth and energy costs.
32-bit render targets
OpenGL ES 3.0 and OpenGL ES 3.2 introduce 32-bit per channel output framebuffer attachments, for integer data in ES 3.0 and for floating-point data in ES 3.2. Given their wider data width, using anything other than highp computation and output in the shader program results in data truncation.
While 32-bit channel output framebuffers are available, we do not recommend using them because of their high memory bandwidth and energy costs.
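As a rough illustration of the bandwidth cost, the sketch below estimates the uncompressed size of one four-channel 2560x1440 color attachment at 8, 16, and 32 bits per channel. The frame_bytes helper is hypothetical, and the estimate deliberately ignores framebuffer compression, padding, and any other implementation detail, so treat the numbers as an upper-level comparison rather than a measurement.

```python
def frame_bytes(width: int, height: int, channels: int, bits_per_channel: int) -> int:
    """Uncompressed size in bytes of one full write of a color attachment."""
    return width * height * channels * bits_per_channel // 8


WIDTH, HEIGHT = 2560, 1440  # the 1440p resolution used earlier in the text

rgba8_bytes = frame_bytes(WIDTH, HEIGHT, 4, 8)    # 8 bits per channel
rgba16_bytes = frame_bytes(WIDTH, HEIGHT, 4, 16)  # 16 bits per channel
rgba32_bytes = frame_bytes(WIDTH, HEIGHT, 4, 32)  # 32 bits per channel

if __name__ == "__main__":
    for name, size in (("8-bit", rgba8_bytes), ("16-bit", rgba16_bytes), ("32-bit", rgba32_bytes)):
        print(f"{name}: {size / (1024 * 1024):.1f} MiB per frame")
```

Each doubling of the channel width doubles the data written per frame, so a 32-bit per channel target moves four times as much data as an 8-bit per channel target of the same dimensions.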