## Mitigating loss of precision

The basic idea of floating-point numbers is that the position of the stored fractional bits changes, or floats, based on the magnitude of the number you are trying to represent. The accuracy that you can store reduces as the magnitude of the stored number increases.

For many types of shader arithmetic, accuracy of small numbers is important. Examples include: unorm color outputs, UV texture coordinates, and components in unit length vectors in which all values are between 0.0 and 1.0.

Preserving accuracy of the numbers in this output range is important. Therefore, we will now discuss how you can use mathematical construction to reduce the errors introduced by precision limitations.

## Avoid large magnitudes

Avoid creating numbers with a large magnitude that are later turned into a small number by mathematical operations. For example, consider the expression:

```glsl
float a = 100.00;
float b = 0.01;
float tmp = a + b;
float result = tmp - a;
```

When executed at FP32 precision, this expression gives a result very close to the expected answer of 0.01. But, when executed at FP16 precision, this expression gives the answer 0.0.

This happens because of the large difference in magnitudes of the original inputs. At FP16 precision, the intermediate value `tmp` lacks enough accuracy to store the fractional part of 100.01, and so contains only the value 100. The small final value `tmp - a` can be stored accurately, but the fractional part has already been lost by that point, so the error does not cancel out.
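The rounding described above can be reproduced on the CPU. The following is a minimal sketch, not shader code: it assumes IEEE 754 half-precision rounding, which Python's `struct` module provides through the `e` format, and the helper names `to_fp16`, `to_fp32`, and `add_then_subtract` are hypothetical. Real GPU behavior also depends on the precision the driver selects.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_then_subtract(a: float, b: float, round_to) -> float:
    """Evaluate (a + b) - a, rounding every stored value with round_to."""
    a, b = round_to(a), round_to(b)
    tmp = round_to(a + b)      # the large intermediate value
    return round_to(tmp - a)   # the small final value


result_fp32 = add_then_subtract(100.0, 0.01, to_fp32)
result_fp16 = add_then_subtract(100.0, 0.01, to_fp16)

print("FP32:", result_fp32)  # close to 0.01
print("FP16:", result_fp16)  # 0.0, because tmp was rounded to exactly 100.0
```

Adjacent FP16 values between 64 and 128 are 0.0625 apart, so `100.0 + 0.01` rounds back to `100.0` before the subtraction happens.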

To avoid losing accuracy, construct equations that keep intermediate values as close as possible to the final magnitude. For example, if passing a rotation from the application into `sin()` or `cos()`, we know that the useful part of the function can be found in the range [0, 2π). Any values that are higher than this are just repeated rotations larger than 360 degrees, and are visually indistinguishable from a smaller rotation.

So rather than passing in an ever-increasing value from the application, wrap the rotation on the CPU to the range [0, 2π). This preserves as much precision as possible in the useful range.

For this example, if the rotation is not wrapped to a small range on the CPU, then the object eventually ceases rotating. The magnitude of the number becomes so large that adding in a small incremental rotation does not do anything. This is because the small increment is below the accuracy threshold of the stored number.

This happens quickly with FP16 numbers, but it also happens eventually with FP32 numbers.
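The stalled rotation can be sketched on the CPU as follows. This is an illustration under stated assumptions: the shader is assumed to receive the angle as an IEEE 754 FP16 value, the starting angle of 1000 radians and the step of 0.01 radians are arbitrary, and `to_fp16` and `wrap_angle` are hypothetical helper names.

```python
import math
import struct

TWO_PI = 2.0 * math.pi


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the range [0, 2*pi)."""
    wrapped = math.fmod(angle, TWO_PI)
    if wrapped < 0.0:
        wrapped += TWO_PI
    # Adding 2*pi to a tiny negative value can round up to exactly 2*pi.
    return 0.0 if wrapped >= TWO_PI else wrapped


start, step = 1000.0, 0.01  # arbitrary example values

# What an FP16 shader input would hold over five consecutive frames.
unwrapped_seen = [to_fp16(start + i * step) for i in range(5)]
wrapped_seen = [to_fp16(wrap_angle(start + i * step)) for i in range(5)]

print(unwrapped_seen)  # every frame holds the same value: no visible rotation
print(wrapped_seen)    # every frame holds a different value
```

Near 1000, adjacent FP16 values are 0.5 apart, so a 0.01 radian step is lost. After wrapping, the same angles fall near 1.0, where adjacent values are about 0.001 apart.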

## Exploit symmetrical functions

The sign bit is always stored in a floating-point number. For many types of periodic mathematical functions, this can be used to improve accuracy, because it reduces the magnitude of the numbers that need to be stored.

For example, a rotation of +270 degrees is the same as a rotation of -90 degrees. So, for inputs into `sin()` and `cos()`, it is preferable to use values in the range [-π, +π) instead of [0, 2π). This is because the -π to +π range halves the maximum magnitude, therefore preserving one bit of accuracy which the latter values would lose.
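A minimal CPU-side sketch of wrapping to the symmetric range is shown below. The helper names `wrap_symmetric` and `fp16_spacing` are hypothetical, and `fp16_spacing` assumes IEEE 754 FP16 with 11 significant bits and only handles values in the normal range.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_symmetric(angle: float) -> float:
    """Wrap an angle in radians to the range [-pi, pi)."""
    wrapped = math.fmod(angle + math.pi, TWO_PI)
    if wrapped < 0.0:
        wrapped += TWO_PI
    if wrapped >= TWO_PI:  # guard against rounding up to exactly 2*pi
        wrapped = 0.0
    return wrapped - math.pi


def fp16_spacing(x: float) -> float:
    """Gap between adjacent FP16 values near x (normal range only)."""
    _, exponent = math.frexp(abs(x))       # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 11)  # FP16 keeps 11 significant bits


print(math.degrees(wrap_symmetric(math.radians(270.0))))  # approximately -90

# Spacing near the top of each range: [0, 2*pi) versus [-pi, pi).
print(fp16_spacing(6.28), fp16_spacing(3.14))
```

The spacing near 6.28 is twice the spacing near 3.14, which is the one bit of accuracy that the symmetric range preserves.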

## Exploit built-in functions

Built-in functions in the shader libraries are often backed by hardware that preserves more precision than the equivalent function that is implemented in shader code arithmetic.

An example of this is the fused multiply accumulate operation. This operation is very common in compute applications:

```glsl
float r = (a * b) + c;
```

If this operation is implemented as separate multiply and add operations, the result of `(a * b)` is rounded to fit into a `tmp` float. The result of `tmp + c` is rounded again, so that two sets of rounding errors are introduced.

When using a hardware fused multiply accumulate operation, only the final result needs to be rounded to the output precision. This removes the intermediate rounding step, and the error that it introduces.
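The difference between one rounding and two can be sketched on the CPU. The example below assumes IEEE 754 FP16 rounding and uses exact rational arithmetic to stand in for the unrounded intermediate of a fused operation. The input values are chosen by hand to expose the effect, and the helper names are hypothetical. It models the rounding behavior only, not any particular GPU.

```python
import struct
from fractions import Fraction


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum."""
    tmp = to_fp16(a * b)
    return to_fp16(tmp + c)


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a fused operation: exact intermediate, one final rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return to_fp16(float(exact))


# All three inputs are exactly representable in FP16.
a = 1.0 + 2.0 ** -10
b = 1.0 + 2.0 ** -9
c = -1.0

unfused = multiply_then_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print("separate:", unfused)
print("fused:   ", fused)  # keeps the low-order bit that the product rounding dropped
```

For these inputs, the exact product is `1 + 3 * 2**-10 + 2**-19`. Rounding it to FP16 discards the `2**-19` term, so the separate version returns `3 * 2**-10`, while the fused model returns `3 * 2**-10 + 2**-19`.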

## Minimize memory size

Double Data Rate (DDR) memory bandwidth requires lots of power, so when reviewing shaders and narrowing precision, remember also to narrow any associated vertex attributes stored in memory.

Support for `GL_HALF_FLOAT` attributes is a core feature in OpenGL ES 3.0. If you are using OpenGL ES 2.0, remember that all Mali GPUs support the `OES_vertex_half_float` extension.

OpenGL ES OES Vertex Half-Float Information

In general, lower precision is better. However, a caveat of using lower numerical precision is that type conversion may not be free. Therefore, try to minimize the number of casts needed in shader code by loading data at a suitable precision level.
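As an illustration of narrowing vertex attributes in memory, the sketch below packs a few texture coordinates as FP32 and as FP16 and compares the buffer sizes. The UV values and the helper name `pack_attributes` are hypothetical, and little-endian byte order is assumed. Uploading the buffer and describing it to the graphics API as half-float data is outside the scope of this sketch.

```python
import struct

# Hypothetical UV coordinates, all in the range [0.0, 1.0].
uvs = [(0.0, 1.0), (0.1, 0.9), (0.333, 0.667)]


def pack_attributes(vertices, format_char: str) -> bytes:
    """Pack 2-component vertices as little-endian floats.

    format_char is "f" for 32-bit floats or "e" for 16-bit half floats.
    """
    return b"".join(struct.pack("<2" + format_char, *v) for v in vertices)


fp32_bytes = pack_attributes(uvs, "f")
fp16_bytes = pack_attributes(uvs, "e")

# Decode the FP16 buffer to measure the error introduced by narrowing.
decoded = struct.unpack(f"<{2 * len(uvs)}e", fp16_bytes)
original = [component for vertex in uvs for component in vertex]
max_error = max(abs(d - o) for d, o in zip(decoded, original))

print(len(fp32_bytes), "bytes as FP32")
print(len(fp16_bytes), "bytes as FP16")
print("largest rounding error:", max_error)
```

The FP16 buffer is half the size of the FP32 buffer. For values between 0.0 and 1.0, the rounding error stays at or below `2**-12`, which is half the gap between adjacent FP16 values just below 1.0.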