
Half-precision floating-point data types

Use the _Float16 data type for 16-bit floating-point values in your C and C++ source files.

Arm® Compiler 6 supports two half-precision (16-bit) floating-point scalar data types:

  • The IEEE 754-2008 __fp16 data type, defined in the Arm C Language Extensions.
  • The _Float16 data type, defined in the C11 extension ISO/IEC TS 18661-3:2015.

The __fp16 data type is not an arithmetic data type. It is for storage and conversion only, and operations on __fp16 values do not use half-precision arithmetic. Instead, __fp16 values are automatically promoted to the single-precision float (or double-precision double) data type when they are used in arithmetic operations. After the arithmetic operation, the result is automatically converted back to the half-precision __fp16 data type for storage. The __fp16 data type is available in both C and C++ source language modes.

The _Float16 data type is an arithmetic data type. Operations on _Float16 values use half-precision arithmetic. The _Float16 data type is available in both C and C++ source language modes.
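The effect of storing a result in half precision can be observed with a small sketch. The helper name add_in_half, the half_t typedef, and the HAVE_FLOAT16 macro are hypothetical names introduced for illustration. Treating the predefined __FLT16_MANT_DIG__ macro as an availability test for _Float16 is also an assumption of this sketch, not something this page specifies.

```c
/* GCC and Clang predefine __FLT16_MANT_DIG__ for the _Float16 type.
   Using it as an availability test is an assumption of this sketch. */
#if defined(__FLT16_MANT_DIG__)
#define HAVE_FLOAT16 1
typedef _Float16 half_t;
#else
#define HAVE_FLOAT16 0
typedef float half_t;   /* fallback so that the sketch still builds */
#endif

/* Hypothetical helper: narrows both inputs to half_t, adds them, and
   stores the sum in a half_t object, so the returned value has been
   rounded to the storage format. */
float add_in_half(float x, float y)
{
    half_t a = (half_t)x;
    half_t b = (half_t)y;
    half_t sum = a + b;
    return (float)sum;
}
```

When _Float16 is available, add_in_half(2048.0f, 1.0f) should return 2048.0f rather than 2049.0f. The half-precision format has an 11-bit significand, so 2049 is not representable and the sum rounds to the nearest even value.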

Arm recommends that you use the _Float16 data type instead of the __fp16 data type in new code. __fp16 is an Arm C Language Extension, so code that uses it requires compliance with the ACLE. _Float16 is defined by the C standards committee, so using _Float16 does not prevent code from being ported to architectures other than Arm.

In addition, _Float16 arithmetic operations map directly to Armv8.2-A half-precision floating-point instructions when those instructions are enabled on Armv8.2-A and later architectures. This avoids conversions to and from single-precision floating-point, and therefore results in faster code. If the Armv8.2-A half-precision floating-point instructions are not available, _Float16 values are automatically promoted to single-precision. This is similar to the semantics of __fp16, except that the results continue to be stored in single-precision floating-point format instead of being converted back to half-precision floating-point format.
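Whether a given build can use the native half-precision instructions can be checked at compile time. The following minimal sketch assumes the ACLE feature test macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC, which the ACLE defines when scalar half-precision arithmetic instructions are available. The function name has_native_fp16_arithmetic is hypothetical.

```c
/* Hypothetical helper: reports whether this translation unit was built
   with scalar half-precision arithmetic instructions enabled.
   __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is an ACLE feature test macro. */
int has_native_fp16_arithmetic(void)
{
#if defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
    return 1;   /* native half-precision scalar arithmetic is enabled */
#else
    return 0;   /* macro not defined: native half-precision arithmetic
                   is not enabled for this build */
#endif
}
```

A build system or test harness could call this function to confirm that the intended architecture extension was actually passed to the compiler.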

To define a _Float16 literal, append the suffix f16 to the floating-point constant, for example 1.0f16. There is no implicit argument conversion between _Float16 and standard floating-point data types. Therefore, to pass a _Float16 value as a single-precision floating-point argument, you must promote it with an explicit cast.

extern void ReadFloatValue(float f);

void ReadValues(void)
{
    // Half-precision floating-point value stored in the _Float16 data type.
    const _Float16 h = 1.0f16;

    // There is no implicit argument conversion between _Float16 and standard floating-point data types.
    // Therefore, the following call is not a call to the declared function extern void ReadFloatValue(float f).
    ReadFloatValue(h);

    // An explicit cast is required to promote a _Float16 value to a single-precision floating-point value.
    // Therefore, the following call is a call to the declared function extern void ReadFloatValue(float f).
    ReadFloatValue((float)h);

    return;
}

In an arithmetic operation where one operand is of the __fp16 data type and the other is of the _Float16 data type, the _Float16 value is first converted to a __fp16 value. The operation is then completed as if both operands were of the __fp16 data type.

void AddValues(_Float16 a, __fp16 b)
{
    _Float16 c;
    __fp16 d;

    // This addition is evaluated in 16-bit half-precision arithmetic. 
    // The result is stored in 16 bits using the _Float16 data type.
    c = a+a;

    // This addition is evaluated in 32-bit single-precision arithmetic. 
    // The result is stored in 16 bits using the __fp16 data type.
    d = b+b; 

    // The value in variable 'a' is first converted to a __fp16 value,
    // and then the addition is evaluated in 32-bit single-precision arithmetic.
    // The result is stored in 16 bits using the __fp16 data type.
    d = a+b;

    return;
}

To generate Armv8.2-A half-precision floating-point instructions using armclang, you must enable the +fp16 architecture extension, for example:

armclang --target=aarch64-arm-none-eabi -march=armv8.2-a+fp16
armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a75+fp16
armclang --target=arm-arm-none-eabi -march=armv8.2-a+fp16
armclang --target=arm-arm-none-eabi -mcpu=cortex-a75+fp16