Software implications for v8-A implementations with no hardware floating point
Article ID: 112086438
Published date: 24 Jul 2017
Last updated: -
Applies to: Cortex-A
The v8-A architecture allows implementations without hardware floating point support. For example, some ARMv8 Cortex-A processors, such as Cortex-A53, can be configured without FPU/NEON hardware.
However, to discourage fragmentation, the ARM software specification followed by Linux distributions and toolchain libraries mandates hardware floating point support in AArch64.
Hardware designs without hardware floating point are normally targeted at specific closed source market applications that use bespoke software components. For general use, ARM advises that implementations are configured with hardware floating point support.
Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
Unlike AAPCS32 for the ARM 32-bit architecture (see appendix), AAPCS64 defines only a hard float ABI and no soft float ABI. This means that floating point registers must be present for floating point parameter passing.
Impact on software and compiler toolchain
To avoid fragmentation of AArch64 software, the express intention for AArch64 is that "no float" hardware configurations are allowed only for use cases where there is absolutely no need for floating point.
Linux kernel support
The following Linux kernel patch allows legacy AArch32 user space software that uses the soft floating point ABI to run under an AArch64 kernel:
This patch allows the Linux kernel to run without FPU/NEON being present.
However, ARM does not encourage partners to license CPUs without the hardware floating point unit if the intended application is to run a general purpose OS such as Linux. Although it is technically possible to execute the Linux kernel without FPU/NEON, user space applications and libraries may be built with the AAPCS64 ABI, which requires floating point hardware.
Compiler toolchain support
In compliance with AAPCS64, GNU GCC for v8 provides only the hard float AArch64 toolchain. This is unlike GCC for v7-A, which provides both the arm-linux-gnueabi soft float toolchain and the arm-linux-gnueabihf hard float toolchain. Additionally, GCC provides a -mgeneral-regs-only option, which makes generated code use only the general-purpose registers. This prevents the compiler from using floating-point and Advanced SIMD registers but does not impose any restrictions on the assembler.
ARM Compiler v6.x also provides a -mcpu=name+nofp+nosimd option to prevent the use of both floating-point instructions and floating-point registers. Subsequent use of floating-point data types in this mode is unsupported.
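Because floating-point data types cannot be used when these options are in effect, code built this way typically has to express fractional arithmetic with integers. The following is a minimal sketch of that idea, assuming a Q16.16 fixed-point representation. The type name q16_16 and the helper functions are hypothetical and are not part of any ARM or GCC library. The code contains no floating-point types, so it is the kind of source that should be accepted under -mgeneral-regs-only, although this has not been verified against any specific toolchain release.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)65536) /* the value 1.0 in Q16.16 */

/* Convert an integer to Q16.16. The range check avoids signed overflow. */
q16_16 q16_from_int(int32_t v)
{
    assert(v >= -32768 && v <= 32767);
    return (q16_16)(v * 65536);
}

/* Multiply two Q16.16 values using a 64-bit intermediate.
 * The result is truncated toward zero; overflow of the final
 * 32-bit result is not handled in this sketch. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q16_16)(wide / 65536);
}

/* Drop the fractional part, truncating toward zero. */
int32_t q16_to_int(q16_16 a)
{
    return a / 65536;
}
```

For example, q16_mul(Q16_ONE / 2, q16_from_int(4)) multiplies 0.5 by 4 using only general-purpose registers and integer instructions.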
Hardware floating point in ARMv8 processors is considered standard and not optional. Hence, no special options are required.
Supported architectures include arm64
*The information is subject to change by third party Linux distributors.
To run generic Linux distributions, we advise configuring v8-A processors with FPU hardware present (where it is configurable in RTL).
Some compiler options (the -mgeneral-regs-only GCC option or the -mcpu=name+nofp+nosimd ARMClang option) can prevent the compiler from using floating point registers, but the code then cannot use floating point data types.
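Source code that must build both with and without these options can select an implementation at compile time. The sketch below assumes the __ARM_FP predefined macro from the ARM C Language Extensions (ACLE), which indicates that hardware floating point is available to the compiler. Whether a given option such as -mgeneral-regs-only leaves this macro undefined should be checked against the toolchain in use. The function name scale_by_three_quarters is hypothetical.

```c
#include <assert.h>

/* Return x * 0.75, truncated toward zero.
 * Intended for moderate values of x; the integer path computes x * 3,
 * so very large magnitudes would overflow in this sketch. */
int scale_by_three_quarters(int x)
{
    assert(x > -100000000 && x < 100000000);
#if defined(__ARM_FP)
    /* Hardware floating point is available: floating-point types may be used. */
    return (int)(x * 0.75);
#else
    /* No floating-point support reported by the compiler (or not an ARM
     * target): use integer arithmetic only. */
    return (x * 3) / 4;
#endif
}
```

Both branches truncate toward zero, so callers see the same result whichever path is compiled in.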
Floating point parameter passing in AAPCS64
The Procedure Call Standard used by the Application Binary Interface (ABI) for the ARM architecture (AAPCS) defines how separately compiled and separately assembled routines can work together. There is an externally visible interface between such routines.
The AAPCS64 specification is available at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf. It defines the following rules for floating point parameter passing:
5.4 Parameter Passing
The base standard provides for passing arguments in general-purpose registers (r0-r7), SIMD/floating-point registers (v0-v7) and on the stack. For subroutines that take a small number of small parameters, only registers are used.
Stage C – Assignment of arguments to registers and stack
For each argument in the list the following rules are applied in turn until the argument has been allocated. When an argument is assigned to a register any unused bits in the register have unspecified value. When an argument is assigned to a stack slot any unused padding bytes have unspecified value.
C.1 If the argument is a Half-, Single-, Double- or Quad- precision Floating-point or Short Vector Type and the NSRN is less than 8, then the argument is allocated to the least significant bits of register v[NSRN]. The NSRN is incremented by one. The argument has now been allocated.
C.2 If the argument is an HFA or an HVA and there are sufficient unallocated SIMD and Floating-point registers (NSRN + number of members ≤ 8), then the argument is allocated to SIMD and Floating-point Registers (with one register per member of the HFA or HVA). The NSRN is incremented by the number of registers used. The argument has now been allocated.
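Rules C.1 and C.2 quoted above can be illustrated with a small model that tracks the NSRN (Next SIMD and Floating-point Register Number) across an argument list. This is only a sketch. The type and function names are hypothetical, and only C.1 and C.2 are modeled. Later Stage C rules, which are not quoted here and can also change the NSRN or place arguments in general-purpose registers or on the stack, are ignored, so results for arguments that do not fit are not a faithful picture of the full ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified argument classes; only those covered by rules C.1 and C.2. */
enum arg_class {
    ARG_FP_SCALAR, /* half, single, double, quad FP or short vector: rule C.1 */
    ARG_HFA_HVA,   /* homogeneous FP or vector aggregate: rule C.2 */
    ARG_OTHER      /* anything else; not modeled by this sketch */
};

struct arg {
    enum arg_class cls;
    int members;    /* number of HFA/HVA members (1 to 4); ignored otherwise */
    int first_vreg; /* output: index of the first v register used, or -1 */
};

/* Apply rules C.1 and C.2 to each argument in order.
 * Returns the NSRN after the last argument. */
int assign_simd_fp_registers(struct arg *args, size_t count)
{
    int nsrn = 0; /* Next SIMD and Floating-point Register Number */

    for (size_t i = 0; i < count; i++) {
        struct arg *a = &args[i];
        a->first_vreg = -1; /* not allocated by C.1 or C.2 */

        if (a->cls == ARG_FP_SCALAR && nsrn < 8) {
            /* C.1: allocate to v[NSRN], then increment NSRN by one. */
            a->first_vreg = nsrn;
            nsrn += 1;
        } else if (a->cls == ARG_HFA_HVA) {
            assert(a->members >= 1 && a->members <= 4);
            if (nsrn + a->members <= 8) {
                /* C.2: one register per member, starting at v[NSRN]. */
                a->first_vreg = nsrn;
                nsrn += a->members;
            }
        }
    }
    return nsrn;
}
```

For an argument list consisting of a double followed by an HFA with three members, the model assigns v0 to the double and v1 to v3 to the aggregate, leaving the NSRN at 4. On an implementation without the SIMD and floating-point register file, none of these assignments is possible, which is why AAPCS64 code that passes floating point parameters requires the hardware.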
Floating point parameter passing in AAPCS32
AAPCS32 allows both soft and hard float ABI by defining ‘The base standard’ and ‘The standard variants’.
5 THE BASE PROCEDURE CALL STANDARD
The base standard defines a machine-level, core-registers-only calling standard common to the ARM and Thumb instruction sets. It should be used for systems where there is no floating-point hardware, or where a high degree of inter-working with Thumb code is required.
6.1 VFP and Advanced SIMD Register Arguments
This variant alters the manner in which floating-point values are passed between a subroutine and its caller and allows significantly better performance when a VFP co-processor or the Advanced SIMD Extension is present.
6.1.1 Mapping between registers and memory format
Values passed across a procedure call interface in VFP registers are laid out as follows:
A half precision floating point type is passed as if it were loaded from its memory format into the least significant 16 bits of a single precision register.
A single precision floating point type is passed as if it were loaded from its memory format into a single precision register with VLDR.
A double precision floating point type is passed as if it were loaded from its memory format into a double precision register with VLDR.
For more information, see the AAPCS32 specification: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf