The A64 64-bit register bank helps reduce register pressure in most applications.
The A64 Procedure Call Standard (PCS) passes up to eight parameters in registers (X0-X7). In contrast, A32 and T32 pass only four arguments in registers, with any excess being passed on the stack.
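As a rough illustration of the difference in register budget, the sketch below uses a hypothetical ten-argument function, `sum10`, and a helper, `stacked_args`, that counts how many word-sized integer arguments spill to the stack. Both names are invented for this example, and the helper deliberately ignores alignment, floating-point arguments, and composite types.

```c
#include <assert.h>

/* Hypothetical function with ten int parameters.
 * Under the A64 PCS, a..h are passed in X0-X7 (as W0-W7) and
 * i and j are passed on the stack.
 * Under A32/T32, only a..d are passed in registers (R0-R3). */
int sum10(int a, int b, int c, int d, int e,
          int f, int g, int h, int i, int j)
{
    return a + b + c + d + e + f + g + h + i + j;
}

/* Simplified model: number of word-sized integer arguments passed on
 * the stack, given how many argument registers the PCS provides
 * (8 for A64, 4 for A32/T32). */
int stacked_args(int nargs, int reg_args)
{
    assert(nargs >= 0 && reg_args >= 0);
    return nargs > reg_args ? nargs - reg_args : 0;
}
```

With ten arguments, this model gives two stacked arguments for A64 and six for A32 or T32.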
The PCS also defines a dedicated Frame Pointer (FP), which makes debugging and call-graph profiling easier by making it possible to reliably unwind the stack. Refer to Chapter 9 The ABI for ARM 64-bit Architecture for further information.
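A consequence of adopting 64-bit wide integer registers is that the widths of the variables used by programming languages vary between platforms. A number of standard data models are currently in use, which differ mainly in the sizes defined for integers, longs, and pointers. For example, LP64 uses 32-bit integers with 64-bit longs and pointers, whereas LLP64 keeps longs at 32 bits.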
64-bit Linux implementations use LP64 and this is supported by the A64 Procedure Call Standard. Other PCS variants are defined that can be used by other operating systems.
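The following sketch shows one way a program might check which data model it was built for. The enum and the functions `classify_data_model` and `host_data_model` are hypothetical names introduced for this example. The bit widths assigned to each model are the commonly cited ones.

```c
#include <assert.h>
#include <limits.h>

enum data_model {
    MODEL_UNKNOWN,
    MODEL_ILP32,  /* int, long, and pointer are all 32 bits */
    MODEL_LP64,   /* int is 32 bits; long and pointer are 64 bits */
    MODEL_LLP64,  /* int and long are 32 bits; pointer is 64 bits */
    MODEL_ILP64   /* int, long, and pointer are all 64 bits */
};

/* Map the widths of int, long, and pointer (in bits) to a data model. */
enum data_model classify_data_model(unsigned int_bits, unsigned long_bits,
                                    unsigned ptr_bits)
{
    assert(int_bits > 0 && long_bits > 0 && ptr_bits > 0);
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return MODEL_ILP32;
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return MODEL_LP64;
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return MODEL_LLP64;
    if (int_bits == 64 && long_bits == 64 && ptr_bits == 64) return MODEL_ILP64;
    return MODEL_UNKNOWN;
}

/* Classify the data model of the compiler that builds this file. */
enum data_model host_data_model(void)
{
    return classify_data_model((unsigned)(CHAR_BIT * sizeof(int)),
                               (unsigned)(CHAR_BIT * sizeof(long)),
                               (unsigned)(CHAR_BIT * sizeof(void *)));
}
```

On a 64-bit Linux toolchain, `host_data_model()` would be expected to return `MODEL_LP64`. Code that must build under more than one model is usually safer using the fixed-width types from `<stdint.h>` than relying on the size of `long`.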
- Zero register
The zero register (WZR/XZR) is used for a few encoding tricks. For example, there is no plain multiply encoding, just multiply-add. The instruction
MUL W0, W1, W2
is identical to
MADD W0, W1, W2, WZR
which uses the zero register. Not all instructions can use XZR/WZR. As mentioned in Chapter 4, the zero register shares the same encoding as the stack pointer. This means that, for certain operands of a very limited number of instructions, WZR/XZR is not available and WSP/SP is used instead.
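The multiply alias can be illustrated by building the instruction word by hand. The sketch below is a minimal encoder for the 32-bit register form only. The function names and the `REG_ZR` macro are hypothetical, and the field positions are taken from the published A64 encoding of the 32-bit MADD.

```c
#include <assert.h>
#include <stdint.h>

#define REG_ZR 31u  /* register number 31 selects WZR/XZR in this encoding */

/* Encode the 32-bit form MADD Wd, Wn, Wm, Wa.
 * Fields: Rm at bits 20:16, Ra at bits 14:10, Rn at bits 9:5, Rd at bits 4:0. */
uint32_t encode_madd_w(unsigned rd, unsigned rn, unsigned rm, unsigned ra)
{
    assert(rd < 32 && rn < 32 && rm < 32 && ra < 32);
    return 0x1B000000u | ((uint32_t)rm << 16) | ((uint32_t)ra << 10)
                       | ((uint32_t)rn << 5)  | (uint32_t)rd;
}

/* MUL Wd, Wn, Wm has no encoding of its own: it is MADD with Wa = WZR. */
uint32_t encode_mul_w(unsigned rd, unsigned rn, unsigned rm)
{
    return encode_madd_w(rd, rn, rm, REG_ZR);
}
```

Here `encode_mul_w(0, 1, 2)` and `encode_madd_w(0, 1, 2, 31)` produce the same word, 0x1B027C20, so a disassembler is free to print either mnemonic for it.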
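The zero register also removes the need for a temporary register when storing zero. In A32, writing a zero to memory requires a register to hold the value:
mov r0, #0
str r0, [...]
In A64, using the zero register:
str wzr, [...]
No spare register is needed. Similarly, 16 bytes of zeros can be written using:
stp xzr, xzr, [...]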
A convenient side-effect of the zero register is that there are many NOP instructions with large immediate fields. For example, ADR XZR, #<imm> alone gives you 21 bits of data in an instruction with no other side effects. This is very useful for JIT compilers, where code can be patched at runtime.
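As a sketch of how a JIT might use this, the helpers below pack a 21-bit payload into the immediate fields of an ADR whose destination is register 31, and read it back. The function names are hypothetical. The field layout (immlo at bits 30:29, immhi at bits 23:5, Rd at bits 4:0) is assumed from the published ADR encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Build "ADR XZR, #imm" carrying a 21-bit payload in its immediate.
 * The result is discarded by the zero register, so the instruction
 * behaves as a NOP that happens to hold data. */
uint32_t adr_xzr_pack(uint32_t payload)
{
    assert(payload < (1u << 21));
    uint32_t immlo = payload & 0x3u;  /* low 2 bits   -> bits 30:29 */
    uint32_t immhi = payload >> 2;    /* high 19 bits -> bits 23:5  */
    return 0x10000000u | (immlo << 29) | (immhi << 5) | 31u;
}

/* Recover the 21-bit payload from an instruction word built above. */
uint32_t adr_xzr_unpack(uint32_t insn)
{
    uint32_t immlo = (insn >> 29) & 0x3u;
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;
    return (immhi << 2) | immlo;
}
```

A runtime could later overwrite such a word in place with a real instruction, or simply use the payload as a tag that a code scanner can read. Any real patching scheme would also need the appropriate cache maintenance and synchronization, which this sketch does not cover.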
- Stack pointer
The Stack Pointer (SP) cannot be referenced by most instructions. Some forms of arithmetic instructions can read or write the current stack pointer. This might be done to adjust the stack pointer in a function prologue or epilogue. For example:
ADD SP, SP, #256 // SP = SP + 256
- Program counter
The current Program Counter (PC) cannot be referred to by number as if part of the general register file and therefore cannot be used as the source or destination of arithmetic instructions, or as the base, index or transfer register of load and store instructions.
The only instructions that read the PC are those whose function it is to compute a PC-relative address (ADR, ADRP, literal load, and direct branches), and the branch-and-link instructions that store a return address in the link register (BL and BLR). The only way to modify the program counter is using branch, exception generation and exception return instructions.
Where the PC is read by an instruction to compute a PC-relative address, then its value is the address of that instruction. Unlike A32 and T32, there is no implied offset of 4 or 8 bytes.
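The arithmetic can be modelled in a few lines. The sketch below is illustrative only and all function names are hypothetical. It computes the result of ADR and ADRP from the address of the instruction itself and, for contrast, the value that A32 and T32 code observes when it reads the PC.

```c
#include <assert.h>
#include <stdint.h>

/* A64 ADR: result = address of the ADR instruction + signed 21-bit offset. */
uint64_t adr_target(uint64_t insn_addr, int64_t imm21)
{
    assert(imm21 >= -(1 << 20) && imm21 < (1 << 20));
    return insn_addr + (uint64_t)imm21;
}

/* A64 ADRP: result = 4KB page of the instruction + (signed offset * 4096). */
uint64_t adrp_target(uint64_t insn_addr, int64_t imm21)
{
    assert(imm21 >= -(1 << 20) && imm21 < (1 << 20));
    return (insn_addr & ~(uint64_t)0xFFF) + (uint64_t)imm21 * 4096u;
}

/* For contrast: the value seen when an instruction reads the PC. */
uint64_t a32_pc_value(uint64_t insn_addr) { return insn_addr + 8; }
uint64_t t32_pc_value(uint64_t insn_addr) { return insn_addr + 4; }
```

Because the A64 forms use the instruction address directly, tools that generate or relocate code do not need to compensate for a pipeline-derived offset.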
- FP and NEON registers
The most significant update to the NEON registers is that NEON now has 32 registers of 16 bytes each, instead of the 16 registers it had before. The simpler mapping scheme between the different register sizes in the floating-point and NEON register bank makes these registers much easier to use, and easier for compilers and optimizers to model and analyze.
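One way to picture the simpler mapping is that each narrower register name is just the low-order bits of the 128-bit register with the same number. The sketch below models a single register in that way. The `vreg_t` type and `view_bits` function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 128-bit register Vn as two 64-bit halves. */
typedef struct {
    uint64_t lo;  /* bits 63:0   */
    uint64_t hi;  /* bits 127:64 */
} vreg_t;

/* Return the narrower view of register n: Bn (8), Hn (16), Sn (32),
 * or Dn (64). Every view is the low `width` bits of the same Vn. */
uint64_t view_bits(vreg_t v, unsigned width)
{
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    if (width == 64)
        return v.lo;
    return v.lo & ((UINT64_C(1) << width) - 1);
}
```

In this model S5, D5, and V5 all refer to the same storage. This contrasts with the A32 scheme, where each double-precision register overlays a pair of single-precision registers and each quadword register overlays a pair of doubleword registers, so register numbers do not line up across sizes.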
- Register indexed addressing
The A64 instruction set provides additional addressing modes with respect to A32, allowing a 64-bit index register to be added to the 64-bit base register, with optional scaling of the index by the access size. Additionally, it provides sign or zero-extension of a 32-bit value within an index register, again with optional scaling.
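The address calculation described above can be sketched as follows. The `extend_t` enum and `effective_address` function are hypothetical names. The shift amount stands for the optional scaling and is assumed to be either zero or log2 of the access size in bytes.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    EXT_LSL,   /* 64-bit index register used as-is, e.g. [X1, X2, LSL #3]  */
    EXT_UXTW,  /* 32-bit index, zero-extended,     e.g. [X1, W2, UXTW #2] */
    EXT_SXTW   /* 32-bit index, sign-extended,     e.g. [X1, W2, SXTW #2] */
} extend_t;

/* Compute base + (extend(index) << shift) with 64-bit wraparound. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           extend_t ext, unsigned shift)
{
    uint64_t offset;

    assert(shift <= 4);  /* largest access modelled here is 16 bytes */
    switch (ext) {
    case EXT_UXTW:
        offset = (uint32_t)index;
        break;
    case EXT_SXTW:
        offset = (uint64_t)(int64_t)(int32_t)(uint32_t)index;
        break;
    default:
        offset = index;
        break;
    }
    return base + (offset << shift);
}
```

For example, indexing an array of 8-byte elements at 0x1000 with index 3 and a shift of 3 gives 0x1018. The sign-extending form lets a signed 32-bit `int` index be used directly without a separate extension instruction.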