
Chapter 8. Porting to A64

This chapter is not intended to be an exhaustive guide to writing portable code for all systems. However, it covers the main areas that application engineers need to know when porting code to ARM-based machines. There are some significant differences that you should be aware of when moving code from the A32 and T32 instruction sets to the A64 instruction set in AArch64:

  • Most instructions in the A32 instruction set can be executed conditionally. That is, you can append a condition code to an instruction so that it executes (or not) depending on the flags set by a previous flag-setting instruction. Although this enables programming tricks that reduce code size and cycle count, it significantly complicates the design of high-performance processors with out-of-order execution.

    The opcode bits needed to encode the predication can be put to better use, for example to provide the space for selecting from a larger pool of general-purpose registers. Therefore, in A64 code only a small set of instructions, such as conditional branches, can be executed conditionally, while some comparison and selection instructions (for example, CCMP and CSEL) take a condition as an operand. See Conditional instructions.

  • Many A64 instructions can apply an arbitrary constant shift to one of the source registers, limited only by the size of the operand. In addition, A64 provides extended-register forms, which zero-extend or sign-extend a narrower source register (and can optionally shift it) before using it. More complicated cases, such as variable shifts, require explicit instructions. T32 is also more restrictive than A32 in this respect, so in some ways A64 continues the same trend. The flexible Operand2 of A32 does not exist as such in A64; instead, each instruction class has its own operand options.

  • There are some changes to the addressing modes available to load and store instructions. The offset, pre-index and post-index forms from A32 and T32 are still available in A64. There is a new PC-relative addressing mode, because the PC cannot be accessed in the same way as a general-purpose register. A64 loads can shift the index register inline (though with less flexibility than in A32), and they can also use some of the extend modes, so that, for example, an array can be indexed with a 32-bit value.

  • A64 removes the multiple memory access instructions (Load Multiple and Store Multiple) of previous ARM architectures, which could read or write an arbitrary list of registers. Use the Load Pair (LDP) and Store Pair (STP) instructions, which can operate on any two registers, instead. PUSH and POP have also been removed.

  • ARMv8 adds load and store instructions that include a one-way memory barrier: load-acquire and store-release. These are available in the ARMv8 A32 and T32 instruction sets as well as in A64. A load-acquire ensures that any memory accesses that follow it in program order are only visible after the load-acquire. A store-release ensures that all earlier memory accesses are visible before the store-release becomes visible. See Memory barrier and fence instructions.

  • AArch64 does not support the concept of coprocessors, including CP15. Instead, new system instructions provide access to the registers that AArch32 accesses through CP15 coprocessor instructions.

  • The CPSR does not exist in AArch64 as a single register. Instead, PSTATE fields (such as NZCV) can be accessed using special-purpose registers.
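To make the point about conditional execution concrete: code that relied on predicated A32 instructions is normally written in C as a branch-free select and left to the compiler. The sketch below shows the kind of C that a compiler targeting A64 will often turn into a compare followed by CSEL or CINC (an alias of CSINC) rather than a branch. The function names are hypothetical, and the instruction sequences in the comments are typical compiler output, not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper. Typical A64 output (compiler-dependent):
       CMP  w0, w1
       CSEL w0, w0, w1, GT                                          */
int32_t max_i32(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: counts the negative elements of arr.
   Where A32 code might have used a conditional ADDLT, an A64 compiler
   may use CMP + CINC, or may vectorize the loop instead.             */
uint32_t count_negative(const int32_t *arr, uint32_t n)
{
    uint32_t count = 0;
    for (uint32_t i = 0; i < n; i++) {
        count += (arr[i] < 0) ? 1u : 0u;
    }
    return count;
}
```

Writing the select in plain C rather than porting hand-predicated assembly lets the compiler choose between a conditional select and a branch for the target processor.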
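The shifted-register, extended-register and post-index forms described above are mostly visible to C programmers through the code that compilers generate for indexing and pointer walking. The following sketch pairs three small C functions with the A64 instructions they commonly compile to. The function names are hypothetical, and the A64 in the comments is typical output that depends on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Extended-register load: the 32-bit index is sign-extended and scaled
   by 8 inside the addressing mode, typically
       LDR x0, [x0, w1, SXTW #3]                                        */
int64_t load_at(const int64_t *table, int32_t idx)
{
    return table[idx];
}

/* Extended-register arithmetic: base + ((int64_t)idx << 2) in a single
   instruction, typically
       ADD x0, x0, w1, SXTW #2                                          */
const int32_t *element_ptr(const int32_t *base, int32_t idx)
{
    return base + idx;
}

/* Post-index addressing: load *p and advance p by 4 in one instruction,
   typically  LDR w2, [x0], #4  (unless the compiler vectorizes the loop). */
int64_t sum_i32(const int32_t *p, size_t n)
{
    const int32_t *end = p + n;
    int64_t total = 0;
    while (p != end)
        total += *p++;
    return total;
}
```

In A32, the sign extension of a 32-bit index is unnecessary because pointers are also 32 bits. On AArch64, the extend modes let a compiler keep an `int` index without a separate extension instruction.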
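Load-acquire and store-release are usually reached from C through the C11 `<stdatomic.h>` interface rather than written by hand. The sketch below is a hypothetical single-producer, single-consumer hand-off: the producer publishes a value with a release store and the consumer reads the flag with an acquire load. On A64, compilers typically implement these with STLR and LDAR respectively, although the exact choice is compiler-dependent (for example, newer cores may allow LDAPR for acquire loads).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state for a single-producer/single-consumer hand-off. */
static int payload;           /* plain data, published by the flag below     */
static atomic_bool ready;     /* static storage: zero-initialized to false   */

void publish(int value)
{
    payload = value;
    /* Release store (typically STLR on A64): the write to payload becomes
       visible before ready is observed as true.                           */
    atomic_store_explicit(&ready, true, memory_order_release);
}

bool try_consume(int *out)
{
    /* Acquire load (typically LDAR on A64): the later read of payload
       cannot be observed before this load.                               */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the release/acquire pair, a consumer running on another core could see `ready` as true but still read a stale `payload`. The one-way barriers provide exactly the ordering this pattern needs, without a full DMB.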

For many applications, porting code from older versions of the ARM Architecture, or other processor architectures, to A64 means simply recompiling the source code. However, there are a number of areas where C code is not fully portable.
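A common example is code that assumes `int`, `long` and pointers are all 32 bits wide, as they are under the ILP32 data model used on AArch32. AArch64 Linux and most other Unix-like systems use LP64, where `long` and pointers are 64 bits while `int` remains 32 bits. The sketch below uses hypothetical helper names to classify the data model at run time and to show the portable `uintptr_t` alternative to casting a pointer to `unsigned int`.

```c
#include <stdint.h>

/* Hypothetical classification of common C data models. */
enum c_data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

enum c_data_model detect_data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return MODEL_ILP32;   /* for example, AArch32 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return MODEL_LP64;    /* for example, AArch64 Linux */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return MODEL_LLP64;   /* for example, 64-bit Windows */
    return MODEL_OTHER;
}

/* Non-portable pattern (do not use):
       unsigned int bits = (unsigned int)ptr;
   This works under ILP32 but truncates a 64-bit pointer under LP64.
   Portable pattern: round-trip the pointer through uintptr_t, which is
   wide enough for any object pointer on platforms that provide it.    */
uintptr_t ptr_to_bits(const void *p)
{
    return (uintptr_t)p;
}

const void *bits_to_ptr(uintptr_t bits)
{
    return (const void *)bits;
}
```

Similar care applies to code that stores sizes in `int` or assumes `sizeof(long) == 4`; the fixed-width types in `<stdint.h>` state the intended width explicitly.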

The following example illustrates the similarity between A64 and A32/T32. It shows a simple C function, followed by the code generated for it in T32 and in A64. The correspondence between the two is easy to see.

    // C code
    int bar(int val);

    int foo(int val)
    {
        int newval = bar(val);
        return val + newval;
    }

    // T32                      // A64
    foo:                        foo:
        sub  sp, sp, #8             sub  sp, sp, #16
        strd r4, r14, [sp]          stp  x19, x30, [sp]
        mov  r4, r0                 mov  w19, w0
        bl   bar                    bl   bar
        add  r0, r0, r4             add  w0, w0, w19
        ldrd r4, r14, [sp]          ldp  x19, x30, [sp]
        add  sp, sp, #8             add  sp, sp, #16
        bx   lr                     ret

Because the general-purpose functionality of A64 has evolved from that of A32 and T32, porting code from AArch32 to AArch64 is usually straightforward. The same is true of A32 assembly code: most instructions map easily between the instruction sets, and many sequences become simpler in A64.