7.3. AArch64 NEON instruction format

A number of changes have been made in the syntax of NEON and floating-point instructions to harmonize with the AArch64 core integer and scalar floating-point instruction set syntax. The instruction mnemonics are based closely on ARMv7 NEON.

  • The V prefix of ARMv7 NEON instructions has been removed.

    Some mnemonics have been renamed where the removal of the V prefix caused a clash with the ARM core instruction set mnemonics.

    This means that the same mnemonic can now name an ARM core instruction, a NEON instruction, or a floating-point instruction that does the same thing. The syntax of the operands determines which one is meant. For example:

      ADD W0, W1, W2{, shift #amount}
    

    and

      ADD X0, X1, X2{, shift #amount}
    

    are A64 base instructions.

     ADD D0, D1, D2
    

    is a scalar NEON instruction that operates on the 64-bit D registers (it performs an integer add; the floating-point add is FADD), and

     ADD V0.4H, V1.4H, V2.4H
    

    is a NEON vector instruction.

  • An S, U, F or P prefix has been added to indicate a Signed, Unsigned, Floating-point, or Polynomial data type. Only one of these prefixes can appear on an instruction, and it identifies the data type that the operation works on. For example:

      PMULL V0.8H, V1.8B, V2.8B
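
To make the Polynomial data type concrete, the following is a minimal Python sketch of what a polynomial (carry-less) multiply does in each lane, as in PMULL Vd.8H, Vn.8B, Vm.8B. The function names poly_mul8 and pmull_8b are hypothetical and exist only for this illustration; lists stand in for vector registers, with index 0 as the lowest lane.

```python
def poly_mul8(a: int, b: int) -> int:
    """Carry-less (polynomial over GF(2)) product of two 8-bit values."""
    result = 0
    for bit in range(8):
        if (b >> bit) & 1:
            result ^= a << bit  # XOR replaces addition, so no carries propagate
    return result  # at most 15 bits wide, so it fits a 16-bit lane


def pmull_8b(vn: list[int], vm: list[int]) -> list[int]:
    """Illustrative model: eight 8-bit lanes in, eight 16-bit lanes out."""
    assert len(vn) == len(vm) == 8
    return [poly_mul8(a & 0xFF, b & 0xFF) for a, b in zip(vn, vm)]
```

For instance, poly_mul8(3, 3) gives 5, not 9 as an integer multiply would, because the partial products are combined with XOR.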
    
  • The vector organization (element size and number of lanes) is described by the register qualifiers. For example:

     ADD Vd.T, Vn.T, Vm.T
    

    where Vd, Vn and Vm are the register names and T is the subdivision of the register to be used. For this example, T is the arrangement specifier and is one of 8B, 16B, 4H, 8H, 2S, 4S or 2D. Any of these can be used, depending on whether 64, 32, 16 or 8-bit data is used, and whether 64 bits or 128 bits of the register are used.

    To add two 64-bit lanes, use:

     ADD V0.2D, V1.2D, V2.2D
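
As an illustration of how an arrangement specifier encodes both lane count and element size, here is a minimal Python sketch. The helper parse_arrangement and the ELEMENT_BITS table are hypothetical names used only for this example; the sketch checks only that the arrangement fills 64 or 128 bits, not whether a particular instruction accepts it.

```python
# Element size letters used in arrangement specifiers.
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def parse_arrangement(spec: str) -> tuple[int, int, int]:
    """Return (lanes, element_bits, total_bits) for a specifier such as '4S'."""
    lanes = int(spec[:-1])
    bits = ELEMENT_BITS[spec[-1].upper()]
    total = lanes * bits
    if total not in (64, 128):
        raise ValueError(f"{spec!r} does not fill 64 or 128 bits")
    return lanes, bits, total
```

With this model, "2D" describes two 64-bit lanes in all 128 bits of the register, while "8B" describes eight 8-bit lanes in the lower 64 bits.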
    
  • As in ARMv7, some NEON data processing instructions are available in Normal, Long, Wide, Narrow and Saturating variants. Long, Wide and Narrow variants are shown by a suffix:

    • Normal instructions can operate on any vector types, and produce result vectors the same size, and usually the same type, as the operand vectors.

    • Long or Lengthening instructions operate on doubleword vector operands and produce a quadword vector result. The result elements are twice the width of the operands. Long instructions are specified using an L appended to the instruction. For example:

        SADDL V0.4S, V1.4H, V2.4H
      

      Figure 7.6 shows this, with input operands being promoted before the operation.

      Figure 7.6. NEON long instructions

    • Wide or Widening instructions operate on a quadword vector first operand and a doubleword vector second operand, producing a quadword vector result. The result elements and the first operand elements are twice the width of the second operand elements. Wide instructions have a W appended to the instruction. For example:

        SADDW V0.4S, V1.4S, V2.4H
      

      Figure 7.7 shows this, with the input doubleword operand being promoted before the operation.

      Figure 7.7. NEON wide instructions

    • Narrow or Narrowing instructions operate on quadword vector operands, and produce a doubleword vector result. The result elements are usually half the width of the operand elements. Narrow instructions are specified using an N appended to the instruction. For example:

        SUBHN V0.4H, V1.4S, V2.4S
      

      Figure 7.8 shows this, with the results being narrowed after the operation.

      Figure 7.8.  NEON narrow instructions
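
The difference between the Long, Wide and Narrow variants can be sketched in Python as follows. This is only an illustrative model, not a description of the hardware: the function names are hypothetical, lists stand in for vector registers with index 0 as the lowest lane, and lane values are held as signed Python integers.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddl(vn, vm, src_bits=16):
    """Model of SADDL V0.4S, V1.4H, V2.4H: promote both inputs, then add."""
    wide = 2 * src_bits
    return [sign_extend(sign_extend(a, src_bits) + sign_extend(b, src_bits), wide)
            for a, b in zip(vn, vm)]


def saddw(vn_wide, vm_narrow, src_bits=16):
    """Model of SADDW V0.4S, V1.4S, V2.4H: only the second operand is promoted."""
    wide = 2 * src_bits
    return [sign_extend(sign_extend(a, wide) + sign_extend(b, src_bits), wide)
            for a, b in zip(vn_wide, vm_narrow)]


def subhn(vn, vm, src_bits=32):
    """Model of SUBHN V0.4H, V1.4S, V2.4S: subtract at full width, keep the high half."""
    half = src_bits // 2
    mask = (1 << src_bits) - 1
    return [sign_extend(((a - b) & mask) >> half, half) for a, b in zip(vn, vm)]
```

In this model, adding 32767 and 1 with saddl gives 32768 rather than wrapping, because the result lanes are twice as wide as the source lanes.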

  • Signed and unsigned saturating variants (identified by an SQ or UQ prefix) are available for a number of instructions, as with SQADD and UQADD. If a result would exceed the maximum or minimum values of the datatype, saturating instructions return that maximum or minimum value. The saturation limits depend on the datatype of the instruction.

    Table 7.2. Saturation ranges

    Data type                    Saturation range of x
    Signed byte (S8)             -2^7  <= x < 2^7
    Signed halfword (S16)        -2^15 <= x < 2^15
    Signed word (S32)            -2^31 <= x < 2^31
    Signed doubleword (S64)      -2^63 <= x < 2^63
    Unsigned byte (U8)           0 <= x < 2^8
    Unsigned halfword (U16)      0 <= x < 2^16
    Unsigned word (U32)          0 <= x < 2^32
    Unsigned doubleword (U64)    0 <= x < 2^64
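
The saturating behavior can be sketched in Python as a clamp to the range of the lane type. The names saturation_range and saturating_add are hypothetical; the sketch models a single lane of an SQADD-style (signed=True) or UQADD-style (signed=False) addition and ignores the saturation status flag.

```python
def saturation_range(bits: int, signed: bool) -> tuple[int, int]:
    """Inclusive (minimum, maximum) value for a lane of `bits` bits."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturating_add(a: int, b: int, bits: int, signed: bool) -> int:
    """Add two lane values and clamp the result to the lane's range."""
    lo, hi = saturation_range(bits, signed)
    return max(lo, min(hi, a + b))
```

For a signed byte, 127 + 1 stays at 127 instead of wrapping to -128, and for an unsigned byte, 250 + 10 stays at 255.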

  • The P that marks pairwise operations in ARMv7 (as in VPADD) is now a suffix in ARMv8, as in ADDP. Pairwise instructions operate on adjacent pairs of elements in doubleword or quadword operands. For example:

      ADDP V0.4S, V1.4S, V2.4S
    

    Figure 7.9. Pairwise operation
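
A pairwise add can be modeled in Python as below. This is a minimal sketch with a hypothetical function name; lists stand in for registers with index 0 as the lowest lane, and the first source is assumed to supply the low half of the result and the second source the high half.

```python
def addp(vn, vm, bits=32):
    """Model of ADDP V0.4S, V1.4S, V2.4S: sum each adjacent pair of lanes."""
    mask = (1 << bits) - 1
    concat = list(vn) + list(vm)  # Vn provides the low lanes, Vm the high lanes
    return [(concat[i] + concat[i + 1]) & mask for i in range(0, len(concat), 2)]
```

Calling addp([1, 2, 3, 4], [10, 20, 30, 40]) returns [3, 7, 30, 70]: the first two results come from pairs in the first source, the last two from pairs in the second.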

  • A V suffix has been added for operations across all lanes (the whole register), as in ADDV. For example:

      ADDV S0, V1.4S
    

    Figure 7.10. Across all lanes operation
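
An across-lanes add reduces a whole vector to one scalar. The following Python sketch uses a hypothetical function name and treats lane values as unsigned integers; the sum wraps to the element width because the destination is a single element-sized register.

```python
def addv(vn, bits=32):
    """Model of ADDV S0, V1.4S: add every lane and wrap to the element width."""
    return sum(vn) & ((1 << bits) - 1)
```

For example, addv([1, 2, 3, 4]) returns 10.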

  • A 2 suffix, known as the second and upper half specifier, has been added for the new widening, narrowing or lengthening second part instructions. If present, it causes the operation to be performed on the upper 64 bits of the registers holding the narrower elements:

    • Widening instructions with a 2 suffix get their input data from the high numbered lanes of the vector that contains the narrower values, and write the expanded results to the 128-bit destination. For example:

        SADDW2 V0.2D, V1.2D, V2.4S
      

      Figure 7.11. SADDW2

    • Narrowing instructions with a 2 suffix get their input data from the 128-bit source operands and insert their narrowed results into the high numbered lanes of the 128-bit destination, leaving the lower lanes unchanged. For example:

        XTN2 V0.4S, V1.2D
      

      Figure 7.12. XTN2

    • Lengthening instructions with a 2 suffix get their input data from the high numbered lanes of the 128-bit source vectors and write the lengthened results to the 128-bit destination. For example:

        SADDL2 V0.2D, V1.4S, V2.4S
      

      Figure 7.13. SADDL2
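
The effect of the 2 suffix on the three kinds of instruction can be sketched in Python as follows. The function names are hypothetical and the model is only illustrative: lists stand in for registers with index 0 as the lowest lane, so the upper half of a register is the second half of the list.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddw2(vn_wide, vm_narrow, src_bits=32):
    """Model of SADDW2 V0.2D, V1.2D, V2.4S: use the upper narrow lanes of Vm."""
    half = len(vm_narrow) // 2
    return [sign_extend(a + sign_extend(b, src_bits), 2 * src_bits)
            for a, b in zip(vn_wide, vm_narrow[half:])]


def xtn2(vd, vn, dst_bits=32):
    """Model of XTN2 V0.4S, V1.2D: write truncated lanes to the upper half of Vd."""
    half = len(vd) // 2
    mask = (1 << dst_bits) - 1
    return list(vd[:half]) + [x & mask for x in vn]  # lower half of Vd is kept


def saddl2(vn, vm, src_bits=32):
    """Model of SADDL2 V0.2D, V1.4S, V2.4S: lengthen and add the upper lanes."""
    half = len(vn) // 2
    return [sign_extend(a, src_bits) + sign_extend(b, src_bits)
            for a, b in zip(vn[half:], vm[half:])]
```

Note that xtn2 takes the existing destination as an input, reflecting that the lower lanes of the destination are left unchanged.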

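  • Comparison instructions now use the condition code names to indicate what the condition is and, where it applies, whether the comparison is signed or unsigned. For example, CMGT and CMGE are the signed greater than and greater than or equal comparisons, while CMHI and CMHS are the unsigned higher and higher or same comparisons.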
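
The signed and unsigned comparisons can give different answers for the same bit patterns, as this minimal Python sketch shows. The function names are hypothetical; each result lane is set to all ones when the condition holds and to zero otherwise.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cmgt(vn, vm, bits=8):
    """Model of CMGT: signed greater than, lane by lane."""
    ones = (1 << bits) - 1
    return [ones if to_signed(a, bits) > to_signed(b, bits) else 0
            for a, b in zip(vn, vm)]


def cmhi(vn, vm, bits=8):
    """Model of CMHI: unsigned higher, lane by lane."""
    ones = (1 << bits) - 1
    return [ones if (a & ones) > (b & ones) else 0 for a, b in zip(vn, vm)]
```

The byte 0xFF is -1 when treated as signed and 255 when treated as unsigned, so comparing it with 0x01 fails under cmgt but succeeds under cmhi.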
