Fundamentals

This section describes some of the fundamental concepts of both Neon and SVE technology.

Instruction sets

AArch64 is the name used for the 64-bit Execution state of the Armv8 architecture. In AArch64 state, the processor executes the A64 instruction set, which includes the Neon instructions, also called Advanced SIMD instructions. SVE was introduced as an optional extension in Armv8.2-A, and adds a new set of instructions to the existing Armv8-A A64 instruction set.

The following summary lists the key features of each extension and the categories of instructions that each one adds:

Neon

  Key features:
  • Provides instructions that perform mathematical operations in parallel on multiple data streams.
  • Supports double-precision floating-point arithmetic, so that C code that uses double-precision values can take advantage of Neon.

  Categorization of new instructions:
  • Promotion and demotion
  • Pair-wise operations
  • Load and store operations
  • Logical operators
  • Multiplication operations

SVE

  Key features:
  • Supports wide vector and predicate registers. The introduction of predication means that instructions can be divided into two main classes: predicated and unpredicated.
  • Provides a set of instructions that operate on wide vectors.
  • Introduces minor additions to the configuration and identification registers.

  Categorization of new instructions:
  • Load, store, and prefetch instructions
  • Integer operations
  • Vector address calculation
  • Bitwise operations
  • Floating-point operations
  • Predicate operations
  • Move operations
  • Reduction operations

  For descriptions of each instruction, see What is the Scalable Vector Extension?

For more information about the Neon instruction set, see the Arm A64 Instruction Set Architecture for Armv8-A.

For more information about the SVE instruction set extension, see Arm Architecture Reference Manual Supplement - The Scalable Vector Extension (SVE), for Armv8-A.

Registers, vectors, lanes, and elements

Neon units are fully integrated into Armv8-A processors and operate on a separate register file of 128-bit registers. Because Neon units use the same address space as applications, their programming model is simple.

In AArch64 state, the Neon register file contains 32 registers, V0 to V31. Each register can be accessed as an 8-bit, 16-bit, 32-bit, 64-bit, or 128-bit register, named Bn, Hn, Sn, Dn, or Qn respectively; the narrower views refer to the least significant bits of the register.

The Neon registers contain vectors. A vector is divided into lanes, and each lane contains a data value called an element.

All elements in a vector have the same data type.

The number of lanes in a Neon vector depends on the size of the vector and the size of its elements. For example, a 128-bit Neon vector can hold any of the following:

  • Sixteen 8-bit elements
  • Eight 16-bit elements
  • Four 32-bit elements
  • Two 64-bit elements

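Whatever the element size, Neon instructions always operate on either 64-bit or 128-bit vectors. A 64-bit vector holds half as many elements as a 128-bit vector, for example eight 8-bit elements or two 32-bit elements.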
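To make the lane arithmetic concrete, the following portable C sketch models a 128-bit Neon register as a union of lane arrays and computes how many lanes fit in a 64-bit or 128-bit vector. The type neon_q_reg and the functions neon_lane_count and add_u32x4 are hypothetical names for illustration only; real Neon code would use the arm_neon.h intrinsic types such as uint8x16_t and uint32x4_t instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit Neon register: every member views the
   same 16 bytes as a different number of equally sized lanes. */
typedef union {
    uint8_t  u8[16];  /* sixteen 8-bit elements */
    uint16_t u16[8];  /* eight 16-bit elements  */
    uint32_t u32[4];  /* four 32-bit elements   */
    uint64_t u64[2];  /* two 64-bit elements    */
} neon_q_reg;

_Static_assert(sizeof(neon_q_reg) == 16, "a Q register is 128 bits");

/* Lanes in a vector = vector size / element size. Neon vectors are always
   64 or 128 bits wide. */
static size_t neon_lane_count(size_t vector_bits, size_t element_bits) {
    assert(vector_bits == 64 || vector_bits == 128);
    assert(element_bits == 8 || element_bits == 16 ||
           element_bits == 32 || element_bits == 64);
    return vector_bits / element_bits;
}

/* Lane-by-lane add of four 32-bit elements. Each lane wraps modulo 2^32
   independently; no carry crosses into the adjacent lane. */
static neon_q_reg add_u32x4(neon_q_reg a, neon_q_reg b) {
    neon_q_reg r;
    for (size_t lane = 0; lane < 4; lane++)
        r.u32[lane] = a.u32[lane] + b.u32[lane];
    return r;
}
```

On real hardware, the loop in add_u32x4 corresponds to a single instruction such as ADD V0.4S, V1.4S, V2.4S, or to the vaddq_u32 intrinsic from arm_neon.h.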

In SVE, the instruction set operates on a new set of vector and predicate registers: 32 Z registers, 16 P registers, and one First Fault Register (FFR):

  • The Z registers are data registers. The Z register width is an IMPLEMENTATION DEFINED multiple of 128 bits, up to an architectural maximum of 2048 bits. Data in these registers can be interpreted as 8-bit, 16-bit, 32-bit, 64-bit, or 128-bit elements. The low 128 bits of each Z register overlap the corresponding Neon register, and therefore also overlap the scalar floating-point registers.
  • The P registers hold one bit for each byte of a Z register, so a P register is always one-eighth of the Z register width. Predicated instructions use a P register to determine which vector elements to process. Each bit in a P register specifies whether the corresponding byte in the Z register is active or inactive; for elements wider than one byte, the bit that corresponds to the lowest-numbered byte of the element determines whether the whole element is active.
  • The FFR is a dedicated predicate register that captures the cumulative fault status of a sequence of SVE vector load instructions. SVE provides a first-fault option for some vector load instructions. With this option, a memory access fault is taken only if it is caused by the first active element of the vector; faults caused by any later active element are suppressed. Instead, the FFR is updated to indicate which of the active elements were not successfully loaded.
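The sizing rules for the Z and P registers, and the way predicate bits map onto elements, can be checked with a small portable C model. The helpers below (sve_vl_is_valid, sve_pred_bits, sve_elem_count, sve_elem_active, and sve_add_u32_merge) are hypothetical and not part of any Arm API. The model assumes a predicate stored as one bool per byte of the Z register, and treats an element as active when the entry for its lowest-numbered byte is set. The predicated add uses merging predication, so inactive elements keep their previous value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SVE_MAX_VL_BITS 2048u

/* Hypothetical check: valid in this model means a non-zero multiple of
   128 bits, at most 2048 bits. */
static bool sve_vl_is_valid(size_t vl_bits) {
    return vl_bits != 0 && vl_bits % 128 == 0 && vl_bits <= SVE_MAX_VL_BITS;
}

/* One predicate bit per byte of a Z register: P is 1/8 of the Z width. */
static size_t sve_pred_bits(size_t vl_bits) {
    assert(sve_vl_is_valid(vl_bits));
    return vl_bits / 8;
}

/* Number of elements of elem_bytes bytes that fit in one Z register. */
static size_t sve_elem_count(size_t vl_bits, size_t elem_bytes) {
    assert(sve_vl_is_valid(vl_bits));
    return vl_bits / 8 / elem_bytes;
}

/* pred has one entry per byte of the Z register. Element i occupies bytes
   i*elem_bytes .. i*elem_bytes + elem_bytes - 1; only the entry for its
   lowest-numbered byte decides whether it is active. */
static bool sve_elem_active(const bool *pred, size_t i, size_t elem_bytes) {
    return pred[i * elem_bytes];
}

/* Hypothetical merging predicated add on 32-bit elements: active elements
   of acc become acc + b, inactive elements keep their old value. */
static void sve_add_u32_merge(size_t vl_bits, const bool *pred,
                              uint32_t *acc, const uint32_t *b) {
    size_t n = sve_elem_count(vl_bits, sizeof(uint32_t));
    for (size_t i = 0; i < n; i++)
        if (sve_elem_active(pred, i, sizeof(uint32_t)))
            acc[i] += b[i];
}
```

In this model, setting the entry for byte 5 does not activate the second 32-bit element, because that element's lowest-numbered byte is byte 4; the same entry would activate element 5 of an 8-bit operation.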

Both the P registers and the FFR register are unique to SVE.
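The first-fault behavior of the FFR can also be sketched in portable C. This is a simplified model, not the architectural definition: the hypothetical function first_fault_load_u32 treats any element index at or beyond readable as a faulting address. It takes the fault (returns false) only when the first active element faults; otherwise it clears the FFR entries from the first faulting active element upward. The model clears FFR entries only on a fault and sets inactive and unloaded destination elements to zero. Because the function only ever clears ffr, the status accumulates across a sequence of loads until software resets it (SVE provides the SETFFR instruction for this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first-fault load of `lanes` 32-bit elements from src[base..].
   Indices at or beyond `readable` count as faulting accesses.
   Returns false if the first active element faults (the fault is taken).
   Otherwise loads active elements up to the first faulting one, clears
   ffr[] from that element upward, and returns true. */
static bool first_fault_load_u32(const uint32_t *src, size_t readable,
                                 size_t base, const bool *pg, bool *ffr,
                                 uint32_t *dst, size_t lanes) {
    bool first_active_done = false;
    for (size_t i = 0; i < lanes; i++) {
        dst[i] = 0; /* inactive lanes are zero */
        if (!pg[i])
            continue;
        if (base + i >= readable) {
            if (!first_active_done)
                return false; /* fault on the first active element */
            /* Suppress the fault: record it in the FFR for this element
               and every higher-numbered element. */
            for (size_t j = i; j < lanes; j++) {
                ffr[j] = false;
                dst[j] = 0; /* unloaded lanes are zero in this model */
            }
            return true;
        }
        dst[i] = src[base + i];
        first_active_done = true;
    }
    return true;
}
```

A typical use is to read the FFR after the load and process only the elements whose entries are still set, then continue the loop from the first element that was not loaded.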

Vector Length Agnostic programming

SVE introduces the concept of Vector Length Agnostic (VLA) programming.

Unlike traditional SIMD architectures, which define a fixed size for their vector registers, SVE specifies only a maximum size. This freedom of choice lets different Arm architectural licensees develop their own implementations, each targeting the workloads and technologies that benefit most from a particular vector length.

A goal of SVE is to allow the same program image to be run on any implementation of the architecture. To allow this, SVE includes instructions that permit vector code to adapt automatically to the current vector length at runtime.
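The vector-length-agnostic style can be sketched in portable C by making the vector length a runtime parameter. This is a scalar model, not SVE code: vla_add_u32 and its vl_bits argument are hypothetical, the inner loop stands in for one predicated vector instruction, and the per-lane test i + lane < n plays the role of a WHILELT-generated predicate for the final, partially filled vector. No separate scalar tail loop is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a VLA loop computing dst[i] = a[i] + b[i].
   vl_bits stands in for the hardware vector length, which real SVE code
   reads at runtime instead of hard-coding. */
static void vla_add_u32(size_t vl_bits, const uint32_t *a, const uint32_t *b,
                        uint32_t *dst, size_t n) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    const size_t lanes = vl_bits / 32; /* 32-bit elements per vector */
    for (size_t i = 0; i < n; i += lanes) {
        /* One "vector iteration": lanes with i + lane < n are active,
           like a WHILELT predicate; the remaining lanes are inactive. */
        for (size_t lane = 0; lane < lanes; lane++) {
            bool active = i + lane < n;
            if (active)
                dst[i + lane] = a[i + lane] + b[i + lane];
        }
    }
}
```

Calling it with vl_bits set to 128, 512, or 2048 produces the same dst for any n; only the number of iterations and the shape of the last, partially active iteration change. In real SVE C code written with the ACLE intrinsics, svcntw() supplies the number of 32-bit elements per vector and svwhilelt_b32() builds the loop predicate.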

For more information about VLA programming, see SVE Vector Length Agnostic programming.
