This section introduces the Scalable Vector Extension version two (SVE2) of the Arm AArch64 architecture.
Following the development of the Neon architecture extension, whose instruction set has a fixed 128-bit vector length, Arm designed the Scalable Vector Extension (SVE). SVE is a Single Instruction Multiple Data (SIMD) instruction set that extends AArch64 and allows implementations to choose their own vector length. SVE improves the suitability of the architecture for High Performance Computing (HPC) applications, which process very large quantities of data.
SVE2 is a functional superset of SVE and Neon. SVE2 brings data-level parallelism to more functional domains. SVE2 inherits the concept, vector registers, and operation principles of SVE. SVE and SVE2 define 32 scalable vector registers. Silicon partners can choose a vector length for their hardware implementation that is any multiple of 128 bits, from 128 bits to 2048 bits. The advantage of SVE and SVE2 is that a single vector instruction set covers all of these vector lengths.
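To make the vector length rule concrete, the following minimal sketch checks whether a length is one that the rule above permits and computes how many elements fit in one vector. The function names `sve_vl_is_valid` and `sve_lanes` are hypothetical helpers written for this illustration; they are not part of any Arm library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if vl_bits is a multiple of 128 bits
   in the range 128..2048, as described in the text. */
bool sve_vl_is_valid(unsigned vl_bits) {
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Hypothetical helper: number of elements (lanes) of elem_bits each
   that fit in one vector of vl_bits. */
unsigned sve_lanes(unsigned vl_bits, unsigned elem_bits) {
    assert(sve_vl_is_valid(vl_bits));
    assert(elem_bits == 8 || elem_bits == 16 ||
           elem_bits == 32 || elem_bits == 64);
    return vl_bits / elem_bits;
}
```

For example, a 128-bit implementation holds four 32-bit elements per vector, while a 512-bit implementation holds sixteen. Portable code should not hard-code either number.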
The SVE design concept enables developers to write and build software once, then run the same binaries on different AArch64 hardware with various SVE vector length implementations. The portability of the binaries means that developers do not have to know the vector length implementation for their system. Removing the requirement to rebuild binaries allows software to be ported more easily. In addition to the scalable vectors, SVE and SVE2 include:
- Per-lane predication
- Gather-load and scatter-store
- Speculative vectorization
These features help vectorize and optimize loops when you process large datasets.
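As an illustration of a vector-length-agnostic loop that relies on per-lane predication, the sketch below computes `y[i] += a * x[i]` using the Arm C Language Extensions (ACLE) intrinsics from `arm_sve.h`. The function name `saxpy` and the scalar fallback are choices made for this example, so that the code also builds with compilers or targets that do not enable SVE. It is one possible way to write such a loop, not a reference implementation, and it does not show gather-load, scatter-store, or speculative vectorization.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Computes y[i] += a * x[i] for i in [0, n). */
void saxpy(float a, const float *x, float *y, size_t n) {
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() returns the number of 32-bit lanes in one vector on the
       hardware that runs the code, so the loop never assumes a length. */
    for (size_t i = 0; i < n; i += svcntw()) {
        /* Lanes whose index is below n are active; the rest are disabled.
           The final, partial iteration therefore needs no scalar tail. */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svfloat32_t vx = svld1(pg, &x[i]);
        svfloat32_t vy = svld1(pg, &y[i]);
        vy = svmla_x(pg, vy, vx, a);   /* vy + vx * a in active lanes */
        svst1(pg, &y[i], vy);
    }
#else
    /* Scalar fallback for targets built without SVE. */
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
#endif
}
```

The same source, and the same binary when built for SVE, handles any supported vector length because the step size and the predicate are both computed at run time.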
The main difference between SVE2 and SVE is the functional coverage of the instruction set. SVE was designed for HPC and Machine Learning (ML) applications. SVE2 extends the SVE instruction set to enable data-processing domains beyond HPC and ML. The SVE2 instruction set can also accelerate the common algorithms that are used in the following applications:
- Computer vision
- Long-Term Evolution (LTE) baseband processing
- In-memory databases
- Web serving
- General-purpose software
To help compilers vectorize more effectively for these domains, SVE2 adds vector-length-agnostic versions of the Neon instructions that cover most of the fixed-point Digital Signal Processing (DSP) and media processing functionality.
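As one example of such fixed-point functionality, the sketch below multiplies Q15 fixed-point values with a saturating doubling multiply that returns the high half. When the compiler targets SVE2, it uses the ACLE intrinsic `svqdmulh`, which plays the role that `vqdmulhq_s16` plays in Neon code. The function names `q15_mul_sat` and `q15_mul_vec` and the scalar fallback are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE2)
#include <arm_sve.h>
#endif

/* Scalar reference: saturating doubling multiply, high half.
   (a * b) >> 15 equals (2 * a * b) >> 16. The shift of a negative value
   is assumed to be arithmetic, as it is with GCC and Clang. Only
   INT16_MIN * INT16_MIN overflows, and it saturates to INT16_MAX. */
int16_t q15_mul_sat(int16_t a, int16_t b) {
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    return (int16_t)(p > INT16_MAX ? INT16_MAX : p);
}

/* out[i] = Q15 product of a[i] and b[i] for i in [0, n). */
void q15_mul_vec(const int16_t *a, const int16_t *b, int16_t *out, size_t n) {
#if defined(__ARM_FEATURE_SVE2)
    /* svcnth() is the number of 16-bit lanes in one vector at run time. */
    for (size_t i = 0; i < n; i += svcnth()) {
        svbool_t pg = svwhilelt_b16((uint64_t)i, (uint64_t)n);
        svint16_t va = svld1(pg, &a[i]);   /* inactive lanes load as zero */
        svint16_t vb = svld1(pg, &b[i]);
        svst1(pg, &out[i], svqdmulh(va, vb));
    }
#else
    /* Scalar fallback for targets built without SVE2. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = q15_mul_sat(a[i], b[i]);
    }
#endif
}
```

In Q15, the value 16384 represents 0.5, so multiplying 16384 by 16384 yields 8192, which represents 0.25.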
SVE and SVE2 both enable the collection and processing of a large amount of data.
SVE and SVE2 are not an extension of the Neon instruction set. Instead, SVE and SVE2 are redesigned for better data parallelism than Neon provides. However, the hardware logic of SVE and SVE2 overlays the Neon hardware implementation. When a microarchitecture supports SVE or SVE2, it also supports Neon. To use SVE and SVE2, software that runs on that microarchitecture must first use Neon.
An SVE2 architecture overview is available to next generation architecture licensees, but is not publicly available yet.