After Neon, which has a fixed 128-bit vector length, Arm designed a new SIMD instruction set as an extension to AArch64 that allows flexible vector length implementations: the Scalable Vector Extension (SVE). SVE improves the architecture's suitability for High Performance Computing (HPC) applications, which process very large quantities of data.
SVE2 (Scalable Vector Extension version two) is a superset of SVE and Neon, and brings data-level parallelism to more functional domains. SVE2 inherits the concept, vector registers, and operation principles of SVE. SVE and SVE2 define 32 scalable vector registers. Silicon partners can choose a suitable vector length for their hardware design, anywhere from 128 bits to 2048 bits in 128-bit increments. The advantage of SVE and SVE2 is that there is only one vector instruction set, and it operates on the scalable vector length rather than on a fixed width. This design enables developers to write and build software once, then run the same binaries on different AArch64 hardware with different SVE vector lengths. Because the binaries do not need to be rebuilt, software is easier to port. In addition to the scalable vectors, SVE and SVE2 include:
- Per-lane predication
- Gather-load / Scatter-store
- Speculative vectorization
The features listed above help vectorize and optimize loops when you process large datasets.
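To make the combination of scalable vectors and per-lane predication more concrete, here is a minimal scalar sketch in plain C. It is a model only, not SVE code: the function name `vla_add_model`, the `vl_bits` parameter, and the `pred` array are hypothetical stand-ins for the hardware vector length and a predicate register. The point it illustrates is that the loop never hard-codes a vector width, and the predicate handles the loop tail, so the same code gives the same result for any legal vector length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a vector-length-agnostic loop.
 * Plain C for illustration only; no SVE instructions are used. */
#define MAX_LANES (2048 / 32)  /* most 32-bit lanes a 2048-bit vector holds */

/* Adds a[i] + b[i] into dst[i] for i in [0, n).
 * vl_bits models the implemented vector length.
 * Returns 0 on success, -1 if vl_bits is not a multiple of 128
 * in the range 128..2048. */
int vla_add_model(float *dst, const float *a, const float *b,
                  size_t n, unsigned vl_bits)
{
    if (vl_bits < 128 || vl_bits > 2048 || vl_bits % 128 != 0)
        return -1;

    size_t lanes = vl_bits / 32;   /* 32-bit lanes per modelled vector */
    assert(lanes <= MAX_LANES);

    for (size_t i = 0; i < n; i += lanes) {
        uint8_t pred[MAX_LANES];   /* one flag per lane, like a predicate */

        /* A lane is active only while its element index is below n. */
        for (size_t lane = 0; lane < lanes; lane++)
            pred[lane] = (uint8_t)(i + lane < n);

        /* Predicated operation: inactive lanes read and write nothing,
         * so no separate scalar tail loop is needed. */
        for (size_t lane = 0; lane < lanes; lane++)
            if (pred[lane])
                dst[i + lane] = a[i + lane] + b[i + lane];
    }
    return 0;
}
```

Calling `vla_add_model` with `vl_bits` set to 128 and then to 2048 on the same ten-element arrays should fill `dst` identically, which mirrors the idea of running one binary on implementations with different vector lengths.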
The main difference between SVE2 and SVE is the functional coverage of the instruction set. SVE was designed for HPC and ML (Machine Learning) applications. SVE2 extends the SVE instruction set to cover more data-processing domains beyond HPC and ML. The SVE2 instruction set can also accelerate the common algorithms used in the following applications:
- Computer vision
- LTE baseband processing
- In-memory database
- Web serving
- General-purpose software
To help compilers vectorize better for these domains, SVE2 adds vector-width-agnostic versions of the Neon instructions that cover most of the fixed-point DSP (Digital Signal Processing) and media processing functionality.
Both SVE and SVE2 enable large amounts of data to be collected and processed.
Neither SVE nor SVE2 is an extension of the Neon instruction set. Instead, SVE and SVE2 are redesigned for better data parallelism. However, their hardware logic overlays the Neon hardware implementation. When a microarchitecture supports SVE or SVE2, it also supports Neon. To be able to use SVE and SVE2, software that runs on that microarchitecture must first use Neon.
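One way to see the layering described above from the software side is to check which SIMD extensions the compiler is targeting. The sketch below assumes the ACLE feature-test macros `__ARM_FEATURE_SVE2`, `__ARM_FEATURE_SVE`, and `__ARM_NEON`; the function name `simd_target` is hypothetical. It reports only what the build targets at compile time, not what the CPU supports at run time, and it falls back to "none" on other architectures.

```c
#include <string.h>

/* Hypothetical helper: names the widest Arm SIMD extension that this
 * translation unit was compiled for, based on ACLE feature macros.
 * This is a compile-time check, not run-time CPU detection. */
const char *simd_target(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return "SVE2";   /* SVE2 builds also target SVE and Neon */
#elif defined(__ARM_FEATURE_SVE)
    return "SVE";    /* SVE builds also target Neon */
#elif defined(__ARM_NEON)
    return "Neon";
#else
    return "none";   /* not an Arm SIMD target */
#endif
}
```

The checks are ordered from the newest extension to the oldest, so the function reports the most capable instruction set the build was configured for.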
An SVE2 architecture overview is available to next-generation architecture licensees, but is not yet publicly available. For more information about SVE, see Introducing Scalable Vector Extension (SVE). For more information about Neon, see the Neon webpage.