Optimizing C/C++ code with SVE
The Scalable Vector Extension (SVE) to the Armv8-A architecture (AArch64) can be used to significantly accelerate repetitive operations on the large data sets commonly encountered with High Performance Computing (HPC) applications.
SVE instructions pack multiple lanes of data into large registers then perform the same operation across all data lanes, with predication to control which lanes are active. For example, consider the following SVE instruction:
ADD Z0.D, P0/M, Z1.D, Z2.D
This instruction specifies that an addition (ADD) operation is performed on SVE vector registers, each split into 64-bit data lanes. D specifies the width of the data lane (doubleword, or 64 bits). The width of each vector register is a multiple of 128 bits, between 128 and 2048 bits; the architecture does not fix the width, which is chosen by each hardware implementation. The predicate register P0 specifies which lanes are active. Each active lane in Z1 is added to the corresponding lane in Z2 and the result is stored in Z0. Each lane is added separately: there are no carries between the lanes. The merge flag /M on the predicate specifies that inactive lanes retain their prior value.
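The lane-by-lane behavior described above can be sketched in portable C. This is a scalar model for illustration only, not SVE code: the function name model_add_d_merge and the lanes parameter are hypothetical, and the lane count is passed in explicitly because the model is not tied to any real vector length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a predicated, merging add on 64-bit lanes.
 * zd, zn, zm stand in for Z0, Z1, Z2; pg stands in for P0.
 * `lanes` is an assumption of this sketch: on real hardware it would be
 * the vector length in bits divided by 64. */
void model_add_d_merge(int64_t *zd, const bool *pg,
                       const int64_t *zn, const int64_t *zm, size_t lanes)
{
    assert(zd && pg && zn && zm);
    for (size_t i = 0; i < lanes; ++i) {
        if (pg[i]) {
            /* Active lane: the sum wraps within the lane, so nothing
             * carries into lane i + 1. The unsigned arithmetic avoids
             * signed-overflow undefined behavior in C. */
            zd[i] = (int64_t)((uint64_t)zn[i] + (uint64_t)zm[i]);
        }
        /* Inactive lane: zd[i] keeps its prior value (merging, /M). */
    }
}
```

Calling the model with four lanes and a predicate of {true, false, true, false} updates lanes 0 and 2 only; lanes 1 and 3 of zd keep whatever they held before the call.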
Optimize your code for SVE
To optimize your code using SVE, you can either:
- Let the compiler auto-vectorize your code for you.
Arm Compiler for Linux automatically vectorizes your code at optimization levels -O2 and higher. The compiler identifies appropriate vectorization opportunities in your code and uses SVE instructions where appropriate.
At optimization level -O1 you can use the -fvectorize option to enable auto-vectorization.
At the lowest optimization level, -O0, auto-vectorization is never performed, even if you specify -fvectorize. See Compiling code for Arm SVE architectures for more information on setting these options.
- Write SVE assembly code. See Writing inline SVE assembly.
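For the auto-vectorization route, a loop like the one below is a typical candidate. This is a minimal sketch: the function name daxpy and its signature are chosen here for illustration and do not come from the guide. Each iteration is independent of the others, and the restrict qualifiers tell the compiler that x and y do not overlap, which removes one common obstacle to vectorization. An invocation along the lines of armclang -O2 -march=armv8-a+sve -c daxpy.c is assumed here; check Compiling code for Arm SVE architectures for the exact options.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * No iteration reads a value written by another iteration, so the
 * compiler is free to process several elements per instruction. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x && y));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source contains nothing SVE-specific, so the same file still builds and runs correctly on targets without SVE. Whether the loop is actually vectorized depends on the compiler and the options used.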
For more information about porting and optimizing existing applications to Arm SVE, see the Porting and Tuning HPC Applications for Arm SVE guide.
Further information about SVE is available as follows:
- Compiling code for Arm SVE architectures
- Porting and Tuning HPC Applications for Arm SVE
- White Paper: A sneak peek into SVE and VLA programming
- Arm C Language Extensions (ACLE) for SVE
- DWARF for the Arm 64-bit Architecture (AArch64) with SVE support
- Procedure Call Standard for the Arm 64-bit Architecture (AArch64) with SVE support