Programming with SVE

This section describes the software tools and libraries that support SVE application development. It also describes how to develop your application for an SVE-enabled target and run it on SVE-enabled hardware, and how to run your application under SVE emulation on any Armv8-A-based hardware.

Software and libraries support

To build an SVE application, you must choose a compiler that supports SVE features, such as:

  • GNU tools. Versions 8.0+ of the GNU tools support SVE optimization for C, C++, and Fortran.
  • Arm Compiler for Linux, a native compiler for Arm Linux. Arm Compiler for Linux versions 18.0+ support SVE code generation for C, C++, and Fortran code. Arm Compiler for Linux is part of the Arm Linux user-space tooling solution Arm Allinea Studio.
  • Arm Compiler 6, a cross-platform compiler for bare-metal application development. Arm Compiler 6 supports SVE code generation from version 6.12.

In addition to the compilers, you can also rely on highly optimized SVE libraries, such as:

  • Arm Performance Libraries, a set of highly optimized math routines, can be linked to your application. Arm Performance Libraries versions 19.3+ support math libraries for SVE. Arm Performance Libraries is part of Arm Compiler for Linux.
  • Other third-party math libraries.

How to program for SVE

There are a few ways to write or generate SVE code. In this section of the guide, we explore four methods of programming for SVE:

  • Write SVE assembly code
  • Program with SVE intrinsics
  • Auto-vectorization
  • Using SVE optimized libraries

Let us look at these four options in more detail.

Write assembly

You can write SVE instructions as inline assembly in your C/C++ code or as a complete function in assembler source. For example:

        .globl  subtract_arrays         // -- Begin function
        .p2align        2
        .type   subtract_arrays,@function
subtract_arrays:                        // @subtract_arrays
        .cfi_startproc
// %bb.0:
        orr     w9, wzr, #0x400
        mov     x8, xzr
        whilelo p0.s, xzr, x9
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        ld1w    { z0.s }, p0/z, [x1, x8, lsl #2]
        ld1w    { z1.s }, p0/z, [x2, x8, lsl #2]
        sub     z0.s, z0.s, z1.s
        st1w    { z0.s }, p0, [x0, x8, lsl #2]
        incw    x8
        whilelo p0.s, x8, x9
        b.mi    .LBB0_1
// %bb.2:
        ret
.Lfunc_end0:
        .size   subtract_arrays, .Lfunc_end0-subtract_arrays
        .cfi_endproc

If you are mixing functions that are written in a high-level language and in assembly, you must be familiar with the Application Binary Interface (ABI) standard, as updated for SVE. The Procedure Call Standard for the Arm Architecture (AAPCS) specifies the data types and register allocations, and is the part of the ABI that is most relevant to programming in assembly. The AAPCS requires that:

  • Z0-Z7 and P0-P3 are used for passing the scalable vector parameters and results.
  • Z8-Z15 and P4-P15 are callee-saved.
  • All the other vector registers (Z16-Z31) are corruptible by the callee function, where the caller function is responsible for backing up and restoring them, when needed.
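The inline assembly route mentioned at the start of this section can be sketched as follows. This is a minimal illustration, not part of the original guide: the function name sve_vector_bytes is hypothetical, the GNU-style __asm__ syntax is assumed (as accepted by GCC and Clang-based compilers), and the scalar fallback exists only so that the file still builds when SVE is not enabled.

```c
#include <stdint.h>

/* Hypothetical helper: returns the SVE vector length in bytes,
 * or 0 when the file is not compiled for an SVE-enabled target. */
uint64_t sve_vector_bytes(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    uint64_t bytes;
    /* CNTB writes the number of 8-bit elements in one vector
     * register (that is, the vector length in bytes) to bytes. */
    __asm__("cntb %0" : "=r"(bytes));
    return bytes;
#else
    /* Assumption: report 0 when SVE is not available at build time. */
    return 0;
#endif
}
```

Because the compiler allocates the output register and no Z or P registers are modified, this snippet does not need to save or restore any of the registers listed above.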

Use SVE instruction functions (intrinsics)

SVE intrinsics are functions that the compiler replaces with the corresponding SVE instructions. You can call these instruction functions directly from high-level languages like C and C++. The Arm C Language Extensions (ACLE) for SVE define which SVE instruction functions are available, their parameters, and what they do. Compilers that support the ACLE replace the intrinsics with the mapped SVE instructions during compilation.

To use the ACLE intrinsics, you must include the header file arm_sve.h, which declares the vector types and instruction functions (for SVE) that can be used in C/C++. Each vector type describes the size and data type of the elements in the vector:

  • svint8_t svuint8_t
  • svint16_t svuint16_t svfloat16_t
  • svint32_t svuint32_t svfloat32_t
  • svint64_t svuint64_t svfloat64_t

For example, svint64_t represents a vector of 64-bit signed integers, and svfloat16_t represents a vector of half-precision floating-point numbers.

The following example C code has been manually optimized with SVE intrinsics:

//intrinsic_example.c
#include <arm_sve.h>
svuint64_t uaddlb_array(svuint32_t Zs1, svuint32_t Zs2)
{
    // widening add of even elements
    svuint64_t result = svaddlb(Zs1, Zs2);
    return result;
}

Source code that includes arm_sve.h can use the SVE vector types in the same way as other data types, for example in variable declarations and function parameters. To compile the code using Arm C/C++ Compiler, and target the Armv8-A architecture with SVE support, use:

armclang -O3 -S -march=armv8-a+sve -o intrinsic_example.s intrinsic_example.c

This command generates the following assembly code:

//intrinsic_example.s
uaddlb_array:                           // @uaddlb_array
        .cfi_startproc
// %bb.0:
        uaddlb  z0.d, z0.s, z1.s
        ret

This example uses Arm Compiler for Linux 20.0.
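Intrinsics can also express a complete vector-length-agnostic loop. The following sketch is one possible intrinsics counterpart of the subtract_arrays assembly shown in the Write assembly section. It is not taken from the ACLE documentation: the function name subtract_arrays_sve and the length parameter n are hypothetical, 32-bit signed integer elements are assumed, and the scalar fallback is included only so that the file also builds without SVE.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example: dst[i] = a[i] - b[i] for i in [0, n). */
void subtract_arrays_sve(int32_t *restrict dst, const int32_t *restrict a,
                         const int32_t *restrict b, uint64_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit elements per vector, so the
     * same code runs unchanged on any SVE vector length. */
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Predicate is true only for lanes where i + lane < n. */
        svbool_t pg = svwhilelt_b32(i, n);
        svint32_t va = svld1(pg, &a[i]);   /* predicated loads */
        svint32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svsub_x(pg, va, vb));
    }
#else
    /* Scalar fallback (assumption: used when SVE is not enabled). */
    for (uint64_t i = 0; i < n; i++) {
        dst[i] = a[i] - b[i];
    }
#endif
}
```

The predicate produced by svwhilelt_b32 handles the final partial vector, so no separate scalar tail loop is needed in the SVE path.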

Auto-vectorization

C, C++, and Fortran compilers, for example the native Arm Compiler for Linux and the GNU compilers for Arm platforms, can vectorize C, C++, and Fortran loops using SVE instructions. To generate SVE code, select the appropriate compiler options. For example, when you pass the -march=armv8-a+sve option to armclang, armclang also uses the default options -fvectorize and -O2. If you want to use the SVE-enabled version of the libraries, combine -march=armv8-a+sve with -armpl=sve. For more information about the compiler optimization options, refer to the compiler developer and reference guides, or the compiler man pages.
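As an illustration, the following plain C loop is the kind of code that a compiler can auto-vectorize for SVE. It is a sketch: the ARRAYSIZE value of 1024 is assumed to match the 0x400 loop bound in the subtract_arrays assembly shown in the Write assembly section, and the restrict qualifiers are an assumption added here to tell the compiler that the arrays do not overlap.

```c
#define ARRAYSIZE 1024

/* a[i] = b[i] - c[i]; no SVE-specific source code is needed. */
void subtract_arrays(int *restrict a, const int *restrict b,
                     const int *restrict c)
{
    for (int i = 0; i < ARRAYSIZE; i++) {
        a[i] = b[i] - c[i];
    }
}
```

Compiling this file with options such as armclang -O3 -S -march=armv8-a+sve would be expected to produce a predicated loop similar to that assembly listing, although the exact output depends on the compiler and its version.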

Use optimized libraries

Use libraries that are highly optimized for SVE, for example Arm Performance Libraries and Arm Compute Library. Arm Performance Libraries contain highly optimized implementations of BLAS, LAPACK, FFT, and sparse linear algebra routines, and libamath-optimized mathematical functions. To link any of the Arm Performance Libraries functions, you must install Arm Allinea Studio and include armpl.h in your code. To build the application with Arm Compiler for Linux and Arm Performance Libraries, you must specify -armpl=<arg> on the command line. If you use the GNU tools, you must include the Arm Performance Libraries installation path in the linker command line with -L<armpl_install_dir>/lib, and specify the GNU equivalent of the Arm Compiler for Linux -armpl=<arg> option, which is -larmpl_lp64. For more information, refer to the Arm Performance Libraries Get started guide.
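The following sketch shows how a call into the library might look from C. It makes several assumptions: USE_ARMPL is a hypothetical macro defined only for this example, cblas_ddot is the standard CBLAS dot product routine and is assumed to be declared through armpl.h, and the plain loop is a fallback so that the file also builds when the library is not installed.

```c
#ifdef USE_ARMPL              /* hypothetical macro for this example */
#include <armpl.h>            /* assumed to declare the CBLAS routines */
#endif

/* Dot product of two vectors of length n. */
double dot(const double *x, const double *y, int n)
{
#ifdef USE_ARMPL
    /* Standard CBLAS call with unit stride for both vectors. */
    return cblas_ddot(n, x, 1, y, 1);
#else
    /* Plain C fallback, used when the library is not available. */
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}
```

With Arm Compiler for Linux, a build of this file might add -DUSE_ARMPL -march=armv8-a+sve -armpl=sve to the command line. With the GNU tools, it might add -DUSE_ARMPL -L<armpl_install_dir>/lib -larmpl_lp64, and possibly an include path for armpl.h. Check the Arm Performance Libraries documentation for the exact options that apply to your installation.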

How to run an SVE application

If you do not have access to SVE hardware, you can use models or emulators to run your code. There are a few models and emulators to choose from:

  • QEMU: Cross and native models, which support modeling Arm AArch64 platforms with SVE.
  • Fast Models: Cross platform models, which support modeling Arm AArch64 platforms with SVE, running on x86-based hosts.
  • Arm Instruction Emulator (ArmIE): Native AArch64 emulator, which supports the emulation of SVE instructions, and other new instructions, for future architectures.