Program with SVE2

Software and libraries support

To build an SVE or SVE2 program, you must choose a compiler that supports SVE and SVE2 features. GNU tools versions 8.0+ support SVE. Arm Compiler for Linux versions 18.0+ support SVE and versions 20.0+ support SVE and SVE2. Both compilers support optimizing C/C++/Fortran code.

Arm Performance Libraries are highly optimized for math routines, and can be linked to your application. Arm Performance Libraries versions 19.3+ support math libraries for SVE.

Arm Compiler for Linux (part of Arm Allinea Studio) consists of the Arm C/C++ Compiler, Arm Fortran Compiler, and Arm Performance Libraries.

How to program for SVE2

There are several ways to write or generate SVE and SVE2 code: write assembly with SVE and SVE2 instructions, use intrinsics in C/C++/Fortran applications, let the compiler auto-vectorize your code, or use SVE-optimized libraries:

  • Write assembly code: you can write assembly files using SVE instructions, or use inline assembly in GNU style. For example:

            .globl  subtract_arrays         // -- Begin function 
            .p2align        2 
            .type   subtract_arrays,@function 
    subtract_arrays:               // @subtract_arrays 
            .cfi_startproc 
    // %bb.0: 
            orr     w9, wzr, #0x400 
            mov     x8, xzr 
            whilelo p0.s, xzr, x9 
    .LBB0_1:                       // =>This Inner Loop Header: Depth=1 
            ld1w    { z0.s }, p0/z, [x1, x8, lsl #2] 
            ld1w    { z1.s }, p0/z, [x2, x8, lsl #2] 
            sub     z0.s, z0.s, z1.s 
            st1w    { z0.s }, p0, [x0, x8, lsl #2] 
            incw    x8 
            whilelo p0.s, x8, x9 
            b.mi    .LBB0_1 
    // %bb.2: 
            ret 
    .Lfunc_end0: 
            .size   subtract_arrays, .Lfunc_end0-subtract_arrays 
            .cfi_endproc

    To program in assembly, you need to know the ABI (Application Binary Interface) updates for SVE and SVE2. Of the ABI documents, the AAPCS (Procedure Call Standard for the Arm Architecture) specifies the data types and register allocations, and is the most relevant to programming in assembly. The AAPCS requires that:

    • Z0-Z7, P0-P3 are used for parameter and results passing
    • Z8-Z15, P4-P15 are callee-saved registers
    • Z16-Z31 are the corruptible registers

  • Use instruction functions: instruction functions, sometimes referred to as intrinsics, are functions that map to corresponding SVE instructions, so that you can call them directly in high-level languages such as C, C++, or Fortran. During compilation, each call is replaced with the specific instruction. The instruction functions are detailed in the ACLE (Arm C Language Extensions) for SVE, which also includes the full list of instruction functions for SVE2.

    For example, take the following code:

    //intrinsic_example.c
    #include <arm_sve.h>
    svuint64_t uaddlb_array(svuint32_t Zs1, svuint32_t Zs2)
    {
        // widening add of even elements
        svuint64_t result = svaddlb(Zs1, Zs2);
        return result;
    }

    Compile it using Arm C/C++ Compiler:

    armclang -O3 -S -march=armv8-a+sve2 -o intrinsic_example.s intrinsic_example.c

    This generates the following assembly:

    //intrinsic_example.s
    uaddlb_array:                           // @uaddlb_array
            .cfi_startproc
    // %bb.0:
            uaddlb  z0.d, z0.s, z1.s
            ret

    Note: This example uses Arm Compiler for Linux 20.0.

  • Auto-vectorization: C/C++/Fortran compilers such as Arm Compiler for Linux and the GNU compilers for Arm platforms aim to generate SVE and SVE2 code from C/C++/Fortran loops. To generate SVE or SVE2 code, select the compiler options that enable the SVE or SVE2 features. For example, with armclang, one option that enables SVE2 optimizations is -march=armv8-a+sve2 (coupled with -armpl=sve, if you want to use the SVE version of the libraries). For more information about all the supported options that enable SVE2 features, see the Arm C/C++ Compiler or Arm Fortran Compiler reference guide.

  • Use libraries optimized for SVE and SVE2: highly optimized libraries with SVE support are already available, such as Arm Performance Libraries and Arm Compute Libraries. Arm Performance Libraries (ArmPL) contain highly optimized implementations of BLAS, LAPACK, FFT, and math routines. To link any of the ArmPL functions, you must install Arm Allinea Studio and include armpl.h in your code. To build the application with ArmPL using Arm Compiler for Linux, specify -armpl=<arg> on the command line. If you use the GNU tools, you need to include the ArmPL installation path on the command line. For more information, see the Arm Performance Libraries Get Started Guide.
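
To connect the assembly, instruction function, and auto-vectorization approaches, the following is a minimal sketch of C source that could correspond to the subtract_arrays listing earlier in this section. The original C source is not shown in this guide, so the 32-bit integer element type, the ARRAY_SIZE macro, and the subtract_arrays_acle name are assumptions made for illustration. The first function is a plain loop that a compiler may auto-vectorize when built with options such as -O3 -march=armv8-a+sve2. The second function writes the same loop by hand with ACLE instruction functions, and falls back to the plain loop when the compiler does not target SVE.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Assumed loop bound: matches the #0x400 constant in the assembly listing. */
#define ARRAY_SIZE 1024

/* Plain C loop. With SVE enabled, the compiler may auto-vectorize it.
 * 'restrict' tells the compiler that the arrays do not overlap. */
void subtract_arrays(int32_t *restrict dst,
                     const int32_t *restrict a,
                     const int32_t *restrict b)
{
    for (size_t i = 0; i < ARRAY_SIZE; ++i) {
        dst[i] = a[i] - b[i];
    }
}

/* Hypothetical hand-written variant using ACLE instruction functions. */
void subtract_arrays_acle(int32_t *dst, const int32_t *a, const int32_t *b)
{
#if defined(__ARM_FEATURE_SVE)
    uint64_t i = 0;
    /* Predicate that is active for lanes whose index is below ARRAY_SIZE. */
    svbool_t pg = svwhilelt_b32(i, (uint64_t)ARRAY_SIZE);
    while (svptest_any(svptrue_b32(), pg)) {
        svint32_t va = svld1(pg, &a[i]);         /* predicated load  */
        svint32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svsub_x(pg, va, vb)); /* predicated store */
        i += svcntw();                           /* advance by one vector of 32-bit lanes */
        pg = svwhilelt_b32(i, (uint64_t)ARRAY_SIZE);
    }
#else
    /* No SVE at compile time: use the scalar loop instead. */
    subtract_arrays(dst, a, b);
#endif
}
```

Because the loop is controlled by a predicate rather than a fixed vector width, the same code is intended to work for any SVE vector length without a separate scalar tail loop.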

How to run an SVE/SVE2 program: hardware and models

Although SVE-enabled hardware is not yet available, you can use models and emulators to develop your code ahead of the hardware becoming available. There are a few models and emulators to choose from:

  • QEMU: Cross and native models that support modeling Arm AArch64 platforms with SVE.
  • Fast Models: Cross-platform models that support modeling Arm AArch64 platforms with SVE (AEM with SVE2 support is available for lead partners).
  • ArmIE (Arm Instruction Emulator): Runs directly on Arm platforms. Supports SVE, and from version 19.2 also supports SVE2.
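
Whichever model or emulator you choose, it can be useful to confirm at run time that the environment actually reports SVE or SVE2 before executing vector code. The following is a minimal sketch, assuming a Linux AArch64 target where the kernel exposes the HWCAP_SVE and HWCAP2_SVE2 capability bits through getauxval. The function names cpu_has_sve and cpu_has_sve2 are hypothetical, and on any other platform the sketch simply reports that the features are absent.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP_SVE, HWCAP2_SVE2 (if the kernel headers define them) */
#endif

/* Returns true if the running environment reports SVE support. */
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false; /* Not Linux AArch64, or headers too old to know about SVE. */
#endif
}

/* Returns true if the running environment reports SVE2 support. */
bool cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2)
    return cpu_has_sve() && (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false; /* Not Linux AArch64, or headers too old to know about SVE2. */
#endif
}
```

A program could call cpu_has_sve() at startup and choose between an SVE code path and a scalar fallback, which makes it easier to run the same binary on a model and on hardware without SVE.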

How to port applications to SVE or SVE2

For more information about porting your code to Arm or Arm SVE-enabled hardware, see the HPC application porting guides.
