Compiling C/C++ code for Arm SVE architectures

The Arm C/C++ Compiler toolchain for the 64-bit Armv8-A architecture supports the Scalable Vector Extension (SVE), enabling you to:

  • Assemble source code containing SVE instructions.
  • Disassemble ELF object files containing SVE instructions.
  • Compile C and C++ code for SVE-enabled targets, using an advanced auto-vectorizer that can take advantage of SVE features.

This tutorial shows how to compile code that takes advantage of SVE. The resulting executables run only on SVE-enabled hardware or under the Arm Instruction Emulator.

Installing the Arm HPC tools suite

Refer to Installing Arm Compiler for HPC for details on how to perform the installation and configure your environment on Linux.

Generating Arm assembly code from C and C++ code

Arm C/C++ Compiler for HPC can produce annotated assembly, and this is a good first step to see how the compiler vectorizes loops.

The following C program subtracts corresponding elements of two arrays and writes the result to a third array. The three pointer parameters are declared with the restrict qualifier, which tells the compiler that the arrays they point to do not overlap in memory.

// example1.c
#define ARRAYSIZE 1024
int a[ARRAYSIZE];
int b[ARRAYSIZE];
int c[ARRAYSIZE];
void subtract_arrays(int *restrict a, int *restrict b, int *restrict c)
{
    for (int i = 0; i < ARRAYSIZE; i++)
    {
        a[i] = b[i] - c[i];
    }
}

int main(void)
{
    subtract_arrays(a, b, c);
    return 0;
}

Compile the program as follows:

armclang -O3 -S -march=armv8-a+sve -o example1.s example1.c

The -Olevel option specifies the optimization level: -O0 is the lowest level and -O3 the highest. Arm C/C++ Compiler performs auto-vectorization only at -O2 and higher. The -S option writes assembly code instead of an object file, and -march=armv8-a+sve enables SVE code generation.

The output assembly code is saved as example1.s. The section of the generated assembly language file containing the compiled subtract_arrays function appears as follows:

subtract_arrays:                        // @subtract_arrays
// BB#0:
        orr     w9, wzr, #0x400
        mov     x8, xzr
        whilelo p0.s, xzr, x9
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        ld1w    {z0.s}, p0/z, [x1, x8, lsl #2]
        ld1w    {z1.s}, p0/z, [x2, x8, lsl #2]
        sub     z0.s, z0.s, z1.s
        st1w    {z0.s}, p0, [x0, x8, lsl #2]
        incw    x8
        whilelo p0.s, x8, x9
        b.mi    .LBB0_1
// BB#2:
        ret

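SVE instructions operate on the scalable vector registers (z0 to z31) and the predicate registers (p0 to p15). In this example the inner loop consists almost entirely of SVE instructions: whilelo sets predicate p0 so that only elements whose index is below 1024 (0x400, held in x9) are active, ld1w and st1w load and store only those active elements, incw advances the index in x8 by the number of 32-bit elements in one vector, and b.mi branches back while the first element of the new predicate is still active. The auto-vectorizer has therefore converted the scalar loop from the C source into a vector loop that is independent of the width of the SVE vector registers, with no separate scalar loop for leftover elements.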

Generating an executable binary from C and C++ code

To generate an executable binary, compile your program without the -S option:

armclang -O3 -march=armv8-a+sve -o example1 example1.c

You can specify multiple source files on a single command line. Each source file is compiled individually and then linked into a single executable binary:

armclang -O3 -march=armv8-a+sve -o example2 example2a.c example2b.c

Compiling and linking object files as separate steps

To compile each of your source files individually into an object file, specify the -c (compile-only) option, and then pass the resulting object files into another invocation of armclang to link them into an executable binary.

armclang -O3 -march=armv8-a+sve -c -o example2a.o example2a.c
armclang -O3 -march=armv8-a+sve -c -o example2b.o example2b.c
armclang -O3 -march=armv8-a+sve -o example2 example2a.o example2b.o

Common compiler options

Run armclang --help to list the most common options, and see the LLVM documentation for the full set of supported options.

-S
Outputs assembly code, rather than object code. Produces a text .s file containing annotated assembly code.
-c
Performs the compilation step, but does not perform the link step. Produces an ELF object .o file. To later link object files into an executable binary, run armclang again, passing in the object files.
-o file
Specifies the name of the output file.
-march=name[+[no]feature]
Targets an architecture profile, generating generic code that runs on any processor of that architecture. Append +feature to enable an optional extension or +nofeature to disable it; for example, -march=armv8-a+sve enables SVE.
-Olevel
Specifies the level of optimization to use when compiling source files.
--help
Describes the most common options supported by Arm C/C++ Compiler for HPC.
--version
Displays version information and license details.

Resources

For further information on writing SVE code with Arm Compiler for HPC, see the following useful links: