Vector math routines in Arm Compiler for HPC

Arm Compiler for HPC supports the vectorization of loops within C and C++ workloads that invoke the math routines from libm.

Any C loop using functions from <math.h> (or from <cmath> in the case of C++) can be vectorized by invoking the compiler with the option -fsimdmath, together with the usual options that are needed to activate the auto-vectorizer (optimization level -O2 and above).

Examples

The following examples show loops with math function calls that can be vectorized by invoking the compiler with:

armclang -fsimdmath -c -O2 source.c[pp]

C example with loop invoking sin

    /* C code example: source.c */
    #include <math.h>
    void do_something(double * a, double * b, unsigned N) {
      for (unsigned i = 0; i < N; ++i) {
        /* some computation */
        a[i] = sin(b[i]);
        /* some computation */
      }
    }

C++ example with loop invoking std::pow

    // C++ code example: source.cpp
    #include <cmath>
    void do_something(float * a, float * b, unsigned N) {
      for (unsigned i = 0; i < N; ++i) {
        // some computation
        a[i] = std::pow(a[i], b[i]);
        // some computation
      }
    }

How it works

Arm Compiler for HPC contains libsimdmath, a library with SIMD implementations of the routines provided by libm, along with a math.h file that declares the availability of these SIMD functions to the compiler, using the OpenMP #pragma omp declare simd directive.

During loop vectorization, the compiler is aware of these vectorized routines, and can replace a call to a scalar function (for example, a double-precision call to sin) with a call to a libsimdmath function that takes a vector of double-precision arguments and returns a vector of double-precision results.
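
To illustrate the declaration syntax, the following is a minimal sketch of the #pragma omp declare simd directive applied to a user-defined function. It is not taken from the math.h file shipped with the compiler: scale_and_offset and apply_scale_and_offset are hypothetical names chosen for illustration. The sketch shows only how the directive is written. If OpenMP SIMD support is not enabled, the pragma is ignored and the loop calls the scalar function.

```c
/* Sketch only: scale_and_offset and apply_scale_and_offset are
   hypothetical names, not part of libsimdmath or libm. */
#include <stddef.h>

/* The directive states that a SIMD variant of this function may be
   used when a loop that calls it is vectorized. If OpenMP SIMD
   support is not enabled, the pragma has no effect. */
#pragma omp declare simd
double scale_and_offset(double x) {
    return 2.0 * x + 1.0;
}

/* A loop of the same shape as the examples above: one call per element. */
void apply_scale_and_offset(double *out, const double *in, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = scale_and_offset(in[i]);
    }
}
```

The results are the same whether the scalar or a vector variant is called; only the number of elements processed per call changes.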

The libsimdmath library is built using code based on SLEEF, an open source math library available from the SLEEF website.

The documentation for a future release of Arm Compiler for HPC will describe a workflow that lets users declare and link against their own vectorized routines, so that these routines can be used in auto-vectorized code.

Limitations

This is an experimental feature which can lead to performance degradations in some cases. We encourage users to test the applicability of this feature on their non-production code, and will address any possible inefficiency in a future release.

-fsimdmath incompatible with 'lazy binding'

-fsimdmath is incompatible with a dynamic linker optimization known as 'lazy binding'. When using -fsimdmath, Arm recommends that you also add -z now to the compile/link flags, in order to disable this optimization during linking.

The Draft Arm Procedure Call Standard for the ARM 64-bit Architecture (AArch64) with SVE support enforces a contract between caller and callee functions. With lazy binding, the true address of a function within a library is not resolved until the function is first called, which reduces unnecessary work when an application starts. The code in the dynamic linker that performs this resolution is not currently aware of the vector procedure call standard, so it may overwrite registers that are still in use by the caller.

Arm are working on a resolution to this issue, but an effective workaround is to disable lazy binding, as described above.
