Vector math routines in Arm Compiler for HPC

Arm Compiler for HPC supports the vectorization of loops within C and C++ workloads that invoke the math routines from libm.

Any loop that calls functions from <math.h> in C, or from <cmath> in C++, can be vectorized by invoking the compiler with the -fsimdmath option, together with the usual options that activate the auto-vectorizer (optimization level -O2 and above).

Examples

The following examples show loops with math function calls that can be vectorized by invoking the compiler with:

armclang -fsimdmath -c -O2 source.c[pp]

C example with loop invoking sin

    /* C code example: source.c */
    #include <math.h>
    void do_something(double * a, double * b, unsigned N) {
      for (unsigned i = 0; i < N; ++i) {
        /* some computation */
        a[i] = sin(b[i]);
        /* some computation */
      }
    }

C++ example with loop invoking std::pow

    // C++ code example: source.cpp
    #include <cmath>
    void do_something(float * a, float * b, unsigned N) {
      for (unsigned i = 0; i < N; ++i) {
        // some computation
        a[i] = std::pow(a[i], b[i]);
        // some computation
      }
    }

How it works

Arm Compiler for HPC contains libsimdmath, a library with SIMD implementations of the routines provided by libm, along with a math.h file that declares the availability of these SIMD functions to the compiler, using the OpenMP #pragma omp declare simd directive.

During loop vectorization, the compiler is aware of these vectorized routines and can replace a call to a scalar function (for example, a double-precision call to sin) with a call to a libsimdmath function that takes a vector of double-precision arguments and returns a vector of double-precision results.
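To make the role of the directive more concrete, the following minimal sketch applies #pragma omp declare simd to a user-defined function instead of a libm routine. The names scaled_square and apply_scaled_square are hypothetical and are not part of libsimdmath or math.h, and the notinbranch clause is an illustrative choice, because the source does not list the clauses used in the shipped header. The sketch also assumes that OpenMP SIMD support is enabled (for example with a Clang-style -fopenmp-simd option). Without it, the pragma is ignored and the code runs as ordinary scalar C.

```c
/* Sketch only: scaled_square and apply_scaled_square are hypothetical names,
   not part of libsimdmath or math.h. */
#include <stddef.h>

/* Declares that a SIMD variant of this function may be used in vectorized
   loops. If OpenMP SIMD support is not enabled, the pragma is ignored and
   the function remains scalar. */
#pragma omp declare simd notinbranch
double scaled_square(double x) {
  return 0.5 * x * x;
}

/* The loop body contains a single call. A vectorizer that knows about the
   SIMD variant can process several elements of in[] per call. */
void apply_scaled_square(double *out, const double *in, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = scaled_square(in[i]);
  }
}
```

The result is the same whether or not the loop is vectorized. Only the number of elements handled per call changes.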

The libsimdmath library is built using code based on SLEEF, an open source math library available from the SLEEF website.

The documentation for a future release of Arm Compiler for HPC will describe a workflow that lets users declare and link against their own vectorized routines, so that those routines can be used in auto-vectorized code.

Limitations

This is an experimental feature that can degrade performance in some cases. We encourage users to test this feature on their non-production code first. Any inefficiencies will be addressed in a future release.
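One way to test the feature is to time the same kernel in two builds, one compiled with -fsimdmath and one without. The following is a minimal sketch of such a timing helper. The names kernel_fn, time_kernel, and placeholder_kernel are hypothetical and are not provided by the compiler. The placeholder kernel is included only so that the sketch compiles on its own.

```c
/* Sketch only: kernel_fn, time_kernel and placeholder_kernel are
   hypothetical helper names, not part of Arm Compiler for HPC. */
#include <time.h>

/* Matches the signature of do_something in the C example. */
typedef void (*kernel_fn)(double *a, double *b, unsigned n);

/* Trivial stand-in kernel. Replace it with the loop under test. */
void placeholder_kernel(double *a, double *b, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    a[i] = 2.0 * b[i];
  }
}

/* Runs the kernel `repeats` times and returns the average CPU time per run
   in seconds. Returns -1.0 if repeats is not positive or the processor
   clock is unavailable. */
double time_kernel(kernel_fn kernel, double *a, double *b, unsigned n,
                   int repeats) {
  if (repeats <= 0) {
    return -1.0;
  }
  clock_t start = clock();
  if (start == (clock_t)-1) {
    return -1.0;
  }
  for (int r = 0; r < repeats; ++r) {
    kernel(a, b, n);
  }
  clock_t end = clock();
  if (end == (clock_t)-1) {
    return -1.0;
  }
  return (double)(end - start) / CLOCKS_PER_SEC / repeats;
}
```

Passing do_something from the C example as the kernel, and comparing the value returned by each build, gives a rough indication of whether -fsimdmath helps for a particular loop and input size.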
