Overview
Arm Performance Libraries provide optimized standard core math libraries for highperformance computing applications on Arm processors. The library routines, which are available through both Fortran and C interfaces, include:
 BLAS  Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS).
 LAPACK 3.9.0  a comprehensive package of higher level linear algebra routines.
 FFT functions  a set of Fast Fourier Transform routines for real and complex data using the FFTW interface.
 Sparse linear algebra.
 libamath  a subset of libm, which is a set of optimized mathematical functions.
 libastring  a subset of libc, which is a set of optimized string functions.
Arm Performance Libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines in order to maximize your performance in multiprocessor environments.
Installation
Refer to the Arm Performance Libraries (free version) Downloads page for details on how to perform the installation.
Note: To use Arm Performance Libraries functions in your code, you must include the header file <armpl.h>
. This header file is located in /opt/arm/<armpl_dir>/include/
, or <install_dir>/<armpl_dir>/include/
if you have installed to a different location than the default. If you use FFTs, you will also need to include the fftw3.h
header file. If you include other legacy header files such as blas.h
or lapack.h
, they will also work.
Environment configuration
This section describes how to load the correct environment module for Arm Performance Libraries.
Procedure
Use the following steps to load the Arm Performance Libraries module:

Use this command to see which environment modules are available:
module avail
Note: You might need to configure the
MODULEPATH
environment variable to include the installation directory:export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/

Load the appropriate module for the OS and version of GCC that you are using.
For example:
module load armpl/21.0.0_gcc10.2
Tip: Consider adding the
module load
command to your.profile
to run it automatically every time you log in.
Compile and test the examples
Arm Performance Libraries include a number of example programs to compile and run. The examples are located in /opt/arm/<armpl_dir>/examples/
, or <install_dir>/<armpl_dir>/examples/
, if you have installed to a different location than the default.
The examples
directory contains the following:
 A
Makefile
to build and execute all of the example programs.  A number of different C examples,
*.c
.  A number of different Fortran examples,
*.f90
.  Expected output for each example,
*.expected
.
The Makefile compiles and runs each example, and compares the generated output to the expected output. Any differences are flagged as errors.
To compile the examples and run the tests, use the following command:
make
The Makefile produces output similar to the following sample:
Compiling program armplinfo.f90: gcc c armplinfo.f90 o armplinfo.o Linking program armplinfo.exe: gfortran armplinfo.o o armplinfo.exe mcpu=native lm Running program armplinfo.exe: ... Testing: no example difference files were generated. Test passed OK
Example: fftw_dft_r2c_1d_c_example.c
The fftw_dft_r2c_1d_c_example.c
example does the following:
 Creates an FFT plan for a onedimensional, realtoHermitian Fourier transform, and a plan for its inverse, Hermitiantoreal transform.
 Executes the first plan to output the transformed values in
y
.  Destroys the first plan.
 Prints the components of the transform.
 Executes the second plan to get the original data, unscaled.
 Destroys the second plan.
 Outputs the original and restored values, scaled (they should be identical).
/* * fftw_dft_r2c_1d: FFT of a real sequence * * ARMPL version 21.0 Copyright Arm 2021 */ #include <armpl.h> #include <complex.h> #include <fftw3.h> #include <math.h> #include <stdio.h> int main(void) { #define NMAX 20 double xx[NMAX]; double x[NMAX]; // The output vector is of size (n/2)+1 as it is Hermitian fftw_complex y[NMAX / 2 + 1]; printf( "ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d\n"); printf( "\n"); printf("\n"); /* The sequence of double data */ int n = 7; x[0] = 0.34907; x[1] = 0.54890; x[2] = 0.74776; x[3] = 0.94459; x[4] = 1.13850; x[5] = 1.32850; x[6] = 1.51370; // Use dcopy to copy the values into another array (preserve input) cblas_dcopy(n, x, 1, xx, 1); // Initialise a plan for a realtocomplex 1d transform from x>y fftw_plan forward_plan = fftw_plan_dft_r2c_1d(n, x, y, FFTW_ESTIMATE); // Initialise a plan for a complextoreal 1d transform from y>x (inverse) fftw_plan inverse_plan = fftw_plan_dft_c2r_1d(n, y, x, FFTW_ESTIMATE); // Execute the forward plan and then deallocate the plan /* NOTE: FFTW does NOT compute a normalised transform  * returned array will contain unscaled values */ fftw_execute(forward_plan); fftw_destroy_plan(forward_plan); printf("Components of discrete Fourier transform:\n"); printf("\n"); int j; for (j = 0; j <= n / 2; j++) // Scale factor of 1/sqrt(n) to output normalised data printf("%4d (%7.4f%7.4f)\n", j + 1, creal(y[j]) / sqrt(n), cimag(y[j]) / sqrt(n)); // Execute the reverse plan and then deallocate the plan /* NOTE: FFTW does NOT compute a normalised transform  * returned array will contain unscaled values */ fftw_execute(inverse_plan); fftw_destroy_plan(inverse_plan); printf("\n"); printf("Original sequence as restored by inverse transform:\n"); printf("\n"); printf(" Original Restored\n"); for (j = 0; j < n; j++) // Scale factor of 1/n to output normalised data printf("%4d %7.4f %7.4f\n", j + 1, xx[j], x[j] / n); return 0; }
To compile and run the example take a copy of the code from `<installdir>/examples` and follow the steps below:

To generate an object file, compile the source
fftw_dft_r2c_1d_c_example.c
:Compiler Command gcc gcc c I<install_dir>/include fftw_dft_r2c_1d_c_example.c o fftw_dft_r2c_1d_c_example.o

Link the object code into an executable:
Compiler Command gcc gcc fftw_dft_r2c_1d_c_example.o L<install_dir>/lib o fftw_dft_r2c_1d_c_example.exe larmpl_lp64 lgfortran lm
The linker and compiler options are:
L<install_dir>/lib
adds the Arm Performance Libraries location to the library search path.larmpl_lp64
links against Arm Performance Libraries.lgfortran
links against the gcc Fortran runtime libraries. This is required because Arm Performance Libraries includes Fortran code.lm
links against the standard math libraries.

Run the executable on your Arm system:
./fftw_dft_r2c_1d_c_example.exe
The executable produces output as follows:
ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d  Components of discrete Fourier transform: 1 ( 2.4836 0.0000) 2 (0.2660 0.5309) 3 (0.2577 0.2030) 4 (0.2564 0.0581) Original sequence as restored by inverse transform: Original Restored 1 0.3491 0.3491 2 0.5489 0.5489 3 0.7478 0.7478 4 0.9446 0.9446 5 1.1385 1.1385 6 1.3285 1.3285 7 1.5137 1.5137
Optimized math routines  libamath
libamath contains AArch64optimized versions of the following scalar functions, in both single and double precision: exponential (exp
, exp2
), logarithm (log
, log2
, log10
), and error functions (erf
, erfc
). In addition, optimized single precision sine and cosine functions are included (sinf
, cosf
, sincosf
).
libamath also contains vectorized versions (Neon and SVE) of all of the common math.h
functions in libm.
You must explicitly link to the libamath library before linking to libm. For example:
gcc code_with_math_routines.c lamath lm
gfortran code_with_math_routines.f lamath lm
Optimized string routines  libastring
libastring provides a set of replacement string.h
functions which are optimized for AArch64: bcmp
, memchr
, memcpy
, memmove
, memset
, strchr
, strchrnul
, strcmpstrcpy
, strlen
, strncmp
, strnlen
.
You must explicitly link to the libastring library to benefit from the performance increase. For example:
gcc code_with_string_routines.c lastring
gfortran code_with_string_routines.f lastring
Library selection
To instruct your compiler to load the optimum version of Arm Performance Libraries for your target architecture and implementation, you can use larmpl
option.
Supported options and arguments are:
GCC flag

Description 
DINTEGER32 (Compile)larmpl_lp64 (Link) 
Use 32bit integers. 
DINTEGER64 (Compile)larmpl_ilp64 (Link) 
Use 64bit integers. 
larmpl_lp64 
Use the singlethreaded library. 
larmpl_lp64_mp 
Use the OpenMP multithreaded library. 
Linking against static libraries
The Arm Performance Libraries are supplied in both static and shareable versions, libarmpl_lp64.a
and libarmpl_lp64.so
. By default, the commands given above link to the shareable version of the library, libarmpl_lp64.so
, if that version exists in the specified directory.
To force linking to the static library, add the static
option.
Documentation
The Arm Performance Libraries Reference Guide is available on the Arm Developer website.
Related information
 Arm Performance Libraries Reference Guide
 For further information about to the standard BLAS Fortran interfaces, refer to the BLAS FAQ
 For further information about the LAPACK and BLAS routines, refer to the LAPACK documentation