Overview

Arm Performance Libraries provide optimized standard core math libraries for high-performance computing applications on Arm processors. The library routines, which are available through both Fortran and C interfaces, include:

  • BLAS - Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS).
  • LAPACK 3.9.0 - a comprehensive package of higher level linear algebra routines.
  • FFT functions - a set of Fast Fourier Transform routines for real and complex data using the FFTW interface.
  • Sparse linear algebra.
  • libamath - a subset of libm, which is a set of optimized mathematical functions.

Arm Performance Libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines to maximize performance in multi-processor environments.

Installation

Refer to the Arm Performance Libraries (free version) Downloads page for details on how to perform the installation.

Note: To use Arm Performance Libraries functions in your code, you must include the header file <armpl.h>. This header file is located in /opt/arm/<armpl_dir>/include/, or in <install_dir>/<armpl_dir>/include/ if you installed to a location other than the default. If you use FFTs, you must also include the fftw3.h header file. Legacy header files such as blas.h or lapack.h also work if you include them.


Environment configuration

This section describes how to load the correct environment module for Arm Performance Libraries.

Procedure

Use the following steps to load the Arm Performance Libraries module:

  1. Use this command to see which environment modules are available:

    module avail

    Note: You might need to configure the MODULEPATH environment variable to include the installation directory:

    export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/

  2. Load the appropriate module for the OS and version of GCC that you are using.

    For example:

    module load armpl/20.3.0_gcc-9.3 

    Tip: Consider adding the module load command to your .profile to run it automatically every time you log in.


Compile and test the examples

Arm Performance Libraries include a number of example programs to compile and run. The examples are located in /opt/arm/<armpl_dir>/examples/, or <install_dir>/<armpl_dir>/examples/, if you have installed to a different location than the default.

The examples directory contains the following:

  • A Makefile to build and execute all of the example programs.
  • A number of different C examples, *.c.
  • A number of different Fortran examples, *.f90.
  • Expected output for each example, *.expected.

The Makefile compiles and runs each example, and compares the generated output to the expected output. Any differences are flagged as errors.

To compile the examples and run the tests, use the following command:

make 

The Makefile produces output similar to the following sample:

Compiling program armplinfo.f90:
gfortran -c armplinfo.f90 -o armplinfo.o
Linking program armplinfo.exe:
gfortran armplinfo.o -o armplinfo.exe -mcpu=native -lm
Running program armplinfo.exe: 

... 

Testing: no example difference files were generated.
Test passed OK

Example: fftw_dft_r2c_1d_c_example.c

The fftw_dft_r2c_1d_c_example.c example does the following:

  • Creates an FFT plan for a one-dimensional, real-to-Hermitian Fourier transform, and a plan for its inverse, Hermitian-to-real transform.
  • Executes the first plan to output the transformed values in y.
  • Destroys the first plan.
  • Prints the components of the transform.
  • Executes the second plan to get the original data, unscaled.
  • Destroys the second plan.
  • Outputs the original and restored values, scaled (they should be identical).

/*
 * fftw_dft_r2c_1d: FFT of a real sequence
 *
 * ARMPL version 20.3 Copyright Arm 2020
 */

#include <armpl.h>
#include <complex.h>
#include <fftw3.h>
#include <math.h>
#include <stdio.h>

int main(void) {
#define NMAX 20
	double xx[NMAX];
	double x[NMAX];
	// The output vector is of size (n/2)+1 as it is Hermitian
	fftw_complex y[NMAX / 2 + 1];

	printf(
	    "ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d\n");
	printf(
	    "----------------------------------------------------------------\n");
	printf("\n");

	/* The sequence of double data */
	int n = 7;
	x[0] = 0.34907;
	x[1] = 0.54890;
	x[2] = 0.74776;
	x[3] = 0.94459;
	x[4] = 1.13850;
	x[5] = 1.32850;
	x[6] = 1.51370;

	// Use dcopy to copy the values into another array (preserve input)
	cblas_dcopy(n, x, 1, xx, 1);

	// Initialise a plan for a real-to-complex 1d transform from x->y
	fftw_plan forward_plan = fftw_plan_dft_r2c_1d(n, x, y, FFTW_ESTIMATE);
	// Initialise a plan for a complex-to-real 1d transform from y->x (inverse)
	fftw_plan inverse_plan = fftw_plan_dft_c2r_1d(n, y, x, FFTW_ESTIMATE);

	// Execute the forward plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(forward_plan);
	fftw_destroy_plan(forward_plan);

	printf("Components of discrete Fourier transform:\n");
	printf("\n");
	int j;
	for (j = 0; j <= n / 2; j++)
		// Scale factor of 1/sqrt(n) to output normalised data
		printf("%4d   (%7.4f%7.4f)\n", j + 1, creal(y[j]) / sqrt(n),
		       cimag(y[j]) / sqrt(n));

	// Execute the reverse plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(inverse_plan);
	fftw_destroy_plan(inverse_plan);

	printf("\n");
	printf("Original sequence as restored by inverse transform:\n");
	printf("\n");
	printf("       Original  Restored\n");
	for (j = 0; j < n; j++)
		// Scale factor of 1/n to output normalised data
		printf("%4d   %7.4f   %7.4f\n", j + 1, xx[j], x[j] / n);
	return 0;
} 

To compile and run the example, take a copy of the code from <install_dir>/examples and follow the steps below:

  1. To generate an object file, compile the source fftw_dft_r2c_1d_c_example.c:

    Compiler   Command
    gcc        gcc -c -I<install_dir>/include fftw_dft_r2c_1d_c_example.c -o fftw_dft_r2c_1d_c_example.o

  2. Link the object code into an executable:

    Compiler   Command
    gcc        gcc fftw_dft_r2c_1d_c_example.o -L<install_dir>/lib -o fftw_dft_r2c_1d_c_example.exe -larmpl_lp64 -lgfortran -lm

    The linker and compiler options are:

    • -L<install_dir>/lib adds the Arm Performance Libraries location to the library search path.
    • -larmpl_lp64 links against Arm Performance Libraries.
    • -lgfortran links against the gcc Fortran runtime libraries. This is required because Arm Performance Libraries includes Fortran code.
    • -lm links against the standard math libraries.

  3. Run the executable on your Arm system:

    ./fftw_dft_r2c_1d_c_example.exe

    The executable produces output as follows:

    ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d
    ----------------------------------------------------------------
    
    Components of discrete Fourier transform:
    
       1   ( 2.4836 0.0000)
       2   (-0.2660 0.5309)
       3   (-0.2577 0.2030)
       4   (-0.2564 0.0581)
    
    Original sequence as restored by inverse transform:
    
           Original  Restored
       1    0.3491    0.3491
       2    0.5489    0.5489
       3    0.7478    0.7478
       4    0.9446    0.9446
       5    1.1385    1.1385
       6    1.3285    1.3285
       7    1.5137    1.5137

Library selection

To instruct your compiler to load the optimum version of Arm Performance Libraries for your target architecture and implementation, you can use the -larmpl option.

Supported options and arguments are:

GCC flag                   Description
-DINTEGER32 (Compile)      Use 32-bit integers.
-larmpl_lp64 (Link)
-DINTEGER64 (Compile)      Use 64-bit integers.
-larmpl_ilp64 (Link)
-larmpl_lp64               Use the single-threaded library.
-larmpl_lp64_mp            Use the OpenMP multi-threaded library.

Linking against static libraries

Arm Performance Libraries are supplied in both static and shared versions, libarmpl_lp64.a and libarmpl_lp64.so. By default, the commands given above link to the shared version of the library, libarmpl_lp64.so, if that version exists in the specified directory.

To force linking to the static library, add the -static option.