Overview

Arm Performance Libraries is available in two variants: a license-controlled commercial variant that is available to Arm Allinea Studio users, and a free to use variant. This tutorial describes how to get started with the commercial variant. To learn about how to get started with the free to use variant, see the Get started with ArmPL (free variant) tutorial.

Arm Performance Libraries provide optimized standard core math libraries for high-performance computing applications on Arm processors. The library routines, which are available through both Fortran and C interfaces, include:

  • BLAS - Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS).
  • LAPACK 3.9.0 - a comprehensive package of higher level linear algebra routines.
  • FFT functions - a set of Fast Fourier Transform routines for real and complex data using the FFTW interface.
  • Sparse linear algebra.
  • libamath - a subset of libm, which is a set of optimized mathematical functions.
  • libastring - a subset of libc, which is a set of optimized string functions.

Arm Performance Libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines in order to maximize your performance in multi-processor environments.

Installation

Arm Performance Libraries is installed as part of Arm Compiler for Linux and requires a license. Refer to Installing Arm Compiler for Linux for details on how to perform the installation, and Arm Allinea Studio licensing for details on how to obtain and install your license.

Note: To use Arm Performance Libraries functions in your code, you must include the header file <armpl.h>. This header file is located in /opt/arm/<armpl_dir>/include/, or <install_dir>/<armpl_dir>/include/ if you have installed to a different location than the default. If you use FFTs, you will also need to include the fftw3.h header file. If you include other legacy header files such as blas.h or lapack.h, they will also work.


Environment configuration

This section describes how to load the correct environment module for Arm Performance Libraries.

Before you begin

Procedure

Use the following steps to configure your environment for Arm Performance Libraries:

  1. Check which environment modules are available:

    module avail

    Note: If you do not see the Arm Compiler for Linux (arm* and GCC gnu* modulefiles, configure your MODULEPATH environment variable to include the installation directory:

    export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
  2. Load the appropriate modules, or modules, for your toolchain:

    For Arm Compiler for Linux, load the compiler modulefile:

    module load arm21/21.0

    For GCC load both the GCC and GCC Arm Performance Libraries modulefiles:

    module load gnu10/10.2.0
    module load armpl-AArch64{-SVE}/21.0.0

    For GCC, you must load the specific Arm Performance Libraries module that you require for your target. For an SVE-enabled target, include '-SVE' in the modulefile.

    Tip: Consider adding the module load command to your .profile to run it automatically every time you log in.

  3. Check your environment using the following commands, according to the compiler you are using.

    Note: Ensure that the command contains the appropriate library directories from /opt/arm, which you installed during the installation procedure:

    Compiler Command
    armclang|armclang++|armflang which {armclang|armclang++|armflang}
    gcc|gfortran which {gcc|gfortran}

Compile and test the examples

Arm Performance Libraries include a number of example programs to compile and run. The examples are located in /opt/arm/<armpl_dir>/examples/, or <install_dir>/<armpl_dir>/examples/, if you have installed to a different location than the default.

The examples directory contains the following:

  • A Makefile to build and execute all of the example programs.
  • A number of different C examples, *.c.
  • A number of different Fortran examples, *.f90.
  • Expected output for each example, *.expected.

The Makefile compiles and runs each example, and compares the generated output to the expected output. Any differences are flagged as errors.

To compile the examples and run the tests, use the following command:

make 

The Makefile that uses Arm Compiler for Linux produces output similar to the following sample:

Compiling program armplinfo.f90:
armflang -c armplinfo.f90 -o armplinfo.o -armpl
Linking program armplinfo.exe:
gfortran -armplinfo.o -o armplinfo.exe -armpl -mcpu=thunderx2t99 -lm
Running program armplinfo.exe: 
(export LD_LIBRARY_PATH='/opt/arm/armpl-21.0.0_Generic-AArch64_RHEL-8_aarch64-linux/lib:'; ./armplinfo.exe > armplinfo.res 2>&1) 
ARMPL (ARM Performance Libraries) 

... 

Testing: no example difference files were generated.
Test passed OK

Example: fftw_dft_r2c_1d_c_example.c

The fftw_dft_r2c_1d_c_example.c example does the following:

  • Creates an FFT plan for a one-dimensional, real-to-Hermitian Fourier transform, and a plan for its inverse, Hermitian-to-real transform.
  • Executes the first plan to output the transformed values in y.
  • Destroys the first plan.
  • Prints the components of the transform.
  • Executes the second plan to get the original data, unscaled.
  • Destroys the second plan.
  • Outputs the original and restored values, scaled (they should be identical).

/*
 * fftw_dft_r2c_1d: FFT of a real sequence
 *
 * ARMPL version 21.0 Copyright Arm 2021
 */

#include <armpl.h>
#include <complex.h>
#include <fftw3.h>
#include <math.h>
#include <stdio.h>

int main(void) {
#define NMAX 20
	double xx[NMAX];
	double x[NMAX];
	// The output vector is of size (n/2)+1 as it is Hermitian
	fftw_complex y[NMAX / 2 + 1];

	printf(
	    "ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d\n");
	printf(
	    "----------------------------------------------------------------\n");
	printf("\n");

	/* The sequence of double data */
	int n = 7;
	x[0] = 0.34907;
	x[1] = 0.54890;
	x[2] = 0.74776;
	x[3] = 0.94459;
	x[4] = 1.13850;
	x[5] = 1.32850;
	x[6] = 1.51370;

	// Use dcopy to copy the values into another array (preserve input)
	cblas_dcopy(n, x, 1, xx, 1);

	// Initialise a plan for a real-to-complex 1d transform from x->y
	fftw_plan forward_plan = fftw_plan_dft_r2c_1d(n, x, y, FFTW_ESTIMATE);
	// Initialise a plan for a complex-to-real 1d transform from y->x (inverse)
	fftw_plan inverse_plan = fftw_plan_dft_c2r_1d(n, y, x, FFTW_ESTIMATE);

	// Execute the forward plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(forward_plan);
	fftw_destroy_plan(forward_plan);

	printf("Components of discrete Fourier transform:\n");
	printf("\n");
	int j;
	for (j = 0; j <= n / 2; j++)
		// Scale factor of 1/sqrt(n) to output normalised data
		printf("%4d   (%7.4f%7.4f)\n", j + 1, creal(y[j]) / sqrt(n),
		       cimag(y[j]) / sqrt(n));

	// Execute the reverse plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(inverse_plan);
	fftw_destroy_plan(inverse_plan);

	printf("\n");
	printf("Original sequence as restored by inverse transform:\n");
	printf("\n");
	printf("       Original  Restored\n");
	for (j = 0; j < n; j++)
		// Scale factor of 1/n to output normalised data
		printf("%4d   %7.4f   %7.4f\n", j + 1, xx[j], x[j] / n);
	return 0;
} 

To compile and run the example take a copy of the code from `<install-dir>/examples` and follow the steps below:

  1. To generate an object file, compile the source fftw_dft_r2c_1d_c_example.c:

    Compiler Command
    armclang armclang -c -armpl -mcpu=native fftw_dft_r2c_1d_c_example.c -o fftw_dft_r2c_1d_c_example.o
    gcc gcc -c -I<install_dir>/include fftw_dft_r2c_1d_c_example.c -o fftw_dft_r2c_1d_c_example.o
  2. Link the object code into an executable:

    Compiler Command
    armclang armclang fftw_dft_r2c_1d_c_example.o -o fftw_dft_r2c_1d_c_example.exe -armpl -mcpu=native -lm
    gcc gcc fftw_dft_r2c_1d_c_example.o -L<install_dir>/lib -o fftw_dft_r2c_1d_c_example.exe -larmpl_lp64 -lgfortran -lm

    The linker and compiler options are:

    • -armpl provides a shorthand method to specify the required include, library, and link options to the Arm C/C++/Fortran Compiler. The available arguments it accepts are described in the Library selection section.
    • -mcpu=native allows Arm C/C++/Fortran Compiler to infer from the host system which libraries to use. -mcpu=native also enables the compiler to select the appropriate variant of Arm Performance Libraries for your system, at compile and link time.
    • -L<install_dir>/lib adds the Arm Performance Libraries location to the library search path.
    • -larmpl_lp64 links against Arm Performance Libraries.
    • -lgfortran links against the gcc Fortran runtime libraries. This is required because Arm Performance Libraries includes Fortran code.
    • -lm links against the standard math libraries.
  3. Run the executable on your Arm system:

    ./fftw_dft_r2c_1d_c_example.exe

    The executable produces output as follows:

    ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d
    ----------------------------------------------------------------
    
    Components of discrete Fourier transform:
    
       1   ( 2.4836 0.0000)
       2   (-0.2660 0.5309)
       3   (-0.2577 0.2030)
       4   (-0.2564 0.0581)
    
    Original sequence as restored by inverse transform:
    
           Original  Restored
       1    0.3491    0.3491
       2    0.5489    0.5489
       3    0.7478    0.7478
       4    0.9446    0.9446
       5    1.1385    1.1385
       6    1.3285    1.3285
       7    1.5137    1.5137

 

Optimized math routines - libamath

libamath (in the /opt/arm/<armpl_dir>/lib directory) contains AArch64-optimized versions of the following scalar functions, in both single and double precision: exponential (expexp2), logarithm (loglog2log10), and error functions (erferfc). In addition, optimized single precision sine and cosine functions are included (sinfcosfsincosf). Linking to libamath ahead of libm will ensure use of these optimized functions.

libamath also contains vectorized versions (Neon and SVE) of all of the common math.h functions in libm. To provide an enhanced performance using optimized functions, these functions are used by Arm C/C++ Compiler whenever possible.The compiler automatically links to the libamath library. You do not have to supply any specific compiler options to initiate this behavior.

When using libamath with the GCC compiler, you must explicitly link to the libamath library before linking to libm. For example:

gcc code_with_math_routines.c -lamath -lm
gfortran code_with_math_routines.f -lamath -lm

Optimized string routines - libastring

libastring (in the /opt/arm/<armpl_dir>/lib directory) provides a set of replacement string.h functions which are optimized for AArch64: bcmpmemchrmemcpymemmovememsetstrchrstrchrnulstrcmpstrcpystrlenstrncmpstrnlen. Linking to libastring ahead of libc ensures use of these optimized functions.

As with the libamath library, to provide an enhanced performance by default, Arm C/C++ Compiler automatically links to the libastring library before it links to libc. You do not have to supply any specific compiler options to initiate this behavior.

When using libastring with the GCC compiler, you must explicitly link to the libastring library to benefit from the performance increase. For example:

gcc code_with_string_routines.c -lastring
gfortran code_with_string_routines.f -lastring

Library selection

To instruct your compiler to use the optimum version of Arm Performance Libraries for your target architecture and implementation, you can use the -armpl (Arm C/C++/Fortran Compiler) or -larmpl (GCC) option . These options also enable the optimized versions of the C mathematical functions declared in the math.h library, tuned scalar and vector implementations of Fortran math intrinsics, and auto-vectorization of mathematical functions (disable this using -fno-simdmath).

Supported options and arguments are:

Arm Compiler
GCC
Description
 -armpl  -larmpl

Link to Arm Performance Libraries with
the default settings.

For Arm C/C++/Fortran Compiler, when
-armpl is combined with the -mpcu or
-march options, the compiler chooses
the appropriate (optimal) variant of the
library for your target hardware or
architecture.

 -armpl=lp64

(Default)

 -DINTEGER32 (Compile)
 -larmpl_lp64 (Link)
Use 32-bit integers.

 -armpl=ilp64

(Default if using -i8)

 -DINTEGER64 (Compile)
 -larmpl_ilp64 (Link)
Use 64-bit integers.

 -armpl=sequential

(Default)

 -larmpl_lp64 Use the single-threaded library.

 -armpl=parallel

(Default if using -fopenmp)

 -larmpl_lp64_mp Use the OpenMP multi-threaded
library.

Separate multiple arguments using a comma, for example: -armpl=<arg1>,<arg2>.

Default option and argument behavior

For information about the default behavior of the -armpl option and its arguments in Arm Compiler for Linux, see the Arm C/C++ Compiler or Arm Fortran Compiler reference guide.

Selecting target architecture

Arm Performance Libraries provides two libraries, one that is optimized for Armv8-A AArch64 Neon-based implementations, and one that is optimized for Armv8-A AArch64 SVE-based implementations. If you want to use -armpl, you must also specify a target microarchitecture, using the -mcpu option, or architecture, using the -march option.

For -mcpu, by default, the compiler will auto-detect the CPU microarchitecture from the build compiler. To change this behavior and specify a particular target instead, set the -mcpu option to a specific target. For a full list of supported targets, see the -mcpu description in the Arm C/C++ Compiler reference documentation or the Arm Fortran Compiler reference documentation.

For -march, you can find a full list of supported arguments in the Arm C/C++ Compiler reference documentation or the Arm Fortran Compiler reference documentation.

Linking against static libraries

The Arm Performance Libraries are supplied in both static and shareable versions, libarmpl_lp64.a and libarmpl_lp64.so. By default, the commands given above link to the shareable version of the library, libarmpl_lp64.so, if that version exists in the specified directory.

To force linking with the static library, either:

  • Use the compiler flag -static, for example:
    {armclang|armclang++|armflang} driver.{c|cpp|f90} -L${ARMPL_LIBRARIES} -static -larmpl_lp64
  • Insert the name of the static library in the command line, for example:
    {armclang|armclang++|armflang} driver.{c|cpp|f90} ${ARMPL_LIBRARIES}/libarmpl_lp64.a

Get support

If you encounter a problem when using |pl| to develop your application and have an Arm Allinea Studio license you can request technical support.

To help our support team identify the issue that you are experiencing and respond to your question, include information about the version of Arm Performance Libraries that you are using and the system that you are running on. You can obtain the necessary system and library information by running the libarmpl.so file. You can find the libarmpl.so file in the '/lib/' directory of your installation.

You must load the Arm Performance Libraries environment module for your system before running libarmpl.so. For example:

./<install_location>/lib/libarmpl.so