Getting Started with Arm Performance Libraries

This tutorial shows you how to configure your Linux user environment with environment modules, and then how to compile and run an example program.

Overview

Arm Performance Libraries provide optimized standard core math libraries for high-performance computing applications on Arm processors. The library routines, which are available via both Fortran and C interfaces, include:

  • BLAS - Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS).

  • LAPACK - a comprehensive package of higher level linear algebra routines.

  • FFT - a set of Fast Fourier Transform routines for real and complex data using the FFTW interface.

  • Sparse matrix-vector multiplication.

  • libamath - optimized implementations of a subset of the libm mathematical functions.

Many BLAS, LAPACK, and FFT routines in Arm Performance Libraries are parallelized with OpenMP to maximize performance in multi-processor environments.

Installation

Arm Performance Libraries is installed as part of Arm Compiler for HPC and requires a license. Refer to Installing Arm Compiler for HPC for details on how to perform the installation, and Arm Allinea Studio licensing for details on how to obtain and install your license.

Note: To use Arm Performance Library functions in your code, you must include the header file <armpl.h>. This header file is located in /opt/arm/<armpl_dir>/include/, or <install_dir>/<armpl_dir>/include/ if you have installed to a different location than the default.

Environment configuration

Prerequisites

Procedure

Use the following steps to load the Arm Performance Libraries module:

  1. Use this command to see which environment modules are available:

    module avail

    Note: You might need to configure the MODULEPATH environment variable to include the installation directory:

    export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
  2. Load the appropriate module for the compiler you are using.

    Note:

    • If you are using the Arm Compiler for HPC, it is recommended that you load the compiler module only.

    • If you are using gcc, you must load the specific Arm Performance Libraries module that you require.

    For example, to load the Arm Compiler for HPC module:

    module load Generic-AArch64/RHEL/7/arm-hpc-compiler/19.0

    If you are using gcc instead, load the Arm Performance Libraries module that matches your gcc version:

    module load Generic-AArch64/RHEL/7/gcc-8.2.0/armpl/19.0.0

    Tip: Consider adding the module load command to your .profile to run it automatically every time you log in.

  3. Check your environment using the following commands, according to the compiler you are using.

    Note: Ensure that the path reported by the command is within the appropriate directory under /opt/arm, which you set up during the installation procedure:

    Compiler    Command
    armclang    which armclang
    gcc         which gcc
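The environment steps above can also be scripted. The following is a minimal sketch for a POSIX shell and is not part of the Arm tooling: path_append is a hypothetical helper that adds a directory to a colon-separated variable only if it is not already listed, and the loop reports where armclang and gcc resolve on PATH. It assumes the default /opt/arm/modulefiles/ location used in step 1.

```shell
# Print a colon-separated path list with a directory appended,
# unless the directory is already present in the list.
#   $1: current value of the variable (may be empty)
#   $2: directory to add
# path_append is a hypothetical helper name used for illustration.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

# Assumption: modulefiles live in the default location from step 1.
MODULEPATH="$(path_append "${MODULEPATH:-}" /opt/arm/modulefiles/)"
export MODULEPATH

# Report where each compiler resolves, or that it is not on PATH.
for compiler in armclang gcc; do
    if location="$(command -v "${compiler}")"; then
        printf '%s -> %s\n' "${compiler}" "${location}"
    else
        printf '%s not found on PATH\n' "${compiler}"
    fi
done
```

Because path_append skips directories that are already listed, the snippet can be sourced repeatedly (for example from .profile) without growing MODULEPATH.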

Compiling an example using Arm Performance Libraries

Arm Performance Libraries include a number of example programs to compile and run. The examples are located in /opt/arm/<armpl_dir>/examples/, or <install_dir>/<armpl_dir>/examples/, if you have installed to a different location than the default.

Example

The fftw_dft_r2c_1d_c_example.c example does the following:

  • Creates an FFTW plan for a one-dimensional, real-to-Hermitian Fourier transform, and a plan for its inverse, Hermitian-to-real transform.

  • Executes the first plan to output the transformed values in y.

  • Destroys the first plan.

  • Prints the components of the transform.

  • Executes the second plan to get the original data, unscaled.

  • Destroys the second plan.

  • Outputs the original and restored values, scaled (they should be identical).

/*
 * fftw_dft_r2c_1d: FFT of a real sequence
 *
 * ARMPL version 19.0 Copyright ARM 2018
 */

#include <armpl.h>
#include <complex.h>
#include <fftw3.h>
#include <math.h>
#include <stdio.h>

int main(void) {
#define NMAX 20
	double xx[NMAX];
	double x[NMAX];
	// The output vector is of size (n/2)+1 as it is Hermitian
	fftw_complex y[NMAX / 2 + 1];

	printf(
	    "ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d\n");
	printf(
	    "----------------------------------------------------------------\n");
	printf("\n");

	/* The sequence of double data */
	int n = 7;
	x[0] = 0.34907;
	x[1] = 0.54890;
	x[2] = 0.74776;
	x[3] = 0.94459;
	x[4] = 1.13850;
	x[5] = 1.32850;
	x[6] = 1.51370;

	// Use dcopy to copy the values into another array (preserve input)
	cblas_dcopy(n, x, 1, xx, 1);

	// Initialise a plan for a real-to-complex 1d transform from x->y
	fftw_plan forward_plan = fftw_plan_dft_r2c_1d(n, x, y, FFTW_ESTIMATE);
	// Initialise a plan for a complex-to-real 1d transform from y->x (inverse)
	fftw_plan inverse_plan = fftw_plan_dft_c2r_1d(n, y, x, FFTW_ESTIMATE);

	// Execute the forward plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(forward_plan);
	fftw_destroy_plan(forward_plan);

	printf("Components of discrete Fourier transform:\n");
	printf("\n");
	int j;
	for (j = 0; j <= n / 2; j++)
		// Scale factor of 1/sqrt(n) to output normalised data
		printf("%4d   (%7.4f%7.4f)\n", j + 1, creal(y[j]) / sqrt(n),
		       cimag(y[j]) / sqrt(n));

	// Execute the reverse plan and then deallocate the plan
	/* NOTE: FFTW does NOT compute a normalised transform -
	 * returned array will contain unscaled values */
	fftw_execute(inverse_plan);
	fftw_destroy_plan(inverse_plan);

	printf("\n");
	printf("Original sequence as restored by inverse transform:\n");
	printf("\n");
	printf("       Original  Restored\n");
	for (j = 0; j < n; j++)
		// Scale factor of 1/n to output normalised data
		printf("%4d   %7.4f   %7.4f\n", j + 1, xx[j], x[j] / n);
	return 0;
} 

Compile the example as follows:

  1. Compile the source fftw_dft_r2c_1d_c_example.c to generate an object file:

    Compiler    Command
    armclang    armclang -c -armpl -mcpu=native fftw_dft_r2c_1d_c_example.c -o fftw_dft_r2c_1d_c_example.o
    gcc         gcc -c -I<install_dir>/include fftw_dft_r2c_1d_c_example.c -o fftw_dft_r2c_1d_c_example.o
  2. Link the object code into an executable:

    Compiler    Command
    armclang    armclang fftw_dft_r2c_1d_c_example.o -o fftw_dft_r2c_1d_c_example.exe -armpl -mcpu=native -lm
    gcc         gcc fftw_dft_r2c_1d_c_example.o -L<install_dir>/lib -o fftw_dft_r2c_1d_c_example.exe -larmpl_lp64 -lgfortran -lm

    The linker and compiler options are:

    • -armpl is recommended when compiling and linking with the Arm Compiler.

    • -mcpu=native is recommended when using the Arm Compiler, to allow the compiler to infer from the host system which libraries to use.

    • -L<install_dir>/lib adds the Arm Performance Libraries location to the library search path.

    • -larmpl_lp64 links against the Arm Performance Libraries.

    • -lgfortran links against the gcc Fortran runtime libraries. This is required because the Arm Performance Libraries include Fortran code.

    • -lm links against the standard math libraries.

  3. Run the executable on your Arm system:

    ./fftw_dft_r2c_1d_c_example.exe

    The executable produces output as follows:

    ARMPL example: FFT of a real sequence using fftw_plan_dft_r2c_1d
    ----------------------------------------------------------------
    
    Components of discrete Fourier transform:
    
       1   ( 2.4836 0.0000)
       2   (-0.2660 0.5309)
       3   (-0.2577 0.2030)
       4   (-0.2564 0.0581)
    
    Original sequence as restored by inverse transform:
    
           Original  Restored
       1    0.3491    0.3491
       2    0.5489    0.5489
       3    0.7478    0.7478
       4    0.9446    0.9446
       5    1.1385    1.1385
       6    1.3285    1.3285
       7    1.5137    1.5137
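As a sketch, the compile and link steps above can be collected in a small shell function. print_build_commands is a hypothetical helper that only prints the commands from the tables in steps 1 and 2 so that you can review them before running anything. It assumes that the ARMPL_DIR environment variable holds the directory that the tables call <install_dir>; check this against your own installation.

```shell
# Print the compile and link commands for one C example without running them.
#   $1: compiler, "armclang" or "gcc"
#   $2: example name without the .c extension
# Assumption: ARMPL_DIR points at the Arm Performance Libraries installation.
# print_build_commands is a hypothetical helper name used for illustration.
print_build_commands() {
    case "$1" in
        armclang)
            echo "armclang -c -armpl -mcpu=native $2.c -o $2.o"
            echo "armclang $2.o -o $2.exe -armpl -mcpu=native -lm"
            ;;
        gcc)
            echo "gcc -c -I${ARMPL_DIR}/include $2.c -o $2.o"
            echo "gcc $2.o -L${ARMPL_DIR}/lib -o $2.exe -larmpl_lp64 -lgfortran -lm"
            ;;
        *)
            echo "unsupported compiler: $1" >&2
            return 1
            ;;
    esac
}
```

For example, print_build_commands gcc fftw_dft_r2c_1d_c_example prints the two gcc commands with ARMPL_DIR expanded, which makes a wrong include or library path easy to spot.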

Optimized math routines - libamath

The libamath library, located in the /opt/arm/<armpl_dir>/lib directory, contains optimized versions of some libm functions. It is currently provided for the Arm Compiler only (not gcc), and is linked in when you use the -armpl flag.

If your code calls exp(), pow(), log(), expf(), powf(), logf(), sinf(), cosf(), tanf(), or sincosf() and is linked with libamath, the optimized Arm implementations are used:

armclang code_with_math_routines.c -armpl -mcpu=native
armflang code_with_math_routines.f -armpl -mcpu=native

Advanced options

If you are using advanced features of the Arm Performance Libraries, you can use the following additional options:

OpenMP

To link against the multi-threaded (OpenMP) Arm Performance Libraries, use the following options:

Compiler    Option
armclang    -armpl=parallel
gcc         -larmpl_lp64_mp

8-byte integer variables

To use 8-byte integer variables in C or Fortran, compile and link your code with the following options:

Compiler    Option
armclang    -armpl=ilp64     (Compile and link)
gcc         -DINTEGER64      (Compile)
            -larmpl_ilp64    (Link)

OpenMP and 64-bit integers

To use both OpenMP and 64-bit integers, use the following options:

Compiler    Option
armclang    -armpl=ilp64,parallel    (Compile and link)
gcc         -DINTEGER64              (Compile)
            -larmpl_ilp64_mp         (Link)
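The gcc link options in this section follow a regular naming pattern: lp64 or ilp64 selects the integer size, and an _mp suffix selects the OpenMP build. As an illustration only, the hypothetical helper below (armpl_gcc_link_flag is not part of Arm Performance Libraries) maps the two choices to the library names listed in this tutorial.

```shell
# Print the gcc link option for a given integer size and threading model.
#   $1: integer interface, "lp64" (default integers) or "ilp64" (8-byte integers)
#   $2: threading model, "serial" or "openmp"
# armpl_gcc_link_flag is a hypothetical helper name used for illustration.
armpl_gcc_link_flag() {
    case "$1" in
        lp64|ilp64) ;;
        *) echo "unknown integer interface: $1" >&2; return 1 ;;
    esac
    case "$2" in
        serial) echo "-larmpl_${1}" ;;
        openmp) echo "-larmpl_${1}_mp" ;;
        *) echo "unknown threading model: $2" >&2; return 1 ;;
    esac
}
```

For example, armpl_gcc_link_flag ilp64 openmp prints -larmpl_ilp64_mp. When you select ilp64 with gcc, remember that the compile step also needs -DINTEGER64, as shown in the tables above.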

Selecting target architecture

When using the Arm Compiler, use the -mcpu flag to select which library variant to link against. There are four options:

  • -mcpu=native allows the compiler to infer from the host system which libraries to use.

  • -mcpu=thunderx2t99 selects libraries tuned for Cavium ThunderX2 cores.

  • -mcpu=cortex-a72 selects libraries tuned for Cortex-A72 cores.

  • -mcpu=generic selects libraries that work on any AArch64 core.

Linking against static libraries

Use -static to link against the static rather than shared libraries.

Compiling and testing the full suite of examples

The examples directory contains the following:

  • A GNUmakefile to build and execute all of the example programs.

  • A number of different C examples, *.c.

  • A number of different Fortran examples, *.f90.

  • Expected output for each example, *.expected.

The makefile compiles and runs each example, and compares the generated output to the expected output. Any differences are flagged as errors.

To compile the examples and run the tests, use the following command:

make 

When run with Arm Compiler for HPC, the makefile produces output similar to the following sample:

Compiling program armplinfo.f90:
armflang -c armplinfo.f90 -o armplinfo.o -armpl
Linking program armplinfo.exe:
armflang armplinfo.o -o armplinfo.exe -armpl -mcpu=thunderx2t99 -lm
Running program armplinfo.exe: 
(export LD_LIBRARY_PATH='/opt/arm/armpl-19.0.0_Generic-AArch64_SUSE-12_aarch64-linux/lib:'; ./armplinfo.exe > armplinfo.res 2>&1) 
ARMPL (ARM Performance Libraries) 

... 

Testing: no example difference files were generated.
Test passed OK
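The comparison step that the makefile performs can be pictured with a few lines of shell. This is a sketch of the idea rather than the makefile's actual rules, which are not reproduced in this tutorial: check_example is a hypothetical helper, and the .res and .expected suffixes are taken from the sample output and the directory listing above.

```shell
# Compare the captured output of an example with its expected output.
#   $1: example name without extension; reads $1.res and $1.expected
# A $1.diff file is kept only when the two files differ.
# check_example is a hypothetical helper name used for illustration.
check_example() {
    if diff "$1.res" "$1.expected" > "$1.diff"; then
        rm -f "$1.diff"
        echo "$1: passed"
    else
        echo "$1: FAILED (see $1.diff)"
        return 1
    fi
}
```

After running ./armplinfo.exe > armplinfo.res 2>&1, calling check_example armplinfo would report whether the output matches armplinfo.expected.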

In-tool documentation

The Arm Performance Libraries Reference Manual is available on the Arm Developer website. The /Doc subdirectory within the Arm Performance Libraries installation directory (${ARMPL_DIR}/Doc) contains the reference manual in the following formats:

  • PDF format: arm_performance_libraries_reference_manual_<document_version>.pdf. 

  • Info pages: armpl.info

  • Plain text: armpl.txt

You can read the armpl.info file using one of the following methods:

  1. Update the INFOPATH environment variable to include the ${ARMPL_DIR}/Doc subdirectory, either by loading the Arm Performance Libraries environment module (which sets INFOPATH automatically) or by setting the variable manually (csh or tcsh syntax shown):

    setenv INFOPATH ${INFOPATH}:${ARMPL_DIR}/Doc

    Then, open the file using info:

    info armpl.info
  2. Use info and specify the full path to the file:

    info ${ARMPL_DIR}/Doc/armpl.info
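The setenv command in method 1 uses csh syntax. If you use a Bourne-style shell such as sh or bash, a minimal equivalent might look like the following sketch. add_armpl_info_dir is a hypothetical helper name, and the sketch assumes that ARMPL_DIR is already set, for example by the environment module.

```shell
# Append the Arm Performance Libraries Doc directory to INFOPATH.
# Assumption: ARMPL_DIR is set (for example by the environment module).
# add_armpl_info_dir is a hypothetical helper name used for illustration.
add_armpl_info_dir() {
    if [ -z "${ARMPL_DIR:-}" ]; then
        echo "ARMPL_DIR is not set; load the environment module first" >&2
        return 1
    fi
    # Avoid a leading colon when INFOPATH is empty or unset.
    INFOPATH="${INFOPATH:+${INFOPATH}:}${ARMPL_DIR}/Doc"
    export INFOPATH
}
```

After calling add_armpl_info_dir, the info armpl.info command from method 1 should be able to locate the file.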

The /Doc subdirectory also contains USAGE.txt, a plain text file with instructions and examples for using Arm Performance Libraries with gfortran and GCC.
