Getting Started with Arm Performance Libraries

This tutorial shows you how to use environment modules to configure your user environment on Linux, and then how to compile and run an example program.

Overview

Arm Performance Libraries provide optimized standard core math libraries for high-performance computing applications on Arm processors. The library routines, which are available via both Fortran and C interfaces, include:

  • BLAS - Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS).
  • LAPACK - a comprehensive package of higher level linear algebra routines.
  • FFT - a set of Fast Fourier Transform routines for real and complex data.

Many of the BLAS, LAPACK and FFT routines in Arm Performance Libraries are parallelized with OpenMP to maximize performance in multi-processor environments.

Installation

Arm Performance Libraries are installed as part of Arm Compiler for HPC and require a license. Refer to Installing Arm Compiler for HPC for details on how to perform the installation, and to Arm Allinea Studio licensing for details on how to obtain and install your license.

Note: To use Arm Performance Library functions in your code, you must include the header file <armpl.h>. This header file is located in /opt/arm/<armpl_dir>/include/, or <install_dir>/<armpl_dir>/include/ if you have installed to a different location than the default.

Environment configuration

Your administrator should have already installed Arm Performance Libraries as described in Installing Arm Compiler for HPC, and made the environment module available, as described in Environment configuration. Use the following steps to load the Arm Performance Libraries module:

  1. To see which environment modules are available:

    module avail

    Note: You might need to add the modulefiles directory of the installation to the MODULEPATH environment variable:

    export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
  2. Load the appropriate Arm Performance Library module for the processors in your system, and for the compiler you are using:

    module load <architecture>/<linux_variant>/<linux_version>/<compiler_version>/armpl/<armpl_version>

    For example:

    module load Generic-AArch64/SUSE/12/arm-hpc-compiler-18.4/armpl/18.4

    If you use the gcc compiler instead, load the module built for your gcc version, for example:

    module load Generic-AArch64/SUSE/12/gcc-7.1.0/armpl/18.4

    Note: Consider adding the module load command to your .profile so that it runs automatically every time you log in.

  3. You can check your environment by examining the LD_LIBRARY_PATH variable. It should contain the appropriate library directories from /opt/arm, as installed in the previous section:

    echo $LD_LIBRARY_PATH

    The output should resemble:

    /opt/arm/armpl-.../lib:/opt/arm/gcc-.../lib64:/opt/arm/gcc-.../lib
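If you want to make the same check from a C build or test harness, the following minimal sketch searches each colon-separated entry of LD_LIBRARY_PATH for a substring. It assumes, as in the sample output, that the library directory name contains armpl. The function names are hypothetical helpers, not part of Arm Performance Libraries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any colon-separated entry of
 * path_list contains needle, 0 otherwise (including when path_list
 * is NULL, for example because the variable is unset). */
int path_list_has_entry_containing(const char *path_list, const char *needle)
{
    if (path_list == NULL || needle == NULL)
        return 0;

    size_t needle_len = strlen(needle);
    const char *start = path_list;

    for (;;) {
        /* The current entry runs from start up to the next ':' or the end. */
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        /* Look for needle entirely inside this entry, never across a ':'. */
        for (size_t i = 0; i + needle_len <= len; i++) {
            if (strncmp(start + i, needle, needle_len) == 0)
                return 1;
        }

        if (end == NULL)
            return 0;
        start = end + 1;
    }
}

/* Hypothetical convenience wrapper: assumes the Arm Performance
 * Libraries directory name contains "armpl". */
int armpl_on_ld_library_path(void)
{
    return path_list_has_entry_containing(getenv("LD_LIBRARY_PATH"), "armpl");
}
```

For example, after the module is loaded, armpl_on_ld_library_path() should return 1, while path_list_has_entry_containing("/usr/lib:/opt/arm/gcc/lib64", "armpl") returns 0. Matching is done per entry so that a substring split across two entries is not reported as a hit.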

Compiling an example using Arm Performance Libraries

Arm Performance Libraries include a number of example programs to compile and run. The examples are located in /opt/arm/<armpl_dir>/examples/, or <install_dir>/<armpl_dir>/examples/ if you have installed to a different location than the default.

Example

Let's take a look at the dzfft_c_example.c example. This example does the following:

  • Creates an array x[] of double-precision floating-point data.
  • Calls the Arm Performance Library function dzfft to perform a "real-to-Hermitian" transform on the array, and outputs the resulting array.
  • Conjugates the complex output and then calls the Arm Performance Library function zdfft to perform the inverse "Hermitian-to-real" transform on the array.
  • Outputs the original and restored values in the array (they should be identical).

/* dzfft Example Program Text */
/*
 * ARMPL version 18.4 Copyright ARM,NAG 2018
 */

#include <armpl.h>
#include <stdio.h>
#include <stdlib.h>

#define NMAX 20

int main(void)
{
    int i, info, j, n;
    double x[NMAX], xx[NMAX];
    double *comm;

    printf("ARMPL example: FFT of a real sequence using dzfft\n");
    printf("-------------------------------------------------\n");
    printf("\n");

    /* The sequence of double-precision floating-point data */
    n = 7;
    x[0] = 0.34907;
    x[1] = 0.54890;
    x[2] = 0.74776;
    x[3] = 0.94459;
    x[4] = 1.13850;
    x[5] = 1.32850;
    x[6] = 1.51370;

    /* Keep a copy of the original sequence for the final comparison */
    for (i = 0; i < n; i++)
        xx[i] = x[i];

    /* Allocate communication work array */
    comm = (double *)malloc((3*n+100)*sizeof(double));
    if (comm == NULL) {
        fprintf(stderr, "Failed to allocate the communication work array\n");
        return 1;
    }

    /* Initialize communication work array */
    dzfft(0,n,x,comm,&info);
    if (info != 0) {
        fprintf(stderr, "dzfft initialization failed, info = %d\n", info);
        free(comm);
        return 1;
    }

    /* Compute a real --> Hermitian transform */
    dzfft(1,n,x,comm,&info);
    if (info != 0) {
        fprintf(stderr, "dzfft transform failed, info = %d\n", info);
        free(comm);
        return 1;
    }

    printf("Components of discrete Fourier transform:\n");
    printf("\n");
    for (j = 0; j < n; j++)
        printf("%4d %7.4f\n", j, x[j]);

    /* Conjugate the Hermitian sequence to simulate the inverse transform:
     * the imaginary parts are stored in x[n/2+1] .. x[n-1] */
    for (j = n/2+1; j < n; j++)
        x[j] = -x[j];

    /* Compute the complex Hermitian --> real transform */
    zdfft(2,n,x,comm,&info);
    if (info != 0) {
        fprintf(stderr, "zdfft transform failed, info = %d\n", info);
        free(comm);
        return 1;
    }

    printf("\n");
    printf("Original sequence as restored by inverse transform:\n");
    printf("\n");
    printf(" Original Restored\n");
    for (j = 0; j < n; j++)
        printf("%4d %7.4f %7.4f\n", j, xx[j], x[j]);

    free(comm);
    return 0;
}

To compile this example:

  1. Compile the source dzfft_c_example.c to generate an object file:

    armclang -c -I<install_dir>/include dzfft_c_example.c -o dzfft_c_example.o

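    Note: In these commands, <install_dir> is the Arm Performance Libraries installation directory, /opt/arm/<armpl_dir> by default. Replace armclang with gcc to use the gcc compiler instead.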

  2. Link the object code into an executable:

    armclang dzfft_c_example.o -L<install_dir>/lib -larmpl_lp64 -o dzfft_c_example.exe -lflang -lflangrti -lm

    Or, to use gcc:

    gcc dzfft_c_example.o -L<install_dir>/lib -larmpl_lp64 -o dzfft_c_example.exe -lgfortran -lm

    The linker and compiler options are:

    • -L<install_dir>/lib adds the Arm Performance Libraries location to the library search path.
    • -larmpl_lp64 links against the Arm Performance Libraries.
    • -lflang -lflangrti links against the Fortran runtime libraries. This is required because the Arm Performance Libraries include Fortran code. When using gcc, link against -lgfortran.
    • -lm links against the standard math libraries.
  3. Run the executable on your Arm system:

    ./dzfft_c_example.exe

    The executable produces output as follows:

ARMPL example: FFT of a real sequence using dzfft
-------------------------------------------------

Components of discrete Fourier transform:

   0  2.4836
   1 -0.2660
   2 -0.2577
   3 -0.2564
   4  0.0581
   5  0.2030
   6  0.5309

Original sequence as restored by inverse transform:

 Original Restored
   0  0.3491  0.3491
   1  0.5489  0.5489
   2  0.7478  0.7478
   3  0.9446  0.9446
   4  1.1385  1.1385
   5  1.3285  1.3285
   6  1.5137  1.5137

Optimized math routines - libamath

The libamath library, in the /opt/arm/<armpl_dir>/lib directory, contains optimized versions of the exp, pow, and log functions in single and double precision.

If your code calls exp(), pow(), log(), expf(), powf(), or logf(), linking against libamath makes those calls use the optimized Arm implementations, resulting in improved performance:

armclang code_with_math_routines.c -L${ARMPL_LIBRARIES} -lamath
armflang code_with_math_routines.f -L${ARMPL_LIBRARIES} -lamath

Advanced options

If you're using advanced features of the Arm Performance Libraries, there are some additional options you can use:

OpenMP

To use the multi-threaded (OpenMP) version of Arm Performance Libraries, link with -larmpl_lp64_mp.
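
The multi-threaded libraries run on the OpenMP runtime, so the number of threads is normally controlled through the standard OMP_NUM_THREADS environment variable. This is an assumption based on standard OpenMP behavior; check the reference manual for any library-specific controls. The following sketch reports how many threads an OpenMP parallel region would use by default; default_openmp_threads is a hypothetical helper name.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the default OpenMP thread count
 * (initialized from OMP_NUM_THREADS when it is set), or 1 if the
 * program was compiled without OpenMP support (no -fopenmp). */
int default_openmp_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

For example, when the program is compiled with -fopenmp and run with OMP_NUM_THREADS=4 set in the environment, default_openmp_threads() should return 4.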

8-byte integer variables

To use 8-byte integer variables in C or Fortran, compile your code with -DINTEGER64 and link against -larmpl_ilp64.
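
The LP64 and ILP64 libraries differ in the width of the integer arguments they expect (32-bit and 64-bit, respectively), so the integer variables your code passes, such as n and info in dzfft_c_example.c, must match the library you link against. The following minimal sketch keys a program's integer type off the same INTEGER64 macro. The names example_int_t and integer_interface_name are hypothetical, and armpl.h may define its own integer type for this purpose, so check the header before relying on this pattern.

```c
#include <stdint.h>

/* Hypothetical integer type for arguments passed to the library:
 * 64-bit when compiled with -DINTEGER64 (link -larmpl_ilp64),
 * 32-bit otherwise (link -larmpl_lp64). */
#ifdef INTEGER64
typedef int64_t example_int_t;
#else
typedef int32_t example_int_t;
#endif

/* Reports which library variant this build expects. */
const char *integer_interface_name(void)
{
    return sizeof(example_int_t) == 8 ? "ilp64" : "lp64";
}
```

Declaring variables such as n and info with a type like this, rather than plain int, keeps the source correct for both library variants.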

OpenMP and 64-bit integers

To use both OpenMP and 64-bit integers, use the following link flag: -larmpl_ilp64_mp.

Linking against static libraries

Use -static to link against the static rather than shared libraries.

 

Compiling and testing the full suite of examples

The examples directory contains the following:

  • A GNUmakefile to build and execute all of the example programs.
  • A number of different C examples, *.c.
  • A number of different Fortran examples, *.f90.
  • Expected output for each example, *.expected.

The makefile compiles and runs each example, comparing the generated output to the expected output. If there are differences, these are flagged as errors.

To compile the examples and run the tests:

make 

The following output is for the gcc-compiled library; for the library compiled with Arm Compiler for HPC, the makefile uses that compiler instead:

Compiling program armplinfo.f90:
gfortran -c -I/opt/arm/armpl-18.4.0_Generic-AArch64_SUSE-12_aarch64-linux/include -lgfortran -lm armplinfo.f90 -o armplinfo.o
Linking program armplinfo.exe:
gfortran -lgfortran -lm armplinfo.o -L/opt/arm/armpl-18.4.0_Generic-AArch64_SUSE-12_aarch64-linux/lib -larmpl -o armplinfo.exe
Running program armplinfo.exe:
(export LD_LIBRARY_PATH='/opt/arm/armpl-18.4.0_Generic-AArch64_SUSE-12_aarch64-linux/lib:'; ./armplinfo.exe > armplinfo.res 2>&1)
ARMPL (ARM Performance Libraries)

...

Testing: no example difference files were generated.
Test passed OK

In-tool documentation

The Arm Performance Libraries Reference Manual is available on the Arm Developer website. Additionally, the Doc subdirectory of the Arm Performance Libraries installation directory, ${ARMPL_DIR}/Doc, contains this manual in the following formats:

  • PDF format: arm_performance_libraries_reference_manual_<document_version>.pdf. 
  • Info pages: armpl.info
  • Plain text: armpl.txt

To read the armpl.info file, either:

  1. Update the INFOPATH environment variable to include the ${ARMPL_DIR}/Doc subdirectory. To do so, load the Arm Performance Libraries environment module (which automatically sets the INFOPATH variable), or manually set the variable using:

    setenv INFOPATH ${INFOPATH}:${ARMPL_DIR}/Doc

    Or, in bash:

    export INFOPATH=${INFOPATH}:${ARMPL_DIR}/Doc

    Then, open the file using info:

    info armpl.info
  2. Read the manual using info and specify the full path to the file:

    info ${ARMPL_DIR}/Doc/armpl.info

The /Doc subdirectory also contains a USAGE.txt plain text file containing instructions on using Arm Performance Libraries with gfortran and with GCC, including examples.

Related information