Overview

Arm Performance Libraries provides optimized standard core math libraries for high-performance computing applications on Arm processors. To get the best performance on Arm systems, the compiler must select the correct library.

The following design choices affect which library is required:

  • CPU architecture: generic, Cortex-A72, or ThunderX2?
  • Threading: single-threaded or OpenMP multi-threaded?
  • Integer size: 32-bit or 64-bit?

Load the optimum version of Arm Performance Libraries with -armpl

The -armpl option instructs Arm Compiler for HPC to load the optimum version of Arm Performance Libraries for your target architecture and implementation. This option also enables optimized versions of the C mathematical functions declared in the math.h header, tuned scalar and vector implementations of Fortran math intrinsics, and auto-vectorization of mathematical functions (to disable auto-vectorization, use -fno-simdmath).

{armclang|armflang} code_with_math_routines{.c|.f} -armpl=<arg1>,<arg2>...

The arguments you can specify are:

lp64
  Use 32-bit integers.  Default unless the -i8 option is specified.

ilp64
  Use 64-bit integers.  Inverse of lp64.  Default if the -i8 option is specified.

sequential
  Use the single-threaded implementation of Arm Performance Libraries.  Default unless the -fopenmp option is specified.

parallel
  Use the OpenMP multi-threaded implementation of Arm Performance Libraries.  Inverse of sequential.  Default if the -fopenmp option is specified.

Default behavior

If specified with no arguments, the default behavior of -armpl depends on whether you have also specified -i8, -fopenmp, or both:

  • Enabling OpenMP threading with -fopenmp:
    If you specify both -fopenmp and -armpl, the compiler assumes that you require the OpenMP multi-threaded implementation of Arm Performance Libraries, and -armpl defaults to parallel.
  • Specifying integer size with -i8:
    If you specify both -i8 and -armpl, the compiler assumes that you require the 64-bit integer implementation of Arm Performance Libraries, and -armpl defaults to ilp64.
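
The default rules above can be summarized as a small decision table. The following C sketch models only the behavior described in this section: it is not taken from the compiler, and the function names are hypothetical. It maps the presence of -i8 and -fopenmp to the arguments that a bare -armpl is described as implying.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the documented defaults; not compiler code. */

/* Integer size: ilp64 if -i8 is specified, otherwise lp64. */
const char *armpl_default_int(bool has_i8)
{
    return has_i8 ? "ilp64" : "lp64";
}

/* Threading: parallel if -fopenmp is specified, otherwise sequential. */
const char *armpl_default_threading(bool has_fopenmp)
{
    return has_fopenmp ? "parallel" : "sequential";
}

/*
 * Writes the implied "<integer size>,<threading>" pair into buf.
 * Returns the value returned by snprintf.
 */
int armpl_effective_args(bool has_i8, bool has_fopenmp, char *buf, size_t len)
{
    return snprintf(buf, len, "%s,%s",
                    armpl_default_int(has_i8),
                    armpl_default_threading(has_fopenmp));
}
```

For example, a command line that contains -i8 and -armpl but not -fopenmp corresponds to armpl_effective_args(true, false, ...), which yields ilp64,sequential. An explicit -armpl=<arg1>,<arg2> on the command line makes the choice visible rather than relying on these defaults.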

Setting the architecture with -mcpu

Arm Performance Libraries provides libraries suitable for a range of supported CPUs. If you intend to use -armpl, you must also specify the required architecture using the -mcpu option.

By default, the compiler auto-detects the CPU architecture of the build computer. To change this behavior and target a particular architecture, use the -mcpu option.

-mcpu={native | generic | thunderx2t99 | cortex-a72}

Examples

  • To specify a 64-bit integer OpenMP multi-threaded implementation for ThunderX2:
    armflang code_with_math_routines.f -armpl=ilp64,parallel -mcpu=thunderx2t99

  • To specify a 32-bit integer single-threaded implementation on Cortex-A72:
    armclang code_with_math_routines.c -armpl=lp64,sequential -mcpu=cortex-a72

  • To use the sequential, ilp64 ArmPL libraries, optimized for the CPU architecture of the build computer:
    armflang code_with_math_routines.f -i8 -armpl -mcpu=native

  • To use the parallel, lp64 ArmPL libraries, with portable output suitable for any Armv8-A computer:
    armclang code_with_math_routines.c -armpl -fopenmp -mcpu=generic

  • To use the parallel, ilp64 ArmPL libraries, optimized for Cortex-A72-based computers:
    armclang code_with_math_routines.c -armpl=parallel,ilp64 -mcpu=cortex-a72
