2 Installation

A release of Arm Performance Reports can be downloaded from the Arm Developer website.

Both a graphical and text-based installer are provided. See the following sections for details.

2.1 Linux installation

2.1.1 Graphical install

Untar the package and run the installer executable, using:

   tar xf arm-reports-18.2.1-<distro>-<arch>.tar
   cd arm-reports-18.2.1-<distro>-<arch>

replacing <distro> and <arch> with the OS distribution and architecture of your tar package, respectively. For example, the tarball package for Redhat 7.4 OS and Armv8-A (AArch64) architecture is: arm-reports-18.2.1-Redhat-7.4-aarch64.tar
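
As a minimal sketch, the package and directory names can be composed from the version, distribution, and architecture. The function name package_basename is hypothetical, and the distribution string (Redhat-7.4 here) has to match the tarball you actually downloaded; nothing is detected automatically.

```shell
# Hypothetical helper: build the package base name from its three parts.
# $1 = version, $2 = distribution string, $3 = architecture
package_basename() (
    version=$1
    distro=$2
    arch=$3
    printf 'arm-reports-%s-%s-%s\n' "$version" "$distro" "$arch"
)

# Values taken from the example above.
base=$(package_basename 18.2.1 Redhat-7.4 aarch64)
echo "tarball:   ${base}.tar"
echo "directory: ${base}"
```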

The installer consists of a number of pages where you can choose install options. Use the Next and Back buttons to move between pages or Cancel to cancel the installation.

The Install Type page allows you to choose which user(s) to install Arm Performance Reports for.

If you are an administrator (root), you can install Arm Performance Reports for All Users in a common directory, such as /opt or /usr/local. Otherwise, only the Just For Me option is enabled.


Figure 1: Arm Performance Reports Installer-Installation type

Once you have selected the installation type, you are prompted to specify the directory you would like to install Arm Performance Reports in. For a cluster installation, choose a directory that is shared between the cluster login or frontend node and the compute nodes. Alternatively, install it on or copy it to the same location on each node.


Figure 2: Arm Performance Reports Installer-Installation directory

You are shown the progress of the installation on the Install page.


Figure 3: Install in progress

Arm Performance Reports does not have a GUI and does not add any desktop icons.

It is important to follow the instructions in the README file that is contained in the tar file. In particular, you need a valid license file. To obtain an evaluation license, use the Get software link.

Because Arm Performance Reports supports a large number of site configurations and MPI distributions, you may need to take further steps to integrate it fully into your environment. For example, you may need to ensure that environment variables are propagated to remote nodes, and that the tool libraries and executables are available on the remote nodes.

2.1.2 Text-mode install

The text-mode install script is useful if you are installing remotely.

To install using the text-mode install script, untar the package and run the script, using:

   tar xf arm-reports-18.2.1-<distro>-<arch>.tar
   cd arm-reports-18.2.1-<distro>-<arch>

replacing <distro> and <arch> with the OS distribution and architecture of your tar package, respectively. For example, the tarball package for Redhat 7.4 OS and Armv8-A (AArch64) architecture is: arm-reports-18.2.1-Redhat-7.4-aarch64.tar

Next, you are prompted with the license agreement. To read the license, press Return. After the license prompt, you are asked to enter the directory where you want to install Arm Performance Reports. This directory must be accessible on all the nodes in your cluster.

Alternatively, to run the text-mode install script, accept the license, and specify an installation directory in one step, pass the arguments --accept-licence and <installation_directory> when executing the script. For example:

    ./ --accept-licence <installation_directory>

replacing the <installation_directory> with a directory of your choice.

2.2 License files

Arm Performance Reports requires a license file for its operation.

Time-limited evaluation licenses are available from the Arm Developer website.

2.3 Workstation and evaluation licenses

Workstation and Evaluation license files for Arm Performance Reports do not require Arm Licence Server and should be copied directly to {installation-directory}/licences, for example, /home/user/arm/reports/licences/Licence.ddt. Do not edit the files as this prevents them from working.

You may specify an alternative location of the license directory using an environment variable: ALLINEA_LICENCE_DIR. For example:

   export ALLINEA_LICENCE_DIR=${HOME}/SomeOtherLicenceDir
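
A slightly more defensive variant checks that the directory exists before exporting it. This is only a sketch: the function name use_licence_dir is hypothetical, and the existence check is a convenience added here, not something Arm Performance Reports requires.

```shell
# Hypothetical helper: export ALLINEA_LICENCE_DIR only if the directory exists.
# $1 = candidate licence directory
use_licence_dir() {
    if [ -d "$1" ]; then
        export ALLINEA_LICENCE_DIR="$1"
    else
        echo "licence directory not found: $1" >&2
        return 1
    fi
}

use_licence_dir "${HOME}/SomeOtherLicenceDir" || echo "ALLINEA_LICENCE_DIR left unchanged"
```

If the directory is missing, the function returns a non-zero status and leaves any existing value of ALLINEA_LICENCE_DIR untouched.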

2.4 Supercomputing and other floating licenses

For users with Supercomputing and other floating licenses, the Arm Licence Server must be running on the designated license server machine prior to running Arm Performance Reports.

The Arm Licence Server and instructions for its installation and usage may be downloaded from the Arm Developer website.

The license server download is on the Arm Forge download page.

A floating license consists of two files: the server license, a file named Licence.xxxx, and the client license, a file named Licence.

The client file should be copied to {installation-directory}/licences, for example, /home/user/arm/reports/licences/Licence.

You need to edit the hostname line to contain the host name or IP address of the machine running the Licence Server.

See the Licence Server user guide for instructions on how to install the server license.

2.5 Architecture licensing

Licenses issued after the release of Arm Performance Reports 6.1 specify the compute node architectures that they may be used with. Licenses issued prior to this release will enable the x86_64 architecture by default.

Existing users for other architectures will be supplied with new licenses.

2.5.1 Using multiple architecture licenses

If you are using multiple license files to specify multiple architectures, it is recommended that you leave the default licenses directory empty. Instead, create a directory for each architecture, and when you target a specific architecture set ALLINEA_LICENSE_DIR to the relevant directory. Alternatively, you can set ALLINEA_LICENSE_FILE in order to specify the license file.

By way of example, consider a site where there are two target architectures, x86_64 and aarch64. Create two directories, licenses_x86_64 and licenses_aarch64. Then, if you want to target aarch64, you would set the license directory as follows:

   export ALLINEA_LICENSE_DIR=/path/to/licenses_aarch64
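
On a mixed cluster, this choice can be scripted. The sketch below assumes the directories are named licenses_<arch>, as in the example, and that uname -m reports the same architecture names (x86_64, aarch64). The function name select_license_dir and the /path/to base directory are placeholders.

```shell
# Hypothetical helper: point ALLINEA_LICENSE_DIR at the per-architecture directory.
# $1 = base directory holding licenses_<arch>
# $2 = architecture (default: output of uname -m)
select_license_dir() {
    case "${2:-$(uname -m)}" in
        x86_64)  export ALLINEA_LICENSE_DIR="$1/licenses_x86_64" ;;
        aarch64) export ALLINEA_LICENSE_DIR="$1/licenses_aarch64" ;;
        *)       echo "no license directory for this architecture" >&2; return 1 ;;
    esac
}

select_license_dir /path/to aarch64
echo "$ALLINEA_LICENSE_DIR"
```

Omitting the second argument selects the directory for the machine the script runs on, which is only appropriate when that machine has the same architecture as the compute nodes.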

2.6 Environment variables

2.6.1 Report customization

Environment variables to customize your reports:


Any text in this environment variable will be included in all reports produced.


Allows you to specify a .map file when using the --dcim-output argument.


Path to the script to use to communicate with DCIM. Default is ${ALLINEA_TOOLS_PATH}/performance-reports/ganglia-connector/pr-dcim.


Path to the gmetric instance to use. This is specific to the pr-dcim script. Default is the path returned by running which gmetric.

2.6.2 Warning suppression

Environment variables for warning suppression (for use when autodetection is resulting in erroneous messages):


Do not attempt to auto-detect MPI or CUDA executables.


Automatically detect Cray MPT by passing --version to the aprun wrapper and parsing the output.

2.6.3 I/O behavior

Environment variables for handling default I/O behavior:


Never forward the stdin of the perf-report command to the program being analyzed, even if not using the GUI. Normally Arm Performance Reports only forwards stdin when running without the GUI.


Enables the option in Arm Performance Reports to generate all types of results at once, using the .all extension.

2.6.4 Licensing

Environment variables to handle licensing:


Location of the license file. Default is ${ALLINEA_TOOLS_PATH}/Licence.


Location of the license file. This ensures the license file being pointed to is used.


Location of the licenses directory. Default is ${ALLINEA_TOOLS_PATH}/licences.


Specify the host name of the network interface the license is tied to.

2.6.5 Timeouts

Environment variables for handling timeouts:


Do not time out if nodes do not connect after a specified length of time. This may be necessary if the MPI subsystem takes unusually long to start processes.


Length of time (in ms) to wait for a process to connect to the front end.


Length of time (in ms) to wait for MPI_Finalize to end and the program to exit. Default is 300000 (5 minutes). 0 waits forever.

2.6.6 Sampler

Environment variables for handling sampler-related setup, runtime behavior, and backend processing:


ALLINEA_SAMPLER_INTERVAL

Arm Performance Reports takes a sample every 20 milliseconds by default, giving a default sampling rate of 50Hz. This rate is automatically decreased as the run proceeds so that a constant number of samples is taken. See ALLINEA_SAMPLER_NUM_SAMPLES.

If your program runs for a very short period of time, you may benefit by decreasing the initial sampling interval. For example, ALLINEA_SAMPLER_INTERVAL=1 sets an initial sampling rate of 1000Hz, or once per millisecond. Higher sampling rates are not supported.

Increasing the sampling frequency from the default is not recommended if there are lots of threads or very deep stacks in the target program because this may not leave sufficient time to complete one sample before the next sample is started.

Note: Custom values for ALLINEA_SAMPLER_INTERVAL may be overwritten by values set from the combination of ALLINEA_SAMPLER_INTERVAL_PER_THREAD and the expected number of threads (from OMP_NUM_THREADS). For more information, see ALLINEA_SAMPLER_INTERVAL_PER_THREAD.


ALLINEA_SAMPLER_INTERVAL_PER_THREAD

To keep overhead low, Arm Performance Reports imposes a minimum sampling interval based on the number of threads. By default, this is 2 milliseconds per thread, so for eleven or more threads Arm Performance Reports increases the initial sampling interval to more than 20 milliseconds.

To adjust this behavior set ALLINEA_SAMPLER_INTERVAL_PER_THREAD to the minimum per thread sample time, in milliseconds.

Lowering this value from the default is not recommended if there are lots of threads as this may not leave sufficient time to complete one sample before the next sample is started.


  • Whether OpenMP is enabled or disabled in Arm Performance Reports, the final script or scheduler values set for OMP_NUM_THREADS will be used to calculate the sampling interval per thread (ALLINEA_SAMPLER_INTERVAL_PER_THREAD). When configuring your job for submission, check whether your final submission script, scheduler or the Arm Performance Reports GUI has a default value for OMP_NUM_THREADS.
  • Custom values for ALLINEA_SAMPLER_INTERVAL will be overwritten by values set from the combination of ALLINEA_SAMPLER_INTERVAL_PER_THREAD and the expected number of threads from OMP_NUM_THREADS.
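
The interaction between the two settings can be illustrated with a short calculation. This sketch assumes that the effective initial interval is simply the larger of ALLINEA_SAMPLER_INTERVAL and ALLINEA_SAMPLER_INTERVAL_PER_THREAD multiplied by the thread count; that reading follows from the description above, but the exact rule used inside the tool may differ. The function name effective_interval_ms is illustrative.

```shell
# Sketch: effective initial sampling interval in milliseconds.
# $1 = ALLINEA_SAMPLER_INTERVAL (default 20)
# $2 = ALLINEA_SAMPLER_INTERVAL_PER_THREAD (default 2)
# $3 = expected number of threads, for example OMP_NUM_THREADS (default 1)
effective_interval_ms() (
    interval_ms=${1:-20}
    per_thread_ms=${2:-2}
    threads=${3:-1}
    minimum_ms=$((per_thread_ms * threads))
    if [ "$minimum_ms" -gt "$interval_ms" ]; then
        echo "$minimum_ms"
    else
        echo "$interval_ms"
    fi
)

effective_interval_ms 20 2 8    # 8 threads: the 20 ms default stands
effective_interval_ms 20 2 11   # 11 threads: the 22 ms minimum applies
effective_interval_ms 1 2 4     # a custom 1 ms interval is raised to 8 ms
```

Under this assumption, a custom ALLINEA_SAMPLER_INTERVAL only takes effect when it is at least as large as the per-thread minimum for the expected thread count.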


ALLINEA_MPI_WRAPPER

To direct Arm Performance Reports to use a specific wrapper library, set ALLINEA_MPI_WRAPPER=<pathofsharedobject>.

Arm Performance Reports ships with a number of precompiled wrappers. When your MPI is supported, Arm Performance Reports automatically selects and uses the appropriate wrapper.

To manually compile a wrapper specifically for your system, set ALLINEA_WRAPPER_COMPILE=1 and MPICC, then run <path to Arm Performance Reports installation>/map/wrapper/build_wrapper.

This generates the wrapper library ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so with symlinks to the following files:

  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1
  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1.0
  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1.0.0
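
Once the wrapper has been built, it can be selected explicitly through ALLINEA_MPI_WRAPPER. The following is a rough sketch, assuming that <hostname> in the file name matches the output of uname -n on the machine where the wrapper was built. The function name wrapper_path is hypothetical.

```shell
# Hypothetical helper: expected location of the locally built wrapper.
# $1 = host name (default: uname -n, assumed to match <hostname>)
wrapper_path() {
    echo "${HOME}/.allinea/wrapper/libmap-sampler-pmpi-${1:-$(uname -n)}.so"
}

wrapper=$(wrapper_path)
if [ -e "$wrapper" ]; then
    export ALLINEA_MPI_WRAPPER="$wrapper"
else
    echo "no wrapper found at $wrapper" >&2
fi
```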


ALLINEA_WRAPPER_COMPILE

To direct Arm Performance Reports to fall back to creating and compiling a just-in-time wrapper, set ALLINEA_WRAPPER_COMPILE=1.

In order to be able to generate a just-in-time wrapper an appropriate compiler must be available on the machine where Arm Performance Reports is running, or on the remote host when using remote connect.

Arm Performance Reports attempts to auto-detect your MPI compiler. However, setting the MPICC environment variable to the path of the correct compiler is recommended.


ALLINEA_MPIRUN

The path of mpirun, mpiexec or equivalent.

If set, ALLINEA_MPIRUN has higher priority than the value set in the GUI and the mpirun found in PATH.


ALLINEA_SAMPLER_NUM_SAMPLES

Arm Performance Reports collects 1000 samples per process by default. To avoid generating too much data on long runs, the sampling rate is automatically decreased as the run progresses to ensure only 1000 evenly spaced samples are stored.

You may adjust this by setting ALLINEA_SAMPLER_NUM_SAMPLES=<positiveinteger>.


It is strongly recommended that you leave this value at the default setting. Higher values are not generally beneficial and add extra memory overhead while running your code. With 512 processes, the default setting already collects half a million samples over the job, so the effective sampling rate is already very high.
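
The arithmetic behind these figures is easy to check. The sketch below multiplies the number of processes by the samples per process, and estimates the average spacing of stored samples as run time divided by sample count. The spacing estimate assumes the stored samples are spread evenly over the whole run. Both function names are illustrative.

```shell
# Total samples stored across the job.
# $1 = number of processes, $2 = samples per process (default 1000)
total_samples() {
    echo $(( $1 * ${2:-1000} ))
}

# Average spacing of stored samples in milliseconds, assuming even spacing.
# $1 = run time in seconds, $2 = samples per process (default 1000)
sample_spacing_ms() {
    echo $(( $1 * 1000 / ${2:-1000} ))
}

total_samples 512        # 512 processes at the default setting
sample_spacing_ms 3600   # one-hour run at the default setting
```

For 512 processes the first call prints 512000, which is the half a million samples mentioned above.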


Specifies the number of lines of program output to record in .map files. Setting to 0 will remove the line limit restriction, although this is not recommended as it may result in very large .map files if the profiled program produces lots of output.


The maximum line length for program output that will be recorded in .map files. Lines containing more characters than this limit will be truncated. Setting to 0 will remove the line length restriction. This is not recommended because it may result in very large .map files if the profiled program produces lots of output per line.


ALLINEA_PRESERVE_WRAPPER

To gather data from MPI calls, Arm Performance Reports generates a wrapper to the chosen MPI implementation.

By default, the generated code and shared objects are deleted when Arm Performance Reports no longer needs them.

To prevent Arm Performance Reports from deleting these files set ALLINEA_PRESERVE_WRAPPER=1.


If you are using remote launch then this variable must be exported in the remote script.


ALLINEA_SAMPLER_NO_TIME_MPI_CALLS

To prevent Arm Performance Reports from measuring the time spent in MPI calls, set ALLINEA_SAMPLER_NO_TIME_MPI_CALLS.


ALLINEA_SAMPLER_TRY_USE_SMAPS

To allow Arm Performance Reports to use /proc/[pid]/smaps to gather memory usage data, set ALLINEA_SAMPLER_TRY_USE_SMAPS. This is not recommended because it slows down sampling significantly.


MPICC

To create the MPI wrapper, Arm Performance Reports first tries to use MPICC. If that fails, it searches for a suitable MPI compiler command in PATH. If the MPI compiler used to compile the target binary is not in PATH (or if there are multiple MPI compilers in PATH), set MPICC.
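
The lookup order described above can be approximated in a shell function, which is useful for checking which compiler would be picked up in a given environment. This sketch is an illustration, not the tool's implementation: it only looks for a command named mpicc in PATH, whereas the set of compiler commands Arm Performance Reports searches for is not listed here. The function name resolve_mpicc is hypothetical.

```shell
# Sketch: prefer $MPICC, otherwise fall back to an mpicc found in PATH.
resolve_mpicc() {
    if [ -n "${MPICC:-}" ]; then
        echo "$MPICC"
    elif command -v mpicc >/dev/null 2>&1; then
        command -v mpicc
    else
        echo "no MPI compiler found; set MPICC" >&2
        return 1
    fi
}

resolve_mpicc || echo "set MPICC to the compiler used to build your program"
```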

2.6.7 Simple troubleshooting

Environment variables for simple troubleshooting:


To print the weights and heuristics used to autodetect which MPI is loaded, set this variable to 1.
