
25 PAPI metrics

Note

Arm Forge Professional is required to make use of this feature. Please contact Arm Sales at HPCToolsSales@arm.com for details on how to upgrade.

The PAPI metrics are additional metrics available for MAP which use the Performance Application Programming Interface (PAPI). They can be used on any system supported by PAPI.

Note

In this release, PAPI metrics are collected from the main thread only.

Due to the limitations of PAPI, some metrics may be unavailable on your system. MAP displays all available metrics, and shows an error message for each metric that is not available.

Because there is a limit on the type and number of events that can be counted together, the PAPI metrics are split into small groups of compatible events, and you choose which group to view.

To change which group of metrics MAP uses, navigate to the directory indicated on completion of the installation process and modify the PAPI.config file.

25.1 Installation

To use these metrics, download and install PAPI from http://icl.cs.utk.edu/papi/index.html. Then run the metrics installer papi_install.sh from the Arm Forge directory.

25.2 PAPI config file

Once installation has completed, edit the PAPI.config file to set your configuration as required.

By default, a template PAPI.config file is provided in your installation directory at /arm_installation_directory/map/metrics. Alternatively, the PAPI.config file can be located inside your configuration directory as set by the ALLINEA_CONFIG_DIR environment variable. By default, your configuration directory is $HOME/.allinea.

To use a PAPI.config file located elsewhere, set and export the ALLINEA_PAPI_CONFIG environment variable to point to your PAPI.config file. For example:

export ALLINEA_PAPI_CONFIG=/opt/arm/map/metrics/PAPI.config

This needs to be set before running MAP.

If you are using a queuing system, be sure that the ALLINEA_PAPI_CONFIG variable is set and exported to all the compute nodes, by adding the ALLINEA_PAPI_CONFIG export line to the job script before the MAP command line.
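As an illustration of the ordering described above, the following is a minimal sketch of a job script body. Scheduler directives are omitted because they depend on your queuing system. The configuration path is the example path used above; the application name ./my_app, the process count, and the mpirun launch line are hypothetical placeholders, so substitute the MAP command line you normally use.

```shell
#!/bin/bash
# Sketch of a job script body; scheduler directives are omitted on purpose.

# Export the PAPI.config location BEFORE the MAP command line so that the
# variable is part of the environment MAP starts with.
export ALLINEA_PAPI_CONFIG=/opt/arm/map/metrics/PAPI.config

# Hypothetical MAP launch line: ./my_app and "-n 4" are placeholders.
if command -v map >/dev/null 2>&1; then
    map --profile mpirun -n 4 ./my_app
else
    echo "map not found on PATH; load Arm Forge before submitting" >&2
fi
```

Whether an exported variable reaches every compute node also depends on how your scheduler and MPI launcher propagate the environment, so check their documentation if the setting appears to be ignored.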

The PAPI.config file contains all the metric sets that can be used. Its location is indicated at the end of the installation process. The default metric set is Overview. To use another PAPI metric set, change the value of the variable called set to one of CacheMisses, BranchPrediction, or FloatingPoint.

25.3 PAPI overview metrics

This group of metrics gives a basic overview of the program which has been profiled.

DP FLOPS: The number of double precision floating-point operations performed per second. This uses the PAPI_DP_OPS (double precision floating-point operations) event. What it actually counts differs across architectures. Additionally, there are many caveats surrounding this PAPI preset on Intel architectures. See http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops for more details.

Cycles per instruction: The number of CPU cycles per instruction executed. This uses the PAPI_TOT_CYC (total cycles) and PAPI_TOT_INS (total instructions) events.
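The arithmetic behind this metric is a simple ratio of the two event counts. The sketch below shows it with a hypothetical helper function, cpi, and made-up counter values; it is not how MAP computes the metric internally, only an illustration of the ratio.

```shell
# cpi CYCLES INSTRUCTIONS
# Hypothetical helper: prints cycles per instruction to two decimal places,
# or "n/a" when no instructions were counted.
cpi() {
    awk -v cyc="$1" -v ins="$2" 'BEGIN {
        if (ins + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", cyc / ins
    }'
}

# Example with made-up counts for PAPI_TOT_CYC and PAPI_TOT_INS.
cpi 3000000 1500000   # prints 2.00
```

A lower value generally means the processor retires more instructions for the same number of cycles.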

L2 data cache misses: The number of L2 data cache misses per second. This uses the PAPI_L2_DCM (L2 data cache misses) event. This metric is only available in this preset if the system has enough hardware counters (at least 5) to collect the required events.

25.4 PAPI cache misses

This group of metrics focuses on cache misses at various levels of cache.

L1 cache misses: The number of L1 cache misses per second. This uses the PAPI_L1_TCM (L1 total cache misses) event, although if this event is unavailable the L1 data cache misses metric (using the PAPI_L1_DCM event) will be displayed instead.

L2 cache misses: The number of L2 cache misses per second. This uses the PAPI_L2_TCM (L2 total cache misses) event, although if this event is unavailable the L2 data cache misses metric (using the PAPI_L2_DCM event) will be displayed instead.

L3 cache misses: The number of L3 cache misses per second. This uses the PAPI_L3_TCM (L3 total cache misses) event, although if this event is unavailable the L3 data cache misses metric (using the PAPI_L3_DCM event) will be displayed instead.
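The fallback rule shared by the three metrics above can be sketched as a small selection function. The function name pick_cache_event and the event lists passed to it are hypothetical; this illustrates the documented rule rather than MAP's implementation. On a real system, the list of available preset events could be taken from the output of PAPI's papi_avail -a utility.

```shell
# pick_cache_event LEVEL AVAILABLE_EVENT...
# Hypothetical helper: prefers the total cache miss event for the given
# cache level and falls back to the data cache miss event. Prints the
# chosen event name, or returns non-zero if neither is available.
pick_cache_event() {
    local level="$1"
    shift
    local candidate ev
    for candidate in "PAPI_L${level}_TCM" "PAPI_L${level}_DCM"; do
        for ev in "$@"; do
            if [ "$ev" = "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Example with a made-up list of available events.
pick_cache_event 1 PAPI_L1_DCM PAPI_L2_TCM PAPI_L2_DCM   # prints PAPI_L1_DCM
pick_cache_event 2 PAPI_L1_DCM PAPI_L2_TCM PAPI_L2_DCM   # prints PAPI_L2_TCM
```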

25.5 PAPI branch prediction

This group of metrics focuses on branch prediction instructions.

Branch instructions: The number of branch instructions per second. This uses the PAPI_BR_INS (branch instructions) event.

Mispredicted branch instructions: The number of conditional branch instructions that are mispredicted each second. This uses the PAPI_BR_MSP (mispredicted branch instructions) event.

Completed instructions: The number of completed instructions per second. This uses the PAPI_TOT_INS event, and is included to provide context for the other metrics in this group.

25.6 PAPI floating-point

This group of metrics focuses on floating-point instructions.

Floating-point scalar instructions: The number of scalar floating-point instructions per second. This uses the PAPI_FP_INS event.

Floating-point vector instructions: The number of vector floating-point instructions per second. This uses the PAPI_VEC_SP (single-precision vectorized instructions) and PAPI_VEC_DP (double-precision vectorized instructions) events, although if those events are unavailable the Vector Instructions metric will be displayed instead.

Vector instructions: The number of vector instructions (floating-point and integer) per second. This uses the PAPI_VEC_INS event, but is only displayed if the events needed for the Floating-point vector instructions metric are not available.

Completed instructions: The number of completed instructions per second. This uses the PAPI_TOT_INS event, and is included to provide context for the other metrics in this group.
