
7 Configurable Perf metrics

The Perf metrics use the Linux kernel perf_event_open() system call to provide additional CPU-related metrics for Performance Reports. They can be used on any system supported by the Linux perf command (also called perf_event). Perf metrics cannot be tracked on typical virtual machines.

Perf metrics count the rate of one or more performance events occurring in a program. There are some software events provided by the Linux kernel but most are hardware events tracked by the Performance Monitoring Unit (PMU) of the CPU. Generalized hardware events are event name aliases that the Linux kernel identifies.

The number of events that can be tracked simultaneously, and in some cases the combinations of events, are limited by the hardware. This feature does not support multiplexing performance events.

If the set of events you requested cannot be tracked at the same time, Performance Reports ends the profiling session immediately with an error message. Try requesting fewer events, or a different combination. See the PMU reference manual for your architecture for more information on incompatible events.

7.1 Permissions

On some systems, using the Perf hardware counters can be restricted by the value of /proc/sys/kernel/perf_event_paranoid.

  perf_event_paranoid   Description
  3                     Disable use of Perf events
  2                     Allow only user-space measurements
  1                     Allow kernel and user-space measurements
  0                     Allow access to CPU-specific data but not raw tracepoint samples
  -1                    No restrictions

The value of /proc/sys/kernel/perf_event_paranoid must be 2 or lower to collect Perf metrics. To set this until the next reboot, run the following command:


    sudo sysctl -w kernel.perf_event_paranoid=2

To set the paranoid level permanently, add the following line to /etc/sysctl.conf:


    kernel.perf_event_paranoid=2
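Before changing the setting, it can help to check the current value. The following is a minimal sketch, assuming a POSIX shell; perf_paranoid_ok is a hypothetical helper name and is not part of Forge. It only reads the value and compares it with the threshold of 2 stated above.

```shell
#!/bin/sh
# Sketch: report whether the current perf_event_paranoid level allows Perf metrics.
# perf_paranoid_ok is a hypothetical helper name, not part of Forge.
perf_paranoid_ok() {
    # $1: paranoid level; succeeds when it is 2 or lower
    [ "$1" -le 2 ] 2>/dev/null
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    if perf_paranoid_ok "$level"; then
        echo "perf_event_paranoid=$level: Perf metrics can be collected"
    else
        echo "perf_event_paranoid=$level: too restrictive, lower it to 2 or below"
    fi
else
    echo "$paranoid_file is not readable on this system"
fi
```

The script makes no changes to the system, so it can be run without sudo.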

7.2 Probing target hosts

You must probe an example of a typical host machine before using these metrics. Among other properties, the probe collects the CPU ID, which is used to identify the set of potential hardware events for the host, and tests which generalized events are supported.

Ensure that /proc/sys/kernel/perf_event_paranoid is set to 2 or lower (see Permissions) before performing the probe.

Note: It is not necessary to probe every potential host. A single compute node in a homogeneous cluster is sufficient.

If your home directory is writable, you can generate a probe file and install it in your config directory by running the following on the intended host:


  /path/to/forge/bin/forge-probe --install=user

If the Forge installation directory is writable, you can generate and install the probe file for the current host with:


  /path/to/forge/bin/forge-probe --install=global

To generate the probe file, but install it manually, execute:


  /path/to/forge/bin/forge-probe

The probe is named <hostname>_probe.json and is generated in your current working directory. You must manually copy it to the location specified in the forge-probe output. This is typically only necessary when the compute node that you are probing does not have write access to your home file system.
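The choice between the three forms of the command can be sketched as a small shell helper. This is an illustration only: choose_probe_install and the FORGE_DIR variable are hypothetical names, and preferring the user install over the global install when both locations are writable is an assumption, not documented behavior. The sketch prints the command rather than running it.

```shell
#!/bin/sh
# choose_probe_install is a hypothetical helper, not part of Forge.
choose_probe_install() {
    # $1: home directory, $2: Forge installation directory
    if [ -w "$1" ]; then
        echo "--install=user"
    elif [ -w "$2" ]; then
        echo "--install=global"
    else
        # Neither location is writable: generate the file and copy it manually.
        echo ""
    fi
}

# FORGE_DIR is an assumed variable; /path/to/forge is the placeholder used above.
FORGE_DIR=${FORGE_DIR:-/path/to/forge}
flag=$(choose_probe_install "$HOME" "$FORGE_DIR")
echo "$FORGE_DIR/bin/forge-probe $flag"
```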

Check that the expected probe files are correctly installed with:


  /path/to/forge/bin/map --target-host=list

This shows something like:

  0x00000000420f5160 (thunderx2) e.g. node07.myarmhost.com
  GenuineIntel-6-4E  (skylake)   e.g. node01.myintelhost.com

If you have exactly one probe file installed, this is automatically assumed to be the target host. If there are multiple installed probe files, you must specify the intended target whenever you use the configurable Perf metrics feature. When using the command line, use the --target-host argument. You can specify the intended target CPU ID (such as 0x00000000420f5160), family name (such as thunderx2), or a unique substring of the hostname (such as myarmhost).

7.3 Specifying Perf metrics via the command line

You can list available events for a given probed host using:

  /path/to/forge/bin/perf-report --target-host=myarmhost \
    --perf-metrics=avail

Note: Use list instead of avail to see the events listed on separate lines.

Specify the events you want using a semicolon-separated list:


  /path/to/forge/bin/perf-report --profile --target-host=myarmhost \
    --perf-metrics="cpu-cycles; bus-cycles; instructions" mpirun ...

7.4 Specifying Perf metrics via a file

The --perf-metrics argument can also take the name of a plain text file:


  /path/to/forge/bin/perf-report --profile --target-host=myhost \
    --perf-metrics=./myevents.txt mpirun ...

myevents.txt lists the events to track on separate lines, such as:


  cpu-cycles
  bus-cycles
  instructions
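If you keep events in a file but sometimes want to pass them on the command line, the two forms can be converted mechanically. The following is a minimal sketch, assuming a file that contains only event names, one per line; events_to_list is a hypothetical helper name, and the sketch does not handle the comments found in a generated template.

```shell
#!/bin/sh
# events_to_list is a hypothetical helper name, not a Forge command.
events_to_list() {
    # $1: events file with one event name per line.
    # Trim surrounding whitespace, drop blank lines, then join with "; ".
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
        paste -s -d ';' - |
        sed 's/;/; /g'
}

# Build a sample file matching the example above, then print the joined list.
events_file=$(mktemp)
printf '%s\n' 'cpu-cycles ' 'bus-cycles' 'instructions' > "$events_file"
events_to_list "$events_file"   # prints: cpu-cycles; bus-cycles; instructions
rm -f "$events_file"
```

The printed string has the same shape as the value passed to --perf-metrics in the command-line example.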

--perf-metrics=template outputs a more complex template that lists all possible events with accompanying descriptions. Redirect this output to a file and uncomment the events to track, for example:


  /path/to/forge/bin/perf-report --target-host=myhost \
    --perf-metrics=template > myevents.txt

  vim myevents.txt

  /path/to/forge/bin/perf-report --profile \
    --perf-metrics=myevents.txt mpirun ...

7.5 Viewing events

You can view Perf event counts in the CPU Metrics section. All these metrics are reported as events per second, with a suitable SI prefix (such as K, M, or G) that is determined automatically.

The default values that are reported are the mean of means:

  1. The mean value is taken across all processes for each sample (averaging across processes).
  2. The mean value is taken of those per-sample results (averaging across time).
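The two averaging steps can be illustrated with a small worked example. This is a sketch using made-up numbers, not output from Performance Reports: each input row is one sample, each column is one process, and mean_of_means is a hypothetical helper name.

```shell
#!/bin/sh
# mean_of_means is a hypothetical helper, not part of Performance Reports.
# Input: one row per sample (time), one column per process (event rates).
mean_of_means() {
    awk '
        NF > 0 {
            row_sum = 0
            for (i = 1; i <= NF; i++) row_sum += $i
            total += row_sum / NF    # step 1: mean across processes for this sample
            samples++
        }
        END {
            if (samples > 0) printf "%.2f\n", total / samples   # step 2: mean across time
        }
    '
}

# Two samples of three processes each (hypothetical values).
# Per-sample means are 200 and 400, so the mean of means is 300.
printf '%s\n' "100 200 300" "200 400 600" | mean_of_means   # prints 300.00
```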

7.6 Advanced configuration

You can override the default settings used by Performance Reports when making perf_event_open calls. Specify one or more flags in a preamble section in square brackets at the start of the perf metrics definition string (either on the command line or at the top of a template file).


  /path/to/forge/bin/perf-report --profile --target-host=myarmhost \
    --perf-metrics="[optional,noinherit]; instructions; cpu-cycles"

Possible options are:

  • [optional]: Do not abort the program if the requested metrics cannot be collected. Set this if you wish to continue profiling even if no Perf metric results are returned.
  • [noinherit]: Disable multithreading support (new threads will not inherit the event counter configuration). The events you specified are then collected only on the main thread (for MPI programs, the thread that called MPI_Init_thread).
  • [nopinned]: Disable pinning events on the PMU. If you have specified this, event counting might be multiplexed. Arm does not recommend doing this as it interacts poorly with the Forge sampling strategy.
  • [noexclude=kernel]: Do not exclude events that happen in kernel space. This might require a more permissive perf_event_paranoid level.
  • [noexclude=hv]: Do not exclude events that happen in the hypervisor. This is mainly for PMUs that have built-in support for handling this (such as IBM Power). Most machines require extra support for handling hypervisor measurements.
  • [noexclude=idle]: Do not exclude events counted while the CPU is running the idle task. This is only relevant for software events.
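
When the flags and events vary between runs, the definition string can be assembled in a script. The following is a minimal sketch that follows the format of the example above (a bracketed preamble, then events separated by semicolons); build_perf_metrics is a hypothetical helper name and performs no validation of flag or event names.

```shell
#!/bin/sh
# build_perf_metrics is a hypothetical helper, not part of Forge.
build_perf_metrics() {
    # $1: comma-separated preamble flags (may be empty); remaining args: event names
    flags=$1
    shift
    spec=""
    if [ -n "$flags" ]; then
        spec="[$flags]"
    fi
    for event in "$@"; do
        if [ -n "$spec" ]; then
            spec="$spec; $event"
        else
            spec="$event"
        fi
    done
    printf '%s\n' "$spec"
}

build_perf_metrics "optional,noinherit" instructions cpu-cycles
# prints: [optional,noinherit]; instructions; cpu-cycles
```

The result can be passed as the quoted value of --perf-metrics.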