This feature is available only in Arm Forge Professional. Contact Arm Sales at HPCToolsSales@arm.com for details on how to upgrade.
The Perf metrics use the Linux kernel perf_event_open() system call to make additional CPU-related metrics available to MAP. They can be used on any system supported by the Linux perf command (also called perf_event). Perf metrics cannot be tracked on typical virtual machines.
Note: You cannot use configurable Perf metrics when collecting PAPI metrics, and the following features are disabled when using configurable Perf metrics:
- CPU instruction metrics on Armv8-A (section 24.1.2).
- CPU instruction metrics on IBM Power systems (section 24.1.3 and section 24.1.4).
Perf metrics count the rate of one or more performance events that occur in a program. There are some software events that the Linux kernel provides, but most are hardware events tracked by the Performance Monitoring Unit (PMU) of the CPU. Generalized hardware events are event name aliases that the Linux kernel identifies.
The hardware limits how many events can be tracked simultaneously and, in some cases, which combinations of events can be tracked together. This feature does not support multiplexing performance events.
If the set of events you requested cannot be tracked at the same time, MAP ends the profiling session immediately with an error message. Try requesting fewer events, or a different combination. See the PMU reference manual for your architecture for more information on incompatible events.
On some systems, using Perf hardware counters can be restricted by the value of /proc/sys/kernel/perf_event_paranoid.
- 3: Disable use of Perf events.
- 2: Allow only user-space measurements.
- 1: Allow kernel and user-space measurements.
- 0: Allow access to CPU-specific data, but not raw tracepoint samples.
The value of /proc/sys/kernel/perf_event_paranoid must be 2 or lower to collect Perf metrics. To set this until the next reboot, run the following commands:
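A minimal sketch of the temporary setting, assuming sudo and sysctl are installed on the host. The helper names paranoid_level and lower_paranoid_if_needed are hypothetical and are not part of Forge; the optional file argument exists only so that the check can be tried against a copy of the value.

```shell
# Hypothetical helper: print the paranoid level from the file given as $1,
# defaulting to the live kernel setting.
paranoid_level() {
    cat "${1:-/proc/sys/kernel/perf_event_paranoid}"
}

# Hypothetical helper: lower the level to 2 until the next reboot, but only
# when the current value is higher. Assumes sudo and sysctl are installed.
lower_paranoid_if_needed() {
    if [ "$(paranoid_level "$1")" -gt 2 ]; then
        sudo sysctl -w kernel.perf_event_paranoid=2
    fi
}
```

Calling lower_paranoid_if_needed with no argument acts on the live kernel setting.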
To permanently set the paranoid level, add the following line to /etc/sysctl.conf:
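As a sketch, the persistent entry is the sysctl key kernel.perf_event_paranoid set to 2. The helper below is hypothetical (persist_paranoid is not a Forge or system command) and assumes you run it with permission to write the target file. It leaves an existing entry for the key untouched, so review the file by hand if one is already present.

```shell
# Hypothetical helper: append the setting to a sysctl configuration file
# (default /etc/sysctl.conf) unless a line for this key already exists.
# An existing entry is not modified, whatever its value.
persist_paranoid() {
    conf="${1:-/etc/sysctl.conf}"
    grep -q '^kernel\.perf_event_paranoid' "$conf" 2>/dev/null ||
        printf 'kernel.perf_event_paranoid=2\n' >> "$conf"
}
```

The appended line reads kernel.perf_event_paranoid=2 and takes effect at the next boot, or sooner if the sysctl configuration is reloaded.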
You must probe an example of a typical host machine before using these metrics. As well as other properties, this collects the CPU ID used to identify the set of potential hardware events for the host, and tests which generalized events are supported.
Ensure that /proc/sys/kernel/perf_event_paranoid is set to 2 or lower (see Permissions) before performing the probe.
It is not necessary to probe every potential host; a single compute node in a homogeneous cluster is sufficient.
If your home directory is writable, you can generate a probe file and install it in your config directory by running the following on the intended host:
If the Forge installation directory is writable, you can generate and install the probe file for the current host with:
To generate the probe file, but install it manually, execute:
The probe is named <hostname>_probe.json and is generated in your current working directory. You must manually copy it to the location specified in the forge-probe output. This is typically only necessary when the compute node that you are probing does not have write access to your home file system.
Check that the expected probe files are correctly installed with:
This shows something like:
0x00000000420f5160 (thunderx2) e.g. node07.myarmhost.com
GenuineIntel-6-4E (skylake) e.g. node01.myintelhost.com
If you have exactly one probe file installed, it is automatically assumed to be the target host. If there are multiple installed probe files, you must specify the intended target whenever you use the configurable Perf metrics feature. When using the command line, use the --target-host argument. You can specify the intended target CPU ID (such as 0x00000000420f5160), family name (such as thunderx2), or a unique substring of the hostname (such as myarmhost).
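To illustrate what a unique match means, the sketch below picks a probe entry from lines in the format of the listing above. It is only an illustration: match_target is a hypothetical helper, and treating the CPU ID, family name, and hostname uniformly as plain substrings is an assumption, not a description of how MAP resolves --target-host.

```shell
# Hypothetical helper: read probe-list lines on stdin and print the single
# line containing the string in $1. Fails if zero or several lines match.
match_target() {
    probe_lines="$(cat)"
    count="$(printf '%s\n' "$probe_lines" | grep -cF -- "$1" || true)"
    [ "$count" -eq 1 ] || return 1
    printf '%s\n' "$probe_lines" | grep -F -- "$1"
}
```

With the two entries shown above, myarmhost and skylake each select one host, whereas node0 matches both hostnames and is rejected as ambiguous.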
You can list available events for a given probed host using:
Use list instead of avail to see the events listed on separate lines.
Specify the events you want using a semicolon-separated list:
/path/to/forge/bin/map --profile --target-host=myarmhost \
--perf-metrics="cpu-cycles; bus-cycles; instructions" mpirun ...
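When the event names come from a script, the list can be assembled rather than typed. A minimal sketch, assuming POSIX shell; join_events is a hypothetical helper name, and the "; " separator simply mirrors the spacing used in the example above.

```shell
# Hypothetical helper: join its arguments into a "; "-separated event list.
join_events() {
    list=""
    for ev in "$@"; do
        # Prefix the separator only when the list already has an entry.
        list="${list:+$list; }$ev"
    done
    printf '%s\n' "$list"
}
```

The result can then be passed as --perf-metrics="$(join_events cpu-cycles bus-cycles instructions)".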
The --perf-metrics argument can also take the name of a plain text file:
myevents.txt lists the events to track on separate lines, such as:
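As an illustration, the following creates an events file containing the same three events used in the command-line example above. The choice of events is only an example.

```shell
# Write one event name per line; the names are the ones used in the
# command-line example above.
cat > myevents.txt <<'EOF'
cpu-cycles
bus-cycles
instructions
EOF

# Pass the file name in place of the inline list:
#   /path/to/forge/bin/map --profile --perf-metrics=myevents.txt mpirun ...
```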
--perf-metrics=template outputs a more complex template that lists all possible events with accompanying descriptions. Redirect this output to a file and uncomment the events to track, for example:
/path/to/forge/bin/map --target-host=myhost \
--perf-metrics=template > myevents.txt
/path/to/forge/bin/map --profile --perf-metrics=myevents.txt \
mpirun ...
- Click Configure Perf metrics in the Run window to open the Perf metrics configuration window.
- Select the target host from the list of installed hosts (see 25.2) in the drop-down menu at the top.
- Double-click an event, or use the arrow buttons, to add or remove events. Note: The left of the window lists the Perf events available on the currently selected host, and the right lists the events you have selected for tracking.
- Filter the list of available events by typing a substring in the Filter box.
- The bottom of the window displays a preview of the --perf-metrics= section of the command line, based on the currently selected list of events.
- To construct a suitable --perf-metrics= argument without starting a job, open the Perf metric selection dialog from the File menu. You can copy the resulting command line into a queue submission script.
You can view Perf event counts in the Metrics view (see 24) under the Linux Perf CPU events preset. All of these metrics are reported as events per second, with an automatically chosen SI prefix (such as K, M, or G).
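To make the units concrete, the sketch below converts a raw event count and an elapsed time into a rate with an SI prefix. It is illustrative only: format_rate is a hypothetical helper, the thresholds and two-decimal formatting are assumptions, and it does not claim to reproduce how MAP chooses or displays the prefix. The elapsed time must be greater than zero.

```shell
# Hypothetical helper: $1 is an event count, $2 the elapsed time in seconds.
# Prints the rate in events per second, scaled to K, M, or G.
format_rate() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        r = n / t
        if (r >= 1e9)      printf "%.2f G/s\n", r / 1e9
        else if (r >= 1e6) printf "%.2f M/s\n", r / 1e6
        else if (r >= 1e3) printf "%.2f K/s\n", r / 1e3
        else               printf "%.2f /s\n", r
    }'
}
```

For example, 2500000000 events counted over 2 seconds is a rate of 1.25 G events per second.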
You can override the default settings used by MAP when making perf_event_open calls. Specify one or more flags in a preamble section in square brackets at the start of the perf metrics definition string (either on the command line or at the top of a template file).
/path/to/forge/bin/map --profile --target-host=myarmhost \
--perf-metrics="[optional,noinherit]; instructions; cpu-cycles"
Possible options are:
- [optional]: Do not abort the program if the requested metrics cannot be collected. Set this if you wish to continue profiling even if no Perf metric results are returned.
- [noinherit]: Disable multithreading support (new threads will not inherit the event counter configuration). The events you specify are then collected only on the main thread (in the case of MPI programs, the thread that called MPI_Init_thread).
- [nopinned]: Disable pinning events on the PMU. If you specify this option, event counting might be multiplexed. Arm does not recommend doing this as it interacts poorly with the Forge sampling strategy.
- [noexclude=kernel]: Do not exclude events that happen in kernel space. This might require a more permissive perf_event_paranoid level.
- [noexclude=hv]: Do not exclude events that happen in the hypervisor. This is mainly for PMUs that have built-in support for handling this (such as IBM Power). Most machines require extra support for handling hypervisor measurements.
- [noexclude=idle]: Do not exclude events that occur while the CPU is running the idle task. This is only relevant for software events.
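The preamble syntax can also be assembled in a script. A minimal sketch, assuming POSIX shell; valid_perf_flags and perf_metrics_arg are hypothetical helper names, not part of Forge. It accepts only the flags listed above, joined with commas as in the earlier example, and expects at least one flag.

```shell
# Hypothetical helper: check that every flag in the comma-separated list $1
# is one of the options documented above.
valid_perf_flags() {
    old_ifs="$IFS"; IFS=','
    for flag in $1; do
        case "$flag" in
            optional|noinherit|nopinned) ;;
            noexclude=kernel|noexclude=hv|noexclude=idle) ;;
            *) IFS="$old_ifs"
               printf 'unknown flag: %s\n' "$flag" >&2
               return 1 ;;
        esac
    done
    IFS="$old_ifs"
}

# Hypothetical helper: $1 is the flag list, remaining arguments are events.
# Prints a definition string such as "[optional]; instructions".
perf_metrics_arg() {
    flags="$1"; shift
    valid_perf_flags "$flags" || return 1
    arg="[$flags]"
    for ev in "$@"; do
        arg="$arg; $ev"
    done
    printf '%s\n' "$arg"
}
```

For example, perf_metrics_arg optional,noinherit instructions cpu-cycles prints the same definition string as the command-line example above.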