Using the Arm Performance Monitor Unit (PMU) Linux Driver

This article discusses how to use the Arm Linux PMU driver to gather performance information. It covers kernel configuration, the device tree entry, and reading performance information from a Linux application. The examples use a Cycle Performance Analysis Kit (CPAK) for a Cortex-A53 system running 64-bit Linux.

Introduction

The Linux kernel provides an Arm PMU driver for counting events such as cycles, instructions, and cache metrics. Applications access the counters managed by this driver through the perf_event_open system call.

The steps covered are:

  • Configure the Linux kernel for profiling
  • Confirm that the device tree includes the entry for the Arm PMU driver
  • Insert system calls into the Linux application to access performance information

Kernel Configuration

  1. Begin by enabling profiling in the Linux kernel. It isn't always easy to identify the minimal set of options a kernel feature needs, so for this example enable “Kernel performance events and counters” (CONFIG_PERF_EVENTS), which is found under “General setup” > “Kernel Performance Events and Counters”.

     [Figure: Kernel Configuration]

  2. Enable “Profiling support” (CONFIG_PROFILING) on the “General setup” menu.

     [Figure: Kernel Configuration - Profiling Support]

  3. Once these options are enabled, recompile the kernel as usual by following the instructions provided in the CPAK.

Device Tree Entry

  1. Below is the device tree entry for the PMU driver. All Arm Linux CPAKs for Arm Cortex-A53 and Cortex-A57 processors include this entry, so no modification is needed. If you are working with your own Linux configuration, confirm that the PMU entry is present in the device tree.

     [Figure: Device Tree Entry]

  2. When the kernel boots, the driver prints a message:

     hw perfevents: enabled with arm/armv8-pmuv3 PMU driver, 7 counters available

     If this message is not in the kernel boot log, check both the PMU driver device tree entry and the kernel configuration options listed above. If any of them is incorrect, the driver message will not appear.


Performance Information from a Linux Application

The perf_event_open system call can be used to obtain performance information from a Linux application. This system call has no glibc wrapper, so it is called directly using syscall(). Most available examples, including the one in the perf_event_open(2) man page, define a wrapper function to make it easier to use.


Perf Event Open System Call


  1. The process is similar to many other Linux system calls. First, obtain a file descriptor by calling perf_event_open(), then use that file descriptor for other operations such as ioctl() and read().
  2. The perf_event_open system call uses a number of parameters to configure the events to be counted. Sticking with the simple case of counting instructions, fill in the perf_event_attr data structure with the desired information, including:
    • Start enabled or disabled
    • Trace child processes or not
    • Include hypervisor activity or not
    • Include kernel activity or not

    The perf_event_attr structure also selects which event to count (such as instructions). The remaining system call arguments specify the process ID to trace, which CPUs to trace on, an optional group leader, and flags.

    A setup function to count instructions could look like this:


    [Figure: Perf Setup Function]


  3. At the end of the test, or of the section of code being measured, disable the counter and read its current value. In the example, get_totaltime() uses a Linux timer to time the work of interest, and the result is combined with the instruction count from the PMU driver to print some metrics at the end of the test.

    [Figure: Perf Final]


Summary

The Arm PMU driver and the perf_event_open system call provide a robust way to access the Arm PMU from Linux applications. The driver takes care of all the accounting and event counter overflow handling, and provides many flexible options for tracing.

When many events need to be traced, using the perf_event_open system call directly can become cumbersome. One feature of perf_event_open is the group file descriptor, which creates a group of events: one group leader plus other group members, all traced together as a group. While all of this is possible, it may be easier to use the perf command, which comes with the Linux kernel and can control the counters for entire applications.

This article was originally written as a blog by Jason Andrews. Read the original post on Connected Community.