Energy profiling with the Arm MAP Energy Pack add-on

The Arm Energy Pack add-on is a development and system administration tool that enables you to optimize applications for energy usage and run time. Energy usage varies significantly with the applications and workloads that you run; the Energy Pack helps you to evaluate these costs and recommends ways to reduce them.

Download Arm HPC tools here.

Download the IPMI Energy Agent here.

Energy profiling for developers

Arm MAP performance profiler is part of the Arm Forge development tool suite for high performance code development, debugging, profiling and optimization.

The Energy Pack add-on for Arm MAP allows you to evaluate the energy usage and run time of your applications while you are developing them.

Arm MAP shows the following measurements over time from an application run:

  • Node power usage, which shows the physical system power usage per node, and across all nodes in an application (see below, Systems compatible with the Arm Energy Pack).
  • CPU power usage, which shows the cumulative power consumption of all CPUs on the node, measured by the Intel on-board sensor (Intel RAPL).
  • GPU power usage, which shows the cumulative power consumption of all GPUs on the node, measured by the NVIDIA on-board sensor.
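
As background to the CPU power metric: on Linux, the RAPL powercap interface exposes a cumulative energy counter in microjoules, and power over an interval is the change in energy divided by the elapsed time. The following is a minimal sketch of that arithmetic, not a description of how Arm MAP reads the sensor. The function name and the sample values are hypothetical; max_range_uj stands for the value a system reports in max_energy_range_uj.

```python
def average_power_watts(e_start_uj: int, e_end_uj: int,
                        dt_s: float, max_range_uj: int) -> float:
    """Average power in watts between two cumulative energy readings.

    e_start_uj, e_end_uj: counter readings in microjoules.
    dt_s: elapsed time between the readings, in seconds.
    max_range_uj: counter range, used to correct for a single wraparound.
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        # The counter wrapped back towards zero between the two readings.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_s


# Hypothetical readings: 2,000,000 uJ consumed over 2 seconds.
print(average_power_watts(1_000_000, 3_000_000, 2.0, 10_000_000))
```

The wraparound correction assumes the counter wrapped at most once, so the sampling interval needs to be short compared with the time the counter takes to cover its range.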

You can use Arm MAP to zoom in to specific areas of the application run to examine the impact of optimization. The following example from Arm MAP, with the Energy Pack add-on, shows that CPU power usage (green) dips every time I/O (dark orange) is in progress.

Note: Evaluate the impact of an optimization by observing the total duration of the application run as well as its power usage. Lower power usage for a longer time can cost more energy than a shorter run at higher power. In the example, CPU time and communication time (blue) do not have the same impact: the application spends most of its time on communication. Reducing communication time therefore has the greater impact on the total cost and improves the overall energy usage, even if the peak power usage remains constant.

Tip: Explore options for spreading the I/O more evenly over time.

Consider the following points when you optimize application performance:
  • GPU use increases the peak node (system) power usage, but the cumulative energy use is lower if a significantly faster run time results from using the GPU to offload computation.
  • Examine the CPU instruction and I/O metrics to determine a more suitable processor frequency. Arm MAP shows whether your application is primarily waiting on I/O or memory access. If your system supports frequency scaling and the I/O system or memory cannot supply data to the processor fast enough, a reduced processor frequency (leaving memory and I/O speed unchanged) can often reduce energy usage with little effect on run time.
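
The relationship between power, duration and energy described above can be checked with simple arithmetic: energy is power integrated over time. The following is a minimal sketch using the trapezoidal rule over power samples. The function name and all figures are hypothetical, chosen only to illustrate that a shorter run at higher power can use less energy than a longer run at lower power.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical runs: constant 200 W for 100 s versus constant 300 W for 50 s.
low_power_long_run = energy_joules([0.0, 100.0], [200.0, 200.0])
high_power_short_run = energy_joules([0.0, 50.0], [300.0, 300.0])

print(f"low power, long run:   {low_power_long_run:.0f} J")
print(f"high power, short run: {high_power_short_run:.0f} J")
```

With these figures the longer run uses 20000 J and the shorter run 15000 J, even though the shorter run has the higher peak power.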

Systems compatible with the Arm Energy Pack

  • CPU power measurement requires an Intel CPU with RAPL support, such as Sandy Bridge or later, including Haswell and Broadwell chips. The intel_rapl powercap kernel module must be loaded. If the module is not loaded, an error is shown in place of the metric.
  • GPU power measurement requires an NVIDIA GPU that supports power monitoring. This can be checked on the command-line with "nvidia-smi -q -d power". If the reported power values are N/A, power monitoring is not supported and the metric shows an error in MAP.
  • Node power monitoring supports the following:
    • IPMI, supported on most modern servers. The Energy Pack has support for IPMI-based power and energy reporting. Install the Arm IPMI energy agent to enable this.
    • Cray HSS energy counters are supported on XK6, XK7, XC30 and XC40 machines.
    • The open API allows integration of other energy monitoring solutions, and work continues with hardware vendors to increase the level of out-of-the-box support.
    • When no node-level power monitoring plugin is available, the following warning message displays:
      "The whole system energy has been calculated using the CPU and Accelerator usage."
      The total energy is the sum of the CPU and accelerator values.
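
The CPU and GPU requirements above can be checked before profiling. The following is a rough sketch, not part of the Energy Pack. It assumes that the first RAPL package domain appears at /sys/class/powercap/intel-rapl:0 (the path can differ between systems) and uses the nvidia-smi query option power.draw; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumed sysfs location of the first RAPL package domain; may differ per system.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def rapl_available(domain: Path = RAPL_DOMAIN) -> bool:
    """Return True if the powercap RAPL energy counter file is present."""
    return (domain / "energy_uj").is_file()


def parse_gpu_power(csv_text: str) -> List[Optional[float]]:
    """Convert one power reading per line to watts, or None if it is not numeric (for example N/A)."""
    readings: List[Optional[float]] = []
    for line in csv_text.splitlines():
        field = line.strip()
        if not field:
            continue
        try:
            readings.append(float(field))
        except ValueError:
            readings.append(None)
    return readings


def query_gpu_power() -> List[Optional[float]]:
    """Ask nvidia-smi for each GPU's power draw; return an empty list if the query fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_power(result.stdout)


if __name__ == "__main__":
    print("RAPL energy counter present:", rapl_available())
    print("GPU power readings (W):", query_gpu_power())
```

A None entry in the GPU list corresponds to a GPU that reports no numeric power value, which is the case where the metric shows an error in MAP.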

Resources

See the following resources for further information on energy and power profiling, optimization and tuning:

  • InsideHPC - Slidecast about energy and applications. 
  • Scientific Computing World - Article which examines the topic of energy in HPC applications.
  • Arm SlideShare - Presentation introducing energy and power measurement and frequency scaling.