Energy profiling for developers
The Arm MAP performance profiler is part of the Arm Forge development tool suite for developing, debugging, profiling, and optimizing high performance code.
The Energy Pack add-on for Arm MAP allows you to evaluate the energy usage and run time of your applications while you develop them.
Arm MAP shows the following measurements over time from an application run:
- Node power usage, which shows the physical system power usage per node, and across all nodes in an application (see below, Systems compatible with the Arm Energy Pack).
- CPU power usage, which shows the cumulative power consumption of all CPUs on the node, measured by the Intel on-board sensor (Intel RAPL).
- GPU power usage, which shows the cumulative power consumption of all GPUs on the node, measured by the NVIDIA on-board sensor.
You can use Arm MAP to zoom in to specific areas of the application run to examine the impact of optimization. The following example from Arm MAP, with the Energy Pack add-on, shows that processor-level CPU power usage (green) dips every time I/O (dark orange) is in progress.
Note: Evaluate the impact of optimization by observing the total duration of an application run as well as its power usage. Energy is power multiplied by time, so lower power usage for a longer time can cost more energy than a run at higher power with a shorter duration. In the example, CPU time and communication time (blue) do not have the same impact: the application spends most of its time on communication. Reducing communication time therefore has the greater impact on the total cost, and improves the overall energy usage even if the peak power usage remains constant.
Tip: Explore options for spreading the I/O more evenly over time.
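The relationship between power, duration, and energy described in the note above can be illustrated with a short calculation. The following is a minimal sketch, not part of Arm MAP: the function name `energy_joules` and the sample values are hypothetical, and it assumes you already have power readings (in watts) with matching timestamps (in seconds). It integrates power over time with the trapezoidal rule.

```python
def energy_joules(times_s, power_w):
    """Integrate sampled power (watts) over time (seconds), trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two consecutive samples
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical runs with constant power draw (illustrative numbers only).
low_power_long = energy_joules([0, 100], [200, 200])    # 200 W for 100 s
high_power_short = energy_joules([0, 50], [300, 300])   # 300 W for 50 s

print(low_power_long, high_power_short)
```

With these made-up numbers, the run that draws more power but finishes in half the time uses less energy overall. The same comparison applies to a run that offloads computation to a GPU.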
Consider the following points when you optimize application performance:
- GPU use increases the peak node (system) power usage, but the cumulative energy use is lower if a significantly faster run time results from using the GPU to offload computation.
- Examine the CPU instruction and I/O metrics to determine a more suitable processor frequency. Arm MAP shows whether your application is primarily waiting on I/O or memory access. If your system supports frequency scaling and the processor is not receiving data fast enough from the I/O system or from memory, a reduced processor frequency (leaving memory and I/O speed unchanged) can often reduce energy usage with little effect on run time.
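To see why frequency scaling can help an application that mostly waits on memory or I/O, consider the toy model below. It is a rough sketch under stated assumptions, not a model used by Arm MAP: it assumes that only the compute portion of the run time scales with processor frequency, that time spent waiting on memory or I/O is unaffected, and that dynamic CPU power scales with the cube of the frequency (a common rule of thumb when voltage scales with frequency). All names and numbers are hypothetical.

```python
def run_time_s(compute_s, stall_s, freq_ratio):
    """Run time when the CPU runs at freq_ratio times its base frequency.

    Assumption: compute time scales inversely with frequency, while
    time stalled on memory or I/O does not change.
    """
    return compute_s / freq_ratio + stall_s


def cpu_power_w(static_w, dynamic_w, freq_ratio):
    """CPU power at freq_ratio times the base frequency.

    Assumption: dynamic power scales with frequency cubed.
    """
    return static_w + dynamic_w * freq_ratio ** 3


def energy_j(compute_s, stall_s, static_w, dynamic_w, freq_ratio):
    """Energy is power multiplied by run time."""
    return (cpu_power_w(static_w, dynamic_w, freq_ratio)
            * run_time_s(compute_s, stall_s, freq_ratio))


# Hypothetical memory-bound application: 20 s computing, 80 s stalled.
for ratio in (1.0, 0.8):
    t = run_time_s(20.0, 80.0, ratio)
    e = energy_j(20.0, 80.0, static_w=50.0, dynamic_w=100.0, freq_ratio=ratio)
    print(f"frequency x{ratio}: {t:.1f} s, {e:.0f} J")
```

In this model, lowering the frequency by 20% lengthens the run by about 5% while cutting energy by roughly 30%. For a compute-bound application (large `compute_s`, small `stall_s`) the same change costs far more run time, which is why the instruction and I/O metrics matter when choosing a frequency.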