Power, performance, and area analysis
Power, Performance, and Area (PPA) analysis is performed on an implementation of a piece of IP. At the point that the analysis is performed, the IP does not exist as a physical product. The purpose of the analysis is to gain a solid idea of how the implementation would behave if it were realized. This is why PPA analysis matters: it gives interested parties the chance to evaluate implementations of a piece of IP before time and money are invested in producing the IP.
Before looking at any specific IP, let’s look at what power, performance, and area mean in the context of a PPA analysis.
With regard to power, the values of interest are measured in watts. The bottom line is that a piece of IP could be very energy efficient, but the IP may still use too much power to be a viable choice for your project. There are two power readings of interest:
- Dynamic power
- Static power
Dynamic power refers to the power that is consumed when the IP clock is running. In the PPA data provided for the Arm Flexible Access program, dynamic power is always expressed as mW/GHz to aid comparisons. With processors, you can assume that:
- A benchmark, for example Dhrystone, was running when the measurements were taken.
- The processor was running at the maximum cycles possible for its clock setting.
Static power or leakage is the power that the IP uses when the clock is stopped but the IP is still powered. In the Arm Flexible Access program PPA data, static power is always expressed in mW to aid comparisons.
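Because dynamic power is quoted in mW/GHz and static power in mW, the two figures can be combined into a rough total for a chosen clock frequency. The following is a minimal sketch, assuming that dynamic power scales linearly with frequency, as the mW/GHz normalization implies. The function name and the input values are hypothetical and are not taken from any PPA data.

```python
def total_power_mw(dynamic_mw_per_ghz: float, static_mw: float, freq_mhz: float) -> float:
    """Estimate total power in mW at a given clock frequency.

    Assumes dynamic power scales linearly with frequency (an assumption
    of this sketch) and that static power is independent of frequency.
    """
    if freq_mhz < 0:
        raise ValueError("frequency must be non-negative")
    # Convert MHz to GHz before applying the mW/GHz figure.
    dynamic_mw = dynamic_mw_per_ghz * (freq_mhz / 1000.0)
    return dynamic_mw + static_mw


# Hypothetical figures, for illustration only.
estimate = total_power_mw(dynamic_mw_per_ghz=20.0, static_mw=0.5, freq_mhz=200.0)
print(f"{estimate:.2f} mW")
```

With the clock stopped (a frequency of zero), the estimate reduces to the static power alone, which matches the definition of static power given above.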
Performance refers to the maximum clock frequency that the IP can achieve in a specific implementation. Performance is measured in MHz or, for more powerful processors, GHz. The value is also known as the target frequency of an implementation. Achieving higher levels of performance increases the area and power usage of a processor. This concept is explored further in "Exploring how higher performance can increase the area of the implemented IP".
Caution: In this overview, we use the term frequency to describe the performance relating to the clock frequency of a piece of IP. The term performance is used to describe other things, for example the data throughput in a cache.
Independent benchmarks are also used as an indicator of performance.
Note: Unlike PPA data, benchmark data is independent of implementation, and the figure can be calculated from the most abstract representation of a processor design. Benchmark data is useful for comparing IP without considering implementation details. Depending on the implementations, an earlier processor design can be, when realized, faster than a realized later design. The benchmarks, however, demonstrate that the later design is conceptually more powerful. This means that benchmarks give a different perspective to PPA frequency data.
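The interplay between an implementation-independent benchmark figure and an implementation-specific target frequency can be sketched in a few lines. This is an illustration only: it assumes that the benchmark figure is normalized per MHz, and the class name, design names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    name: str
    score_per_mhz: float  # benchmark figure, independent of implementation (assumed per-MHz)
    target_mhz: float     # target frequency of this specific implementation

    def absolute_score(self) -> float:
        """Benchmark throughput that this realized implementation would deliver."""
        return self.score_per_mhz * self.target_mhz


# Hypothetical values: the earlier design has the lower benchmark figure
# but is implemented at a higher target frequency.
earlier = Implementation("earlier_design", score_per_mhz=2.0, target_mhz=400.0)
later = Implementation("later_design", score_per_mhz=3.0, target_mhz=250.0)

for impl in (earlier, later):
    print(impl.name, impl.absolute_score())
```

In this made-up case the earlier design delivers the higher absolute score once realized, even though the later design has the better per-MHz figure. That is the distinction between benchmark data and PPA frequency data that the note describes.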
For processors, three kinds of benchmark are used:
- Dhrystone
- CoreMark
- Spec2k
Certain benchmarks are preferable for certain categories of processor, and the Arm Flexible Access program PPA data takes this into account. If you are likely to compare two processors, you can be sure that the same benchmark was used for both sets of figures. For example, the Spec2k benchmark is used for A-series processors.
When running the benchmarks, Arm uses the industry-standard Arm compilers, which Arm partners can also opt to use. These compilers are not special benchmark compilers designed to give favorable results. If you are comparing any Arm PPA data with data provided for another vendor’s processors, check which compiler was used to compile the benchmark code.
Benchmarking data example
The following table shows Dhrystone and CoreMark values for the Arm Cortex-M processors that are available with the Arm Flexible Access program.
Table 2‑1 Benchmarking data for Arm Cortex-M processors
With both benchmarks, higher values mean higher performance. By removing the implementation from the equation, the data shows that low-cost processors, for example the Arm Cortex-M0 or Arm Cortex-M23, are not as powerful as mainstream, feature-rich processors, for example the Arm Cortex-M33 or Arm Cortex-M7. As this material explores, the physical implementation of a piece of IP involves many factors. This means that IP with a lower benchmark score can, depending on the implementation, achieve a higher target frequency than IP with a higher benchmark score.
For example, an Arm Cortex-M4 implementation can achieve the same target frequency as an Arm Cortex-M7 implementation. This is still the case when both processor implementations are set up to have similar capabilities, for example Floating Point Unit (FPU), Digital Signal Processor (DSP) extension, and no cache. However, the Arm Cortex-M4 implementation will have a much higher static power usage, which reflects the fact that an inherently less powerful design is being pushed. Pushing a piece of IP during implementation often makes meeting the required tradeoffs difficult. Better solutions can often be found by going back to the benchmarks, selecting a different piece of IP, and beginning to research what the IP can offer using PPA analysis data.
With regard to area, the value of interest is measured in mm² and refers to the total area of silicon that is required to make a physical implementation of the IP. For processors, this includes the logic gates, which are also called standard cells, and the memories, for example the L1 caches.
Sometimes area readings are given as gate count figures. However, these figures are potentially misleading, and they can vary by a large margin depending on the fab process that is used and the maximum achievable frequency of the implementation. For this reason, the Arm Flexible Access program PPA data presents all area figures in mm².
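To see why a gate count alone says little about silicon area, consider a rough conversion from gate count to mm². This is a sketch under stated assumptions: it treats the gate count as a number of equivalent two-input NAND gates and multiplies by a per-gate cell area that depends on the process. The function name, the utilization parameter, and the cell areas are hypothetical and do not describe any real process.

```python
def area_mm2(gate_count: int, nand2_area_um2: float, utilization: float = 1.0) -> float:
    """Rough silicon area in mm² for a NAND2-equivalent gate count.

    nand2_area_um2 is the assumed area of one NAND2 cell on a given
    process, and utilization is the assumed fraction of the placed
    area that is occupied by cells (both hypothetical inputs).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    # 1 mm² = 1,000,000 µm²
    return gate_count * nand2_area_um2 / utilization / 1_000_000.0


# The same hypothetical gate count on two hypothetical processes.
gates = 100_000
area_process_x = area_mm2(gates, nand2_area_um2=1.0)
area_process_y = area_mm2(gates, nand2_area_um2=0.5)
print(area_process_x, area_process_y)
```

The same gate count maps to different areas as soon as the assumed cell area changes, and the gate count itself can also change with the target frequency, which is why a figure in mm² is the more direct basis for comparison.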