Overview
During the SoC design process, you will probably be interested in understanding what performance you can expect from your system. There are a number of different techniques that you can use to analyze performance:
- Power, Performance and Area (PPA) analysis is often used to compare different processors at a high level.
- Benchmarks provide standard workloads to compare the performance of different systems.
- The Performance Monitoring Unit (PMU) and event counters that are available in some Arm processors can help you to assess the performance and resource efficiency of your system.
You should also be aware of some common performance pitfalls.
Benchmarking processor performance
Benchmarks such as CoreMark and Dhrystone provide standard workloads that let you compare the performance of different systems:
- CoreMark is a modern, sophisticated benchmark that lets you accurately measure processor performance. Application Note 350: CoreMark Benchmarking for Arm Cortex Processors describes how to run CoreMark to obtain stable, reproducible results.
- Dhrystone is an older, simpler benchmark. Application Note 273: Dhrystone Benchmarking for Arm Cortex Processors describes how to compile and run Dhrystone.
Arm publishes benchmark numbers for some processors. Where these numbers are available, they are provided on the individual processor product pages.
If hardware devices are not available, Cycle Models can help you make architectural decisions and optimize performance. Cycle Models are compiled directly from Arm RTL and retain complete functional and cycle accuracy.
Analyzing and debugging performance
Some Arm processors provide a Performance Monitoring Unit (PMU) that enables you to gather various statistics on the operation of the processor and its memory system during runtime.
You can use these statistics when debugging or profiling code, and when assessing the performance and resource efficiency of your system. The following resources explain more about the PMU and how to use it:
- The Technical Reference Manual for your processor describes the PMU, if your processor has one. For example, the Arm Cortex‑A32 Processor Technical Reference Manual contains a chapter describing the PMU.
- System Performance Analysis and the Arm Performance Monitor Unit explains how Cycle Models of Arm CPUs enable system performance analysis by providing access to the PMU. This includes an introduction to using the Cycle Model System Analyzer to automatically gather information on Arm PMU events for bare metal and Linux software loads.
- Using the PMU Event Counters in DS-5 describes how to use the PMU and its event counters in Arm Development Studio.
Measuring PPA
Power, Performance and Area (PPA) implementation analysis is often used to compare different processors at a high level. A PPA analysis measures:
- Power: The power that is consumed by the processor.
- Performance: The maximum attainable frequency of the clock that is driving the processor in this specific implementation.
- Area: How much silicon area the processor occupies.
High-level PPA figures are often quoted for processors. However, many variables and details lie behind the top-level power, performance and area results, and these can significantly affect the figures. Different implementations target different configurations, for example different cache sizes, or the inclusion or omission of the Floating Point Unit (FPU). Different implementations also target different goals, for example the highest possible frequency or the smallest possible area.
If you are not familiar with processor implementation, the Power, Performance and Area Implementation Analysis White Paper describes the variables that you need to understand. This information will help you to get value from any PPA data that is presented to you, so that you can estimate the real PPA of your own implementation. It will also allow you to make fair comparisons between processors, whether they come from a single IP partner or from different processor IP vendors.
Avoiding common performance pitfalls
CPU performance is highly dependent on choices such as processor speed, cache size, interconnect, memory speed, data ordering, data width and optimal integration of the IP blocks.
The following resources highlight some of the common performance pitfalls that you might encounter.
- Webinar - Three Tips For Maximizing Your SoC Performance explains the methodologies and analysis techniques used at Arm, and how these techniques link to CPU performance.
- Webinar - How to optimize a system with the latest Arm DynamIQ processors explains the optimization guidance that is available for Arm DynamIQ processors and for the Arm Mali-G72 GPU.
- Design Reviews are a service offered by Arm. In a Design Review, expert engineers visit Arm partners to perform a detailed review of a particular stage of the design cycle. Based on that review, they can provide feedback and best-practice advice, and propose solutions to any issues identified. The Arm Community blog post Finding design errors before it’s too late shows examples from real-life Design Reviews.
- The System performance analysis at Arm white paper looks at the methodology and the different levels of analysis that Arm undertakes during system performance analysis.