During the SoC design process, you will usually be interested in understanding what performance you can expect from your system. There are a number of different techniques you can use to analyze performance:
- Power, Performance and Area (PPA) analysis is often used to compare different processors at a high level.
- Benchmarks provide standard workloads to compare the performance of different systems.
- The Performance Monitoring Unit (PMU) and event counters available in some Arm processors can help you assess the performance and resource efficiency of your system.
In addition, there are a number of common performance pitfalls that you should be aware of.
Benchmarking processor performance
Benchmarks such as CoreMark and Dhrystone provide standard workloads that let you compare the performance of different systems:
- CoreMark is a modern, sophisticated benchmark that lets you accurately measure processor performance. Application Note 350: CoreMark Benchmarking for Arm Cortex Processors describes how to run CoreMark to obtain stable, reproducible results.
- Dhrystone is an older, simpler benchmark. Application Note 273: Dhrystone Benchmarking for Arm Cortex Processors describes how to compile and run Dhrystone.
Arm publishes benchmark numbers for some processors. Where available, these are provided on the individual processor product pages.
Where hardware devices are not available, Cycle Models can help you confidently make architectural decisions and optimize performance. Cycle Models are compiled directly from Arm RTL and retain complete functional and cycle accuracy.
Analyzing and debugging performance
Some Arm processors provide a Performance Monitoring Unit (PMU) that enables you to gather various statistics on the operation of the processor and its memory system during runtime.
These statistics are useful when you debug or profile code, and when you assess the performance and resource efficiency of your system. The following resources will help you understand more about the PMU and how to use it:
- The Technical Reference Manual for your specific processor describes the PMU, if your processor implements one. For example, the Arm Cortex‑A32 Processor Technical Reference Manual contains a chapter describing the PMU.
- System Performance Analysis and the Arm Performance Monitor Unit explains how Cycle Models of Arm CPUs enable system performance analysis by providing access to the Performance Monitor Unit (PMU). This includes an introduction to using the Cycle Model System Analyzer to automatically gather information on Arm PMU events for bare metal and Linux software loads.
- Using the PMU Event Counters in DS-5 details how to use the PMU and its event counters in Arm DS-5 Development Studio.
Power, Performance and Area (PPA) analysis
Power, Performance and Area (PPA) implementation analysis is often used to compare different processors at a high level. A PPA analysis measures:
- Power: The power that is consumed by the processor.
- Performance: The maximum attainable frequency of the clock driving the processor in this specific implementation.
- Area: How much silicon area the processor occupies.
High-level PPA figures are often quoted for processors. However, behind the top-level power, performance and area results there are many variables and details that can dramatically alter these figures. Different implementations use different configurations, such as different cache sizes or the inclusion of the Floating Point Unit (FPU). They also pursue different goals, such as achieving the highest possible frequency or the smallest possible area.
The Power, Performance and Area Implementation Analysis White Paper is written for readers without deep processor implementation knowledge. It describes the many variables you need to understand to get real value from published PPA data and to estimate the real PPA of your own implementation. This knowledge allows you to make fair comparisons between processors, whether they come from a single IP vendor or from different processor IP vendors.
Avoiding common performance pitfalls
CPU performance is highly dependent on choices such as processor speed, cache size, interconnect, memory speed, data ordering, data width and optimal integration of the IP blocks.
The following resources aim to highlight some of the common performance pitfalls you might encounter.
- Webinar - Three Tips For Maximising Your SoC Performance explains the methodologies and analysis techniques used at Arm, plus how these link to CPU performance.
- Webinar - How to optimize a system with the latest Arm DynamIQ processors explains the optimization guidance available for Arm DynamIQ processors and the Arm Mali-G72 GPU.
- Design Reviews are a service offered by Arm in which expert engineers visit our partners to perform a detailed review of a particular stage of the design cycle. Based on that review, they can provide feedback and best-practice advice, and propose solutions to any issues identified. The Arm Community blog post Finding design errors before it’s too late shows some real-life examples of how Design Reviews saved Arm partners time and money.
- The System performance analysis at Arm white paper describes the methodology and the different levels of analysis that Arm uses for system performance analysis.