Overview

During the SoC design process, you usually want to understand what performance you can expect from your system. You can analyze performance in several ways, including running benchmarks, profiling with hardware performance counters, and carrying out a Power, Performance and Area (PPA) analysis.

In addition, there are a number of common performance pitfalls that you should be aware of.

Benchmarking processor performance

Benchmarks such as CoreMark and Dhrystone provide standard workloads that let you compare the performance of different systems.
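Raw benchmark scores depend on clock frequency, so results are commonly normalized per MHz before systems are compared. The following is a minimal sketch of that normalization, not an official scoring tool. The function names and the input figures in the usage note are hypothetical; the divisor 1757 is the conventional Dhrystones-per-second figure for the VAX 11/780 reference machine used to derive DMIPS.

```python
# Dhrystones per second achieved by the VAX 11/780, the conventional
# 1 MIPS reference machine used to convert Dhrystone results to DMIPS.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757.0


def coremark_per_mhz(iterations: int, elapsed_seconds: float, clock_mhz: float) -> float:
    """Return the CoreMark score (iterations per second) divided by clock MHz."""
    if elapsed_seconds <= 0 or clock_mhz <= 0:
        raise ValueError("elapsed_seconds and clock_mhz must be positive")
    score = iterations / elapsed_seconds  # CoreMark score
    return score / clock_mhz


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Return DMIPS divided by clock MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    dmips = dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND
    return dmips / clock_mhz
```

For example, a hypothetical run of 20000 CoreMark iterations in 10 seconds at 100 MHz gives `coremark_per_mhz(20000, 10.0, 100.0)`, which is 20.0 CoreMark/MHz. Normalized scores still depend on the compiler, compiler options, and memory system, so compare results only when those are known.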

Arm publishes benchmark numbers for some processors. Where available, these are provided on the individual processor product pages.

Where hardware devices are not available, Cycle Models can help you confidently make architectural decisions and optimize performance. Cycle Models are compiled directly from Arm RTL and retain complete functional and cycle accuracy.

Analyzing and debugging performance

Some Arm processors provide a Performance Monitoring Unit (PMU) that enables you to gather various statistics on the operation of the processor and its memory system during runtime.

These statistics are useful when you debug or profile code, and when you assess the performance and resource efficiency of your system. The Arm documentation for your processor describes its PMU and how to use it.
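Raw PMU counts are usually turned into ratios before they are interpreted. The sketch below assumes that you have already read four counters from the hardware and shows how they might be combined into instructions per cycle (IPC) and L1 data cache miss figures. The event numbers are the architectural ones for CPU_CYCLES, INST_RETIRED, L1D_CACHE, and L1D_CACHE_REFILL, but whether a given processor implements each event is something to confirm in its documentation. The `derive_metrics` function and the sample counts in the usage note are hypothetical.

```python
# Arm architectural PMU event numbers. Check that your processor
# implements each event before relying on it.
L1D_CACHE_REFILL = 0x03  # L1 data cache refills (misses)
L1D_CACHE = 0x04         # L1 data cache accesses
INST_RETIRED = 0x08      # instructions architecturally executed
CPU_CYCLES = 0x11        # processor cycles


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_metrics(counts: dict) -> dict:
    """Combine raw PMU counts (keyed by event number) into derived ratios."""
    cycles = counts.get(CPU_CYCLES, 0)
    instructions = counts.get(INST_RETIRED, 0)
    accesses = counts.get(L1D_CACHE, 0)
    refills = counts.get(L1D_CACHE_REFILL, 0)
    return {
        "ipc": _ratio(instructions, cycles),
        "l1d_miss_rate": _ratio(refills, accesses),
        # Misses per thousand instructions.
        "l1d_mpki": _ratio(refills * 1000.0, instructions),
    }
```

With hypothetical counts of 2,000,000 cycles, 1,500,000 instructions, 400,000 L1 data cache accesses, and 20,000 refills, `derive_metrics` reports an IPC of 0.75 and an L1 data cache miss rate of 0.05. A low IPC combined with a high miss rate suggests that the memory system, rather than the processor pipeline, is limiting performance.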

Measuring PPA

Power, Performance and Area (PPA) implementation analysis is often used to compare different processors at a high level. A PPA analysis measures:

  • Power: The power that is consumed by the processor.
  • Performance: The maximum attainable frequency of the clock driving the processor in this specific implementation.
  • Area: How much silicon area the processor occupies.

High-level PPA figures are often quoted for processors. However, behind the top-level power, performance, and area results there are many variables and details that can dramatically alter these figures. Different implementations target different configurations, for example, the cache sizes or the inclusion of the Floating Point Unit (FPU). They also target different goals, for example, the highest possible frequency or the lowest possible area.
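One way to guard against unfair comparisons is to record the configuration alongside each set of PPA figures and to check for differences before comparing numbers. The sketch below illustrates the idea only. The `PpaResult` fields, the choice of configuration variables, and the assumption that dynamic power was measured at the maximum frequency are all hypothetical, and a real implementation report contains many more variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PpaResult:
    """One implementation result, with the configuration it was produced for."""
    name: str
    max_freq_mhz: float       # Performance: maximum attainable clock frequency
    dynamic_power_mw: float   # Power: assumed measured at max_freq_mhz
    area_mm2: float           # Area: silicon area occupied
    cache_kb: int             # Configuration: cache size
    has_fpu: bool             # Configuration: FPU included or not
    target: str               # Goal: for example "frequency" or "area"

    def power_uw_per_mhz(self) -> float:
        """Dynamic power normalized by frequency, in microwatts per MHz."""
        return self.dynamic_power_mw * 1000.0 / self.max_freq_mhz


def config_differences(a: PpaResult, b: PpaResult) -> list:
    """List the configuration fields that differ between two results."""
    fields = ("cache_kb", "has_fpu", "target")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]
```

If `config_differences` returns a non-empty list, the two sets of figures describe different implementations and a direct comparison of power, frequency, or area is likely to mislead.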

The Power, Performance and Area Implementation Analysis White Paper is written for readers without deep processor implementation knowledge. It describes the many variables that you need to understand to get real value from any PPA data, and to estimate the real PPA of your own implementation. This knowledge allows you to make fair comparisons between processors, whether they come from a single IP vendor or from different processor IP vendors.

Avoiding common performance pitfalls

CPU performance is highly dependent on choices such as processor speed, cache size, interconnect, memory speed, data ordering, data width, and how well the IP blocks are integrated.

The following resources aim to highlight some of the common performance pitfalls you might encounter.