Overview

During the SoC design process, you will probably want to understand what performance you can expect from your system. There are a number of different techniques that you can use to analyze performance, and the following sections describe them.

You should also be aware of some common performance pitfalls.

Benchmarking processor performance

Benchmarks such as CoreMark and Dhrystone provide standard workloads that let you compare the performance of different systems.

Arm publishes benchmark numbers for some processors. Where these numbers are available, they are provided on the individual processor product pages.
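Benchmark scores are usually quoted per MHz so that systems running at different clock speeds can be compared. The following is a minimal sketch of that normalization, not an official scoring tool. The function names and all measurement values are hypothetical. The constant 1757 is the Dhrystones-per-second result of the VAX 11/780 reference machine that is conventionally used to define 1 DMIPS.

```python
# Dhrystones per second of the VAX 11/780, the conventional 1-DMIPS reference.
VAX_DHRYSTONES_PER_SECOND = 1757


def coremark_per_mhz(iterations_per_second: float, clock_mhz: float) -> float:
    """Normalize a CoreMark score (iterations per second) by clock frequency."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return iterations_per_second / clock_mhz


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Convert a raw Dhrystone result to DMIPS, then normalize by clock frequency."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    dmips = dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND
    return dmips / clock_mhz


# Hypothetical measurements, for illustration only.
system_a = coremark_per_mhz(iterations_per_second=402.0, clock_mhz=100.0)
system_b = coremark_per_mhz(iterations_per_second=600.0, clock_mhz=200.0)

# System B has the higher raw score, but System A does more work per clock cycle.
more_efficient = "A" if system_a > system_b else "B"
```

Normalized scores are only comparable when the compiler, compiler flags, and memory configuration are also comparable, so treat a per-MHz figure as a starting point for comparison and not as a complete result.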

If hardware devices are not available, Cycle Models can help you make architectural decisions and optimize performance. Cycle Models are compiled directly from Arm RTL and retain complete functional and cycle accuracy.

Analyzing and debugging performance

Some Arm processors provide a Performance Monitoring Unit (PMU) that enables you to gather various statistics on the operation of the processor and its memory system during runtime.

These statistics provide useful information that you can use when debugging or profiling code. This information might be useful when you are assessing the performance and resource efficiency of your system. The following resources will help you to understand more about the PMU and how to use it:
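As an illustration of how raw PMU statistics can be turned into profiling metrics, the following sketch derives instructions per cycle (IPC) and an L1 data cache miss rate from two counter snapshots. The field names mirror the Arm architectural PMU event mnemonics CPU_CYCLES, INST_RETIRED, L1D_CACHE and L1D_CACHE_REFILL, but the events that are actually available depend on the processor. The `PmuSample` class, the helper functions, and every counter value are hypothetical, and the sketch does not show how the counters are read from hardware.

```python
from dataclasses import dataclass


@dataclass
class PmuSample:
    """One snapshot of four PMU counters (hypothetical container)."""
    cpu_cycles: int
    inst_retired: int
    l1d_cache: int          # L1 data cache accesses
    l1d_cache_refill: int   # L1 data cache refills (misses)


def delta(start: PmuSample, end: PmuSample) -> PmuSample:
    """Counts accumulated between two snapshots (assumes no counter overflow)."""
    return PmuSample(
        cpu_cycles=end.cpu_cycles - start.cpu_cycles,
        inst_retired=end.inst_retired - start.inst_retired,
        l1d_cache=end.l1d_cache - start.l1d_cache,
        l1d_cache_refill=end.l1d_cache_refill - start.l1d_cache_refill,
    )


def ipc(sample: PmuSample) -> float:
    """Instructions retired per CPU cycle."""
    return sample.inst_retired / sample.cpu_cycles if sample.cpu_cycles else 0.0


def l1d_miss_rate(sample: PmuSample) -> float:
    """Fraction of L1 data cache accesses that caused a refill."""
    return sample.l1d_cache_refill / sample.l1d_cache if sample.l1d_cache else 0.0


# Hypothetical snapshots taken before and after the code region of interest.
before = PmuSample(cpu_cycles=1000, inst_retired=800, l1d_cache=300, l1d_cache_refill=10)
after = PmuSample(cpu_cycles=5000, inst_retired=6800, l1d_cache=2300, l1d_cache_refill=110)

region = delta(before, after)
region_ipc = ipc(region)
region_miss_rate = l1d_miss_rate(region)
```

Working with the difference between two snapshots, and not with absolute counter values, keeps the metrics specific to the code region being profiled.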

Measuring PPA

Power, Performance and Area (PPA) implementation analysis is often used to compare different processors at a high level. A PPA analysis measures:

  • Power: The power that is consumed by the processor.
  • Performance: The maximum attainable frequency of the clock that is driving the processor in this specific implementation.
  • Area: How much silicon area the processor occupies.
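The three quantities in this list are often combined into ratios when implementations are compared. The following sketch shows one possible way to do that. The `PpaResult` class, the units (mW, MHz, mm²), the implementation names, and all figures are assumptions made for illustration and do not describe any real processor.

```python
from dataclasses import dataclass


@dataclass
class PpaResult:
    """PPA figures for one implementation (hypothetical container and units)."""
    name: str
    power_mw: float      # Power consumed by the processor
    max_freq_mhz: float  # Maximum attainable clock frequency
    area_mm2: float      # Silicon area occupied by the processor


def mhz_per_mw(result: PpaResult) -> float:
    """Attainable frequency per unit of power."""
    return result.max_freq_mhz / result.power_mw


def mhz_per_mm2(result: PpaResult) -> float:
    """Attainable frequency per unit of silicon area."""
    return result.max_freq_mhz / result.area_mm2


# Two hypothetical implementations of the same processor with different goals.
speed_optimized = PpaResult("speed-optimized", power_mw=50.0, max_freq_mhz=1000.0, area_mm2=0.5)
area_optimized = PpaResult("area-optimized", power_mw=10.0, max_freq_mhz=400.0, area_mm2=0.2)

for impl in (speed_optimized, area_optimized):
    print(f"{impl.name}: {mhz_per_mw(impl):.1f} MHz/mW, {mhz_per_mm2(impl):.1f} MHz/mm2")
```

Ratios like these are only meaningful when the implementations share the same configuration and implementation conditions, so a difference between two results can reflect the implementation goal as much as the processor itself.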

Often high-level PPA figures are quoted for processors. However, behind the top-level power, performance and area results there are many variables and details that can affect these figures. Different implementations target different configurations, for example, the cache sizes or the inclusion of the Floating Point Unit (FPU). Different implementations also target different goals, for example, aiming to achieve the highest possible frequency or the lowest possible area.

If you are not familiar with processor implementation, the Power, Performance and Area Implementation Analysis White Paper describes the variables that you need to understand. This information will help you to get value from any PPA data that is presented, so that you can estimate the real PPA of your own implementation. This knowledge will allow you to make fair comparisons between processors, whether they come from a single IP partner or from different processor IP vendors.

Avoiding common performance pitfalls

CPU performance is highly dependent on choices such as processor speed, cache size, interconnect, memory speed, data ordering, data width and optimal integration of the IP blocks.
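To illustrate how two of these choices, cache behavior and memory speed, interact, the following sketch uses the standard average memory access time model: the hit time plus the miss rate multiplied by the miss penalty. The function name and all cycle counts and miss rates are hypothetical, and a real system would need measured values, for example from PMU counters.

```python
def average_memory_access_cycles(hit_cycles: float,
                                 miss_rate: float,
                                 miss_penalty_cycles: float) -> float:
    """Average cycles per memory access: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical baseline: 2-cycle cache hit, 5% miss rate, 100-cycle memory penalty.
baseline = average_memory_access_cycles(2.0, 0.05, 100.0)

# A larger cache might lower the miss rate (assumed here to fall to 2%).
larger_cache = average_memory_access_cycles(2.0, 0.02, 100.0)

# Faster memory might lower the miss penalty (assumed here to fall to 60 cycles).
faster_memory = average_memory_access_cycles(2.0, 0.05, 60.0)

print(f"baseline: {baseline:.2f} cycles per access")
print(f"larger cache: {larger_cache:.2f} cycles per access")
print(f"faster memory: {faster_memory:.2f} cycles per access")
```

Even with a low miss rate, the miss penalty dominates the average in this model, which is one reason why memory speed and interconnect choices can matter as much as the processor clock.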

The following resources highlight some of the common performance pitfalls that you might encounter.