Comparing Arm Cortex-A72 and Arm Cortex-A57

This article discusses the Cortex-A72 processor, which delivers CPU performance that is 50x greater than that of leading smartphones from five years ago. The core delivers 3.5x the sustained performance of a Cortex-A15 design from 2014.

Introduction

The Cycle Model for the Arm Cortex-A72 processor is now available on Arm IP Exchange along with 10 Cycle Performance Analysis Kits (CPAKs). This tutorial highlights some of the key differences between the Cortex-A72 and the Cortex-A57.

Arm IP Exchange Portal Changes

IP Exchange enables users to configure, build, and download models for Arm IP. There are a few differences between the Cortex-A57 and the Cortex-A72:

  1. The first difference is the L2 cache size. The Cortex-A57 can be configured with a 512 KB, 1 MB, or 2 MB L2 cache; the Cortex-A72 adds a fourth option of 4 MB.
  2. The GIC CPU interface can now be disabled for the Cortex-A72 on IP Exchange. Many designs continue to use version 2 of the Arm GIC architecture with IP such as the GIC-400. These designs can take advantage of excluding the GIC CPU interface.
  3. The Cortex-A72 model offers the option to include or exclude the ACP (Accelerator Coherency Port) interface.
  4. The number of FEQ (Fill/Evict Queue) entries on the Cortex-A72 has been increased to include options of 20, 24, and 28, compared to the Cortex-A57, which offers 16 or 20 entries. This feature is important for Cycle Model users doing performance analysis and studying the impact of various L2 cache parameters.
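The L2 cache and FEQ options listed above can be captured in a small lookup table, which is handy when scripting sweeps over model configurations. This is a minimal sketch: the dictionary layout and the function name `is_valid_config` are hypothetical and are not part of IP Exchange, and it assumes the Cortex-A72 FEQ choices are exactly 20, 24, and 28.

```python
# Option sets taken from the list above. The names and layout are
# illustrative only; they are not an IP Exchange API.
L2_CACHE_KB = {
    "Cortex-A57": {512, 1024, 2048},
    "Cortex-A72": {512, 1024, 2048, 4096},
}

# Assumption: the Cortex-A72 offers exactly 20, 24, and 28 FEQ entries.
FEQ_ENTRIES = {
    "Cortex-A57": {16, 20},
    "Cortex-A72": {20, 24, 28},
}


def is_valid_config(core, l2_kb, feq_entries):
    """Return True if the L2 size (in KB) and FEQ depth are listed for the core."""
    return l2_kb in L2_CACHE_KB[core] and feq_entries in FEQ_ENTRIES[core]


if __name__ == "__main__":
    print(is_valid_config("Cortex-A72", 4096, 28))  # 4 MB L2 is Cortex-A72 only
    print(is_valid_config("Cortex-A57", 4096, 20))  # not offered on Cortex-A57
```

A table like this makes it easy to reject an unsupported combination before building a model.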

The Cortex-A72 configuration from IP Exchange is shown below.




ACE Interface Changes

The width of the transaction ID signals has been increased from 6 bits to 7 bits in the Cortex-A72 interface. The wider *IDM signals apply only when the Cortex-A72 is configured with an ACE interface. The main impact occurs when connecting a Cortex-A72 to a CCI-400 that was previously used with a Cortex-A53 or Cortex-A57. Because those CPUs have 6-bit wide *IDM signals, the CCI-400 needs to be reconfigured for 7-bit wide *IDM signals. All of the Cortex-A72 CPAKs that use the CCI-400 already include this change so that they operate properly, but users should be aware of it when upgrading existing systems to the Cortex-A72.

This applies to the following signals for Cortex-A72:

  • AWIDM[6:0]
  • WIDM[6:0]
  • BIDM[6:0]
  • ARIDM[6:0]
  • RIDM[6:0]
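When upgrading an existing system, the width change can be checked with a few lines of script. The sketch below is only an illustration: the table uses the widths stated above, while the function names and the idea of passing the interconnect port width as a plain integer are assumptions, not part of any Arm tool.

```python
# *IDM signal widths in bits, as described above.
ACE_ID_WIDTH_BITS = {"Cortex-A53": 6, "Cortex-A57": 6, "Cortex-A72": 7}


def id_width_mismatches(cpus, interconnect_id_bits):
    """Return the CPUs whose *IDM width differs from the interconnect port width."""
    return [cpu for cpu in cpus if ACE_ID_WIDTH_BITS[cpu] != interconnect_id_bits]


def max_transaction_ids(width_bits):
    """Number of distinct ID values an ID signal of this width can encode."""
    return 1 << width_bits


if __name__ == "__main__":
    # A CCI-400 still configured for 6-bit IDs flags the Cortex-A72.
    print(id_width_mismatches(["Cortex-A57", "Cortex-A72"], 6))
    print(max_transaction_ids(6), max_transaction_ids(7))
```

Going from 6 to 7 bits doubles the number of encodable ID values from 64 to 128.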

System Register Changes

A number of system registers are updated with new values to reflect the Cortex-A72. The primary part number field in the Main ID Register (MIDR) is 0xD08 for the Cortex-A72, compared with 0xD07 for the Cortex-A57 and 0xD03 for the Cortex-A53. A number of other ID registers change value from 7 on the Cortex-A57 to 8 on the Cortex-A72.
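Software that needs to tell the cores apart can extract the primary part number from bits [15:4] of the MIDR. The following is a minimal sketch. The helper names are hypothetical, and the example value is assembled from fields for illustration (implementer 0x41 for Arm, variant 0, architecture 0xF, revision 0) rather than read from a real device or model.

```python
# Part numbers quoted in the text above.
PART_NAMES = {0xD03: "Cortex-A53", 0xD07: "Cortex-A57", 0xD08: "Cortex-A72"}


def midr_part_number(midr):
    """Return the primary part number, bits [15:4] of the 32-bit MIDR."""
    return (midr >> 4) & 0xFFF


def identify_core(midr):
    """Map a MIDR value to a core name, or 'unknown' if the part is not listed."""
    return PART_NAMES.get(midr_part_number(midr), "unknown")


if __name__ == "__main__":
    # Illustrative value built from its fields:
    # implementer [31:24], variant [23:20], architecture [19:16],
    # part number [15:4], revision [3:0].
    example_midr = (0x41 << 24) | (0x0 << 20) | (0xF << 16) | (0xD08 << 4) | 0x0
    print(hex(example_midr), identify_core(example_midr))
```

On real hardware the value would come from reading MIDR_EL1 at a suitable exception level; here it is only constructed to exercise the decoding.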


New PMU Events

A number of new events are tracked by the Cortex-A72 Performance Monitor Unit (PMU). All of the new events have event numbers of 0x100 or greater. They fall into three main groups:

  • Branch Prediction
  • Queues
  • Cache

The screenshots below from the Cycle Analyzer show the PMU events. All of these are automatically instrumented by the Cycle Model and are recorded without any software programming.


Carbon Analyzer showing PMU Events - 1


Carbon Analyzer showing PMU events


The Cortex-A72 contains many micro-architecture updates for incremental performance improvement, such as the larger L2 FEQ. In a test comparing a Cortex-A57 CPAK and a Cortex-A72 CPAK running the exact same software program, both CPUs reported approximately 21,500 instructions retired. This is the instruction count if the program were viewed as a sequential instruction stream. Both CPUs also execute instructions speculatively; the Cortex-A57 reported about 37,000 instructions speculatively executed and the Cortex-A72 reported 35,700.
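One way to read these counts is as a ratio of speculatively executed to retired instructions. The short calculation below uses only the approximate figures quoted above; the variable and function names are illustrative.

```python
# Approximate counts quoted in the text above.
RETIRED = 21_500
SPECULATIVE = {"Cortex-A57": 37_000, "Cortex-A72": 35_700}


def speculation_ratio(speculative, retired):
    """Speculatively executed instructions per retired instruction."""
    return speculative / retired


if __name__ == "__main__":
    for core, spec in SPECULATIVE.items():
        print(f"{core}: {speculation_ratio(spec, RETIRED):.2f}")
    # Prints:
    # Cortex-A57: 1.72
    # Cortex-A72: 1.66
    saved = SPECULATIVE["Cortex-A57"] - SPECULATIVE["Cortex-A72"]
    print(f"Fewer speculative instructions on Cortex-A72: {saved}")
```

For this one program the Cortex-A72 executed about 1,300 fewer speculative instructions, roughly 3.5% fewer than the Cortex-A57. A single test is not enough to generalize from.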

The screenshots of the instruction events are shown below, first for the Cortex-A72 and then for the Cortex-A57. All of the micro-architecture improvements of the Cortex-A72 combine to provide a very high-performance CPU.


A72 Instruction Events


A57 Instruction Events



Arm System Exchange

Users can run CPAKs for the Cortex-A57, Cortex-A53, and Cortex-A72 with various configuration options and directly compare and contrast the performance results using their own software and systems. The CPAKs are available from Arm System Exchange and provide a starting point for investigating system performance.

This article was originally written as a blog by Jason Andrews. Read the original post on Connected Community.