Arm IP Exchange Portal Changes
- The first difference is the L2 cache size. The Cortex-A57 can be configured with a 512 KB, 1 MB, or 2 MB L2 cache, while the Cortex-A72 adds a fourth option of 4 MB.
- The GIC CPU interface can now be disabled for the Cortex-A72 on IP Exchange. Many designs still use version 2 of the Arm GIC architecture (GICv2) with IP such as the GIC-400, and these designs can take advantage of excluding the GIC CPU interface.
- The Cortex-A72 model offers the option to include or exclude the ACP (Accelerator Coherency Port) interface.
- The Cortex-A72 offers more Fill/Evict Queue (FEQ) entries, with options of 20, 24, or 28, compared with 16 or 20 on the Cortex-A57. This option matters for Cycle Model users doing performance analysis and studying the impact of different L2 cache parameters.
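To make the option differences concrete, here is a minimal Python sketch that encodes the L2 size and FEQ choices listed above and checks a proposed configuration against them. The `CpuConfig` class, the `validate` function and the field names are hypothetical illustrations. They are not the IP Exchange configuration format. The GIC CPU interface and ACP flags are recorded but not checked.

```python
from dataclasses import dataclass

# Option values taken from the list above; the data layout is hypothetical.
L2_SIZES_KB = {
    "Cortex-A57": (512, 1024, 2048),
    "Cortex-A72": (512, 1024, 2048, 4096),
}
FEQ_ENTRIES = {
    "Cortex-A57": (16, 20),
    "Cortex-A72": (20, 24, 28),
}


@dataclass(frozen=True)
class CpuConfig:
    cpu: str
    l2_size_kb: int
    feq_entries: int
    include_gic_cpu_if: bool = True  # recorded only; not checked in this sketch
    include_acp: bool = False        # recorded only; not checked in this sketch


def validate(cfg: CpuConfig) -> list[str]:
    """Return a list of problems; an empty list means the options are offered."""
    if cfg.cpu not in L2_SIZES_KB:
        return [f"unknown CPU {cfg.cpu!r}"]
    errors = []
    if cfg.l2_size_kb not in L2_SIZES_KB[cfg.cpu]:
        errors.append(f"L2 size {cfg.l2_size_kb} KB not offered for {cfg.cpu}")
    if cfg.feq_entries not in FEQ_ENTRIES[cfg.cpu]:
        errors.append(f"{cfg.feq_entries} FEQ entries not offered for {cfg.cpu}")
    return errors
```

For example, `validate(CpuConfig("Cortex-A72", 4096, 28))` returns an empty list. The same 4 MB request for a `"Cortex-A57"` is reported as an unsupported L2 size.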
The Cortex-A72 configuration from IP Exchange is shown below.
ACE Interface Changes
The width of the transaction ID signals has increased from 6 bits to 7 bits in the Cortex-A72 interface. The wider *IDM signals apply only when the Cortex-A72 is configured with an ACE interface. The main impact occurs when connecting a Cortex-A72 to a CCI-400 that was previously used with a Cortex-A53 or Cortex-A57. Because those CPUs have 6-bit *IDM signals, the CCI-400 must be reconfigured for 7-bit *IDM signals. All of the Cortex-A72 CPAKs that use the CCI-400 include this change so they operate correctly, but users should watch for it when upgrading existing systems to the Cortex-A72.
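The following Python sketch illustrates why the interconnect width matters. It uses a simplified, hypothetical model: a 7-bit Cortex-A72 transaction ID arrives at a port configured for 6-bit IDs, and the high-order bit is lost. The helper names `truncate_id` and `aliased_ids` are illustrative only. Under this model, pairs of distinct IDs become indistinguishable, so responses could no longer be matched reliably to their outstanding transactions.

```python
def truncate_id(txn_id: int, port_id_width: int) -> int:
    """Keep only the low-order bits that fit in the port's ID width."""
    return txn_id & ((1 << port_id_width) - 1)


def aliased_ids(cpu_id_width: int, port_id_width: int) -> dict[int, list[int]]:
    """Group CPU transaction IDs that become identical at the port."""
    groups: dict[int, list[int]] = {}
    for txn_id in range(1 << cpu_id_width):
        groups.setdefault(truncate_id(txn_id, port_id_width), []).append(txn_id)
    # Report only the port-side IDs that more than one CPU-side ID maps onto.
    return {port_id: ids for port_id, ids in groups.items() if len(ids) > 1}


collisions = aliased_ids(cpu_id_width=7, port_id_width=6)
print(len(collisions))    # 64: every 6-bit value is shared by two 7-bit IDs
print(collisions[0x05])   # [5, 69]: IDs 0x05 and 0x45 collide
print(aliased_ids(7, 7))  # {}: a 7-bit port keeps every ID distinct
```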
This applies to the following signals for Cortex-A72:
System Register Changes
Several system registers have new values that reflect the Cortex-A72. The primary part number field in the Main ID Register (MIDR) is 0xD08 for the Cortex-A72, compared with 0xD07 for the Cortex-A57 and 0xD03 for the Cortex-A53. Several other ID registers also change value, from 7 on the Cortex-A57 to 8 on the Cortex-A72.
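Software that needs to tell these CPUs apart typically decodes the part number from an MIDR value. The Python sketch below uses the architectural MIDR_EL1 field layout: Implementer [31:24], Variant [23:20], Architecture [19:16], PartNum [15:4], and Revision [3:0]. It decodes a value that has already been read; reading MIDR itself requires privileged access on the target. The sample MIDR values are constructed for illustration, so their variant and revision fields are assumptions rather than specific silicon revisions.

```python
ARM_IMPLEMENTER = 0x41  # ASCII 'A' identifies Arm as the implementer

# Part numbers quoted in the text above.
PART_NAMES = {0xD03: "Cortex-A53", 0xD07: "Cortex-A57", 0xD08: "Cortex-A72"}


def decode_midr(midr: int) -> dict[str, int]:
    """Split a 32-bit MIDR value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "partnum": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


def cpu_name(midr: int) -> str:
    """Map an MIDR value to a CPU name using the primary part number."""
    fields = decode_midr(midr)
    if fields["implementer"] != ARM_IMPLEMENTER:
        return "non-Arm implementer"
    part = fields["partnum"]
    return PART_NAMES.get(part, f"unknown Arm part 0x{part:03X}")


print(cpu_name(0x410FD080))  # Cortex-A72 (variant 0, revision 0)
print(cpu_name(0x411FD073))  # Cortex-A57 (variant 1, revision 3)
```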
New PMU Events
The Cortex-A72 Performance Monitor Unit (PMU) tracks a number of new events, all with event numbers of 0x100 or greater. They fall into three main sections:
- Branch Prediction
The screenshots below from the Cycle Analyzer show the PMU events. All of these are automatically instrumented by the Cycle Model and are recorded without any software programming.
The Cortex-A72 contains many microarchitecture updates that incrementally improve performance, such as the larger L2 FEQ. In a test that ran exactly the same software program on a Cortex-A57 CPAK and a Cortex-A72 CPAK, both CPUs reported approximately 21,500 instructions retired. This is the instruction count if the program is viewed as a sequential instruction stream. Both CPUs also execute instructions speculatively: the Cortex-A57 reported about 37,000 instructions speculatively executed and the Cortex-A72 reported about 35,700.
The screenshots of the instruction events are shown below, first for the Cortex-A72 and then for the Cortex-A57. Together, the microarchitecture improvements of the Cortex-A72 make it a very high-performance CPU.
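A short calculation puts the approximate counts above in proportion. The Python sketch below computes speculatively executed instructions per retired instruction. It also computes the difference between the two counts, treated here only as a rough indicator of speculative work that did not retire. That interpretation is an assumption: the speculative-execution event counts operations, so the difference is not an exact measure of mis-speculated work. Both CPUs are assumed to retire the same 21,500 instructions, because the text reports them as approximately equal.

```python
RETIRED = 21_500  # approximate retired count reported for both CPUs
SPECULATIVE = {"Cortex-A57": 37_000, "Cortex-A72": 35_700}


def speculation_summary(speculative: int, retired: int) -> tuple[float, int]:
    """Return (speculative per retired instruction, speculative minus retired)."""
    return speculative / retired, speculative - retired


for cpu, spec in SPECULATIVE.items():
    ratio, extra = speculation_summary(spec, RETIRED)
    print(f"{cpu}: {ratio:.2f} speculative per retired, ~{extra:,} not retired")

reduction = (SPECULATIVE["Cortex-A57"] - SPECULATIVE["Cortex-A72"]) / SPECULATIVE["Cortex-A57"]
print(f"Cortex-A72 speculatively executes {reduction:.1%} fewer instructions")
```

With these inputs, the Cortex-A57 executes about 1.72 speculative instructions per retired instruction and the Cortex-A72 about 1.66. That is roughly 3.5% less speculative work for the same program.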
Arm System Exchange
Users can run the CPAKs for the Cortex-A57, Cortex-A53, and Cortex-A72 with various configuration options and directly compare the performance results using their own software and systems. These CPAKs are available from Arm System Exchange and provide a starting point for investigating system performance.