Cortex-A53 CPAK Features
The Arm Cortex-A53 processor CPAK introduces the following features:
- Dual-cluster, quad-core Cortex-A53 for a total of 8 cores
- Arm CoreLink CCI-400 providing coherency between clusters
- Fully configured GIC-400 interrupt controller delivering interrupts to all cores
- New Global System Counter connected to A53 Generic Timers
Here is a diagram of the system:
The design also supports fully automatic mapping to Fast Models.
Dual Cluster System
The Cortex-A53 model supports the CLUSTERIDAFF inputs to set the Cluster ID. Software sees this value in the MPIDR register. The two clusters use the values 0 and 1, and each cluster has four cores. This means that CPU 3 in Cluster 1 has an MPIDR value of 0x80000103, as shown in the screenshot below.
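The MPIDR value quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the field layout (bit 31 reading as 1, cluster ID in bits 15:8, CPU number in bits 7:0) is assumed from the Armv8-A definition of MPIDR rather than taken from the CPAK itself.

```python
# Assumed layout of the lower 32 bits of MPIDR:
#   bit 31     reads as 1
#   bits 15:8  Aff1, the cluster ID driven by CLUSTERIDAFF in this system
#   bits 7:0   Aff0, the CPU number within the cluster
MPIDR_BIT31 = 1 << 31


def make_mpidr(cluster_id, cpu_id):
    """Compose the MPIDR value software would read on a given core."""
    return MPIDR_BIT31 | ((cluster_id & 0xFF) << 8) | (cpu_id & 0xFF)


def decode_mpidr(mpidr):
    """Return (cluster_id, cpu_id) extracted from an MPIDR value."""
    return (mpidr >> 8) & 0xFF, mpidr & 0xFF


if __name__ == "__main__":
    # Print the value for every core in the dual-cluster, quad-core system.
    for cluster in range(2):
        for cpu in range(4):
            value = make_mpidr(cluster, cpu)
            print(f"cluster {cluster} cpu {cpu}: MPIDR = {value:#010x}")
```

Calling `make_mpidr(1, 3)` gives 0x80000103, matching the value for CPU 3 in Cluster 1.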
Global System Counter
Another requirement for a multi-cluster system is the use of a Global System Counter. A new model in SoC Designer is connected to the CNTVALUEB input of each Cortex-A53, which ensures that the Generic Timer in each processor presents the same counter value to software, even when the processors run at different frequencies. This model also enables Swap & Play systems to work correctly by saving the counter value from the Fast Model simulation and restoring it in the Cycle Accurate simulation.
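The idea of a shared counter with save and restore can be illustrated with a toy model. Everything here is hypothetical (the class, the `handoff` helper, and the 100 MHz frequency used in the usage note). It is not the SoC Designer model, only a sketch of why a single counter gives every core the same value and how a handoff could preserve it.

```python
NS_PER_S = 1_000_000_000


class GlobalSystemCounter:
    """Toy model of one counter shared by every core (hypothetical class)."""

    def __init__(self, frequency_hz, start_count=0):
        self.frequency_hz = frequency_hz
        self.start_count = start_count

    def count_at(self, time_ns):
        # The count depends only on elapsed simulation time and the counter's
        # own frequency. No core clock appears here, so every core that
        # samples the counter at the same time sees the same value.
        return self.start_count + (time_ns * self.frequency_hz) // NS_PER_S


def handoff(counter, time_ns):
    """Return a new counter that resumes from the value reached at time_ns.

    This mimics saving the count in one simulation and restoring it in
    another, so that software never sees the counter jump backwards.
    """
    return GlobalSystemCounter(counter.frequency_hz, counter.count_at(time_ns))
```

For example, a counter created with `GlobalSystemCounter(100_000_000)` reaches 100 after 1000 ns, and a counter produced by `handoff` at that point continues from 100 rather than restarting at 0.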
Generic Timer to GIC Connections
To create a multi-cluster system the GIC-400 is used as the interrupt controller, and the Cortex-A53 Generic Timers are used as the system timers. This requires the connection of the Generic Timer signals from the Cortex-A53 to the GIC-400. All of these signals start with nCNT and are wired to the GIC. When a Generic Timer generates an interrupt it leaves the CPU by way of the appropriate nCNT signal, goes to the GIC, and then back to the CPU using the appropriate nIRQ signal.
While 64-bit Linux uses nCNTPNSIRQ, all signals are connected for completeness.
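The timer signals map onto private peripheral interrupt (PPI) IDs at the GIC. The table below is a sketch: the interrupt IDs are the conventional Generic Timer assignments as recalled from the GIC-400 documentation and should be checked against the TRM, and the helper name is hypothetical. The conversion to a device tree PPI number assumes the usual GIC binding, in which PPIs are numbered from 0 starting at interrupt ID 16.

```python
# Assumed GIC interrupt IDs for the Generic Timer outputs (verify in the TRM).
TIMER_PPI_IDS = {
    "nCNTPNSIRQ": 30,  # Non-secure physical timer, used by 64-bit Linux
    "nCNTPSIRQ": 29,   # Secure physical timer
    "nCNTVIRQ": 27,    # Virtual timer
    "nCNTHPIRQ": 26,   # Hypervisor physical timer
}

# PPIs occupy interrupt IDs 16 to 31; device trees count them from 0.
PPI_BASE = 16


def device_tree_ppi(signal):
    """Return the PPI number a device tree would use for a timer signal."""
    return TIMER_PPI_IDS[signal] - PPI_BASE


if __name__ == "__main__":
    for name, irq_id in TIMER_PPI_IDS.items():
        print(f"{name}: GIC ID {irq_id}, device tree PPI {device_tree_ppi(name)}")
```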
EVENTI and EVENTO are additional signals, in the power management category, that connect the two clusters. These signals are used for event communication using the WFE (wait for event) and SEV (send event) instructions. In a single-cluster system all of the communication happens inside the processor, but in a multi-cluster system these signals must be connected.
WFE and SEV communication is used during the Linux boot. All 7 of the secondary cores execute a WFE and wait until the primary core wakes them up using the SEV instruction at the appropriate time. If the EVENTI and EVENTO signals are not connected the secondary cores will not wake up and run.
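The effect of leaving the event signals unconnected can be shown with a small model. This is a simplified sketch with a hypothetical function name. It assumes an SEV always reaches the other cores in the sender's own cluster, and reaches the other cluster only when EVENTO of one cluster is wired to EVENTI of the other.

```python
def cores_woken_by_sev(sender, events_connected, clusters=2, cores_per_cluster=4):
    """Return the set of (cluster, cpu) cores woken by an SEV from `sender`.

    sender            -- (cluster, cpu) tuple of the core executing SEV
    events_connected  -- True if EVENTO/EVENTI are wired between the clusters
    """
    sender_cluster, _ = sender
    woken = set()
    for cluster in range(clusters):
        # Without the cross-cluster wiring, the event never leaves the
        # sender's cluster.
        if cluster != sender_cluster and not events_connected:
            continue
        for cpu in range(cores_per_cluster):
            if (cluster, cpu) != sender:
                woken.add((cluster, cpu))
    return woken
```

With the primary core at `(0, 0)`, the connected case wakes all 7 secondary cores, while the unconnected case wakes only the 3 secondaries in cluster 0 and leaves cluster 1 waiting in WFE.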
Boot Wrapper Modifications
All of the software used in the 8-core CPAK is downloadable in source code format. A small boot wrapper takes care of starting the cores and doing the minimal hardware configuration that Linux assumes is already done. Cycle accurate operation sometimes needs additional hardware programming that a Fast Model system does not.
Although not specific to multi-cluster systems, the Cortex-A53 has a bit named SMPEN in the CPUECTLR register, which must be set to 1 to enable hardware management of data coherency with the other cores in the cluster. The original boot wrapper from kernel.org may not set this bit, and the Linux kernel assumes it is already set, so it must be added to the boot wrapper during development.
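The register update is a simple read-modify-write. The sketch below models only the arithmetic in Python. It assumes SMPEN is bit 6 of CPUECTLR, as recalled from the Cortex-A53 documentation, and the function names are hypothetical. A real boot wrapper would perform the equivalent steps in AArch64 assembly by reading the system register, setting the bit, and writing it back.

```python
# Assumed bit position of SMPEN in CPUECTLR (verify in the Cortex-A53 TRM).
SMPEN_BIT = 6
SMPEN_MASK = 1 << SMPEN_BIT


def set_smpen(cpuectlr):
    """Return the CPUECTLR value with SMPEN set, leaving other bits untouched."""
    return cpuectlr | SMPEN_MASK


def smpen_is_set(cpuectlr):
    """Report whether SMPEN is already set in a CPUECTLR value."""
    return bool(cpuectlr & SMPEN_MASK)
```

Using an OR rather than writing a constant preserves whatever other fields are already programmed in the register.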
CCI Snoop Configuration
The Linux kernel may also assume that snoop requests and responses between the clusters are already enabled. The Snoop Control Register for each CCI-400 slave port is set to 0xc0000003 to enable coherency. This must also be added to the boot wrapper during development of the CPAK.
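The value 0xc0000003 can be broken into its fields to see what it enables. The sketch below rests on assumptions recalled from the CCI-400 documentation, which should be checked against the TRM: slave interface register blocks at 0x1000 intervals starting with slave interface 0 at offset 0x1000, the Snoop Control Register at offset 0x0 within each block, bits 0 and 1 as the snoop and DVM enables, and bits 30 and 31 as capability flags. The function names are hypothetical.

```python
SLAVE_IF_STRIDE = 0x1000    # assumed spacing of slave interface register blocks
SNOOP_CTRL_OFFSET = 0x0     # assumed offset of Snoop Control within a block


def snoop_ctrl_offset(slave_interface):
    """Offset of the Snoop Control Register from the CCI-400 base address."""
    return (slave_interface + 1) * SLAVE_IF_STRIDE + SNOOP_CTRL_OFFSET


def decode_snoop_ctrl(value):
    """Split a Snoop Control Register value into its assumed fields."""
    return {
        "snoop_enable": bool(value & (1 << 0)),      # issue snoop requests
        "dvm_enable": bool(value & (1 << 1)),        # issue DVM messages
        "supports_snoops": bool(value & (1 << 30)),  # capability flag
        "supports_dvm": bool(value & (1 << 31)),     # capability flag
    }


if __name__ == "__main__":
    print(decode_snoop_ctrl(0xC0000003))
    for port in (3, 4):
        print(f"slave interface {port}: offset {snoop_ctrl_offset(port):#x}")
```

Under these assumptions, 0xc0000003 corresponds to both enables being set on a port that supports snoops and DVM messages.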
The changes needed to reconcile the boot wrapper with the assumptions made by Linux are provided as a patch file in the CPAK, so they can be easily applied to the original source code.
The CPAK comes with an application note which covers the construction of the Linux image. The following items are configured to match the minimal hardware system design, and can be extended as the hardware design is modified.
- File System: Custom file system configured and created using Buildroot
- Kernel Image: Linux 3.14.0 configured to use the minimal hardware
- Device Tree Blob: Based on Versatile Express device tree for Arm Fast Models
- Boot Wrapper: Small assembly boot wrapper available from kernel.org
A single executable file (.axf file) containing all of the above items is compiled. This single image is loaded and executed in SoC Designer, and no kernel source code changes are required. This demonstrates the flexibility of Linux in supporting a wide variety of hardware configurations.
The ability to boot a Linux kernel using Fast Models and migrate the simulation to cycle accurate execution enables system performance analysis for 64-bit multi-core systems running Linux applications. The “Brought up 8 CPUs” message below illustrates the system.
A number of 64-bit Linux applications are provided in the file system, but users can easily add their favorite programs and run them by following the instructions in the app note.