Migrating ARM Linux from CoreLink CCI-400 Systems to CoreLink CCN-504

The ARMv8 Linux CPAK on ARM System Exchange uses the ARM CoreLink CCN-504 Cache Coherent Network. This tutorial explains how to migrate a multi-cluster Linux CPAK from CCI to CCN. The CCN family of interconnects offers a wide range of high-bandwidth, low-latency options for networking and data center infrastructure.

Introduction

The ARMv8 Linux CPAK uses an ARM® Cortex®-A57 processor in an octa-core configuration to run Linux on a system with AMBA 5 CHI. Switching the Cortex-A57 configuration from ACE to CHI on ARM IP Exchange requires changing a pull-down menu item on the model build page. After that, a number of configuration parameters must be set to enable the CHI protocol correctly, many of which are discussed in an article covering usage of the CCN-504. Using native AMBA 5 CHI for the CPU interface together with the CCN-504 interconnect provides high-frequency, non-blocking data transfers. Linux is commonly used in infrastructure products such as set-top boxes, networking equipment, and servers, so the Linux CPAK is applicable to many of these system designs.

Selecting AMBA 5 CHI for the memory interface makes the system drastically different at the hardware level compared to a Linux CPAK using the ARM CoreLink CCI-400 Cache Coherent Interconnect; however, the software stack is not significantly different.

When considering software, a change in interconnect usually requires a change in initial system configuration. It also impacts performance analysis as each interconnect technology has different solutions for monitoring performance metrics. An interconnect change can also impact other system construction issues such as interrupt configuration and connections.

Further details on migrating a multi-cluster Linux CPAK from CCI to CCN are covered below.


Software Configuration

  1. Configuration for the CCN-504 is done using the Linux boot wrapper, which runs immediately after reset. The CPAK doesn’t include the boot wrapper source code; instead it uses git to download the source from kernel.org and then applies a patch with the changes needed for CCN configuration. The added code performs the following tasks:
    • Set the SMP enable bit in the A57 CPU Extended Control Register (CPUECTLR_EL1)
    • Terminate barriers at the HN-I
    • Enable multi-cluster snooping
    • Program HN-F SAM control registers
  2. A critical software task is to ensure multi-cluster snooping is operational, because without it Linux will not run properly. If designing a new multi-cluster CCN-based system, run a bare metal software program to verify that snooping across clusters is working correctly. It is far easier to debug the system with bare metal software, and a number of multi-cluster CCN CPAKs are available with bare metal software that can be used for this.
  3. A similar approach can be applied to other hardware-specific programming. Users often have hardware registers that need to be programmed before starting Linux. Putting this code into the boot wrapper is easy and less error prone than using simulator scripts to force register values.
  4. The Linux device tree provided with the CPAK contains a device tree entry for the CCN-504. The device tree entry has a base address which must match the PERIPHBASE parameter on the CCN-504 model. In this case PERIPHBASE is set to 0x30, which means the address in the device tree is 0x30000000.
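
The relationship between PERIPHBASE and the device tree address can be checked with a little arithmetic. The sketch below assumes PERIPHBASE supplies the physical address bits from bit 24 upward, which is what the 0x30 to 0x30000000 example implies. The function names and the shift constant are illustrative and are not taken from the CPAK.

```python
# Assumption: PERIPHBASE holds the physical address bits from bit 24 upward,
# inferred from the example in the text (0x30 -> 0x30000000).
PERIPHBASE_SHIFT = 24


def ccn_base_address(periphbase: int) -> int:
    """Return the CCN-504 base address implied by a PERIPHBASE value."""
    return periphbase << PERIPHBASE_SHIFT


def device_tree_matches(periphbase: int, dt_base: int) -> bool:
    """Check that the device tree base address agrees with the model parameter."""
    return ccn_base_address(periphbase) == dt_base


if __name__ == "__main__":
    # Value used in the CPAK described in this article.
    print(hex(ccn_base_address(0x30)))
```

A check like this is handy when PERIPHBASE is changed on the model: if `device_tree_matches` returns False, the device tree entry needs to be updated to the new address.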


All Linux CPAKs come with an application note which provides details on how to configure and compile Linux to generate a single .axf file.
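
As an illustration of the first boot wrapper task listed above, setting the SMP enable bit is a read-modify-write of the Cortex-A57 CPU Extended Control Register (CPUECTLR_EL1), where SMPEN is bit 6. In the boot wrapper this is done with system-register instructions in assembly; the Python below only models the bit manipulation, and the function name is hypothetical.

```python
# Cortex-A57 CPUECTLR_EL1: SMPEN is bit 6.
SMPEN_BIT = 1 << 6


def set_smpen(cpuectlr: int) -> int:
    """Return the register value with SMPEN set and all other bits unchanged."""
    return cpuectlr | SMPEN_BIT


def smpen_is_set(cpuectlr: int) -> bool:
    """Report whether the SMP enable bit is set in a register value."""
    return bool(cpuectlr & SMPEN_BIT)
```

The OR preserves whatever other control bits are already set, which is why the real code reads the register before writing it back.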


GIC-400 Identification of CPU Accesses

The following method can be applied to get the CPU Core ID and Cluster ID information to the GIC-400 in the CPAK.

  1. The GIC-400 requires the AWUSER and ARUSER bits on AXI be used to indicate the CPU which is making an access to the GIC. A number between 0 and 7 must be driven on these signals so the GIC knows which CPU is reading or writing, but getting the proper CPU number on the AxUSER bits can be a challenge.
  2. In Linux CPAKs with CCI, the GIC-400 model does this automatically by inspecting the AXI transaction ID bits and deriving the AxUSER bits that are input to the GIC-400. Each CPU indicates its CPU number within the cluster (0-3), and the CCI adds information about which slave port received the transaction to indicate the cluster.
  3. Users don’t need to add any special components in the design because the mapping is done inside the Cycle Model of the GIC-400 using a parameter called “AXI User Gen Rule”. This parameter has a default value which assumes a 2 cluster system in which each cluster has 4 cores. This is a standard 8 core configuration which uses all of the ports of the GIC-400. The parameter can be adjusted for other configurations as needed. 
  4. The ARM Fast Model for the GIC-400 uses the concept of Cluster ID to indicate which CPU is accessing the GIC, so the User Gen Rule has to carry more information. The Cluster ID concept is familiar to software that reads the MPIDR register, and it exists in hardware as a CPU configuration input, but it is not present in each bus transaction coming from a CPU and has no direct correlation to the CCI functionality of adding to the ID bits based on slave port.
  5. To create systems which use cycle accurate models and can also be mapped to ARM Fast Models, the User Gen Rule includes all of the following information for each of the 8 CPUs supported by the GIC:
    • Cluster ID value which is used to create the Fast Model system
    • CCI Port which determines the originating cluster in the Cycle Accurate system
    • Core ID which determines the CPU within a cluster for both Fast Model and Cycle Accurate systems

With this information Linux can successfully run on multi-cluster systems with the GIC-400.
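
The three pieces of information in the User Gen Rule can be pictured as a small table with one row per CPU. The sketch below is a hypothetical model of that idea for the default 2-cluster, 4-core configuration. It does not reproduce the actual syntax of the “AXI User Gen Rule” parameter, and the CCI port numbers 3 and 4 are placeholder assumptions.

```python
from typing import NamedTuple, Optional


class CpuEntry(NamedTuple):
    cluster_id: int  # used when creating the Fast Model system
    cci_port: int    # identifies the originating cluster in the cycle accurate system
    core_id: int     # CPU within the cluster (0-3), used by both kinds of system


# Hypothetical table: list index is the CPU number (0-7) seen by the GIC.
# The CCI port numbers (3 and 4) are placeholders, not values from the CPAK.
RULE = [CpuEntry(cluster, port, core)
        for cluster, port in ((0, 3), (1, 4))
        for core in range(4)]


def cpu_from_cycle_model(cci_port: int, core_id: int) -> Optional[int]:
    """Look up the GIC CPU number from the CCI port and core ID."""
    for cpu, entry in enumerate(RULE):
        if entry.cci_port == cci_port and entry.core_id == core_id:
            return cpu
    return None


def cpu_from_fast_model(cluster_id: int, core_id: int) -> Optional[int]:
    """Look up the GIC CPU number from the Cluster ID and core ID."""
    for cpu, entry in enumerate(RULE):
        if entry.cluster_id == cluster_id and entry.core_id == core_id:
            return cpu
    return None
```

Keeping both the Cluster ID and the CCI port in each row is what lets the same rule describe a Fast Model system and a cycle accurate system.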


AMBA 5 CHI Systems

In a system with CHI, the Cluster ID and CPU ID values must be presented to the GIC in the same way as in ACE systems.

  1. For CHI systems, the CPU uses the RSVDC signals to indicate the Core ID. The new CCN-504 CPAK introduces a SoC Designer component to add Cluster ID information. This component is a CHI-CHI pass-through which has a parameter for Cluster ID and adds the given Cluster ID into the RSVDC bits.
  2. For CCN configurations with AXI master ports to memory, the CCN automatically drives the AxUSER bits correctly for the GIC-400. For systems which bridge CHI to AXI using the SoC Designer CHI-AXI converter, the converter takes care of driving the AxUSER bits based on the RSVDC inputs. In both cases, the AxUSER bits are driven to the GIC. The main difference for CHI systems is that the GIC User Gen Rule must be disabled by setting the “AXI4 Enable Change USER” parameter to false, so that no additional modification is done by the Cycle Model of the GIC-400.
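
The CHI flow described above can be sketched as two small steps: the pass-through merges the Cluster ID into RSVDC, and the CHI-to-AXI stage forwards the resulting CPU number on AxUSER. The bit positions used here (core ID in the low 2 bits, Cluster ID directly above it) are assumptions made for illustration, as are the function names. The real components may lay the bits out differently.

```python
# Assumption: the core ID occupies the low 2 bits of RSVDC (cores 0-3).
CORE_ID_BITS = 2
CORE_ID_MASK = (1 << CORE_ID_BITS) - 1


def add_cluster_id(rsvdc: int, cluster_id: int) -> int:
    """Model the CHI-CHI pass-through: keep the core ID, place the Cluster ID above it."""
    core_id = rsvdc & CORE_ID_MASK
    return (cluster_id << CORE_ID_BITS) | core_id


def rsvdc_to_axuser(rsvdc: int) -> int:
    """Model the CHI-to-AXI stage: forward the CPU number (0-7) on AxUSER."""
    return rsvdc & 0x7


if __name__ == "__main__":
    # Core 1 in cluster 1 under the assumed layout.
    print(rsvdc_to_axuser(add_cluster_id(rsvdc=1, cluster_id=1)))
```

Because the CPU number is already complete when it reaches the GIC, no further remapping is needed there, which matches the requirement to set “AXI4 Enable Change USER” to false.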

This article was originally written as a blog by Jason Andrews. Read the original post on Connected Community.