Chapter 11 Caches only considers the effect of the caches within a single processor. The Cortex-A53 and Cortex-A57 processors support coherency management between the different cores in a cluster. This requires address regions to be marked with the correct shareable attribute. These processors permit systems containing multi-core clusters to be built, where coherency can be maintained for data shared between clusters. Such system-level coherency requires a cache coherent interconnect, such as the ARM CCI-400, which implements the AMBA 4 ACE bus specification. See Figure 14.2.
The coherency support in a system depends on hardware design decisions, and many possible configurations exist. For example, coherency might be supported only within a single cluster. A dual-cluster big.LITTLE system is also possible, in which the inner domain includes the cores of both clusters, as is a multi-cluster system in which the inner domain includes one cluster and the outer domain includes the other clusters. For more information about big.LITTLE systems, see Chapter 16 big.LITTLE Technology.
In addition to hardware that maintains data coherency between caches, you must be able to broadcast cache maintenance activity performed by code running on one core to other parts of the system. Hardware configuration signals, sampled at reset, control whether inner, outer, or both kinds of cache maintenance operations are broadcast, and whether system barrier instructions are broadcast. The AMBA 4 ACE protocol allows barriers to be signaled to other masters, so that the ordering of maintenance and coherency operations is maintained. The interconnect logic might require initialization by boot code.
Software must define which address regions are to be used by which group of masters, that is, which other masters share each address, by creating appropriate translation table entries. For Normal cacheable regions, this means setting the shareable attribute to one of Non-shareable, Inner Shareable, or Outer Shareable. For non-cacheable regions, the shareable attribute is ignored.
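As a rough illustration of the shareable attribute, the following Python sketch models the SH field of an ARMv8-A stage 1 block or page descriptor, which occupies bits [9:8]. The helper names `set_shareability` and `get_shareability` and the string labels are hypothetical, chosen only for this example. Real translation tables are normally built in C or assembly by boot code or the operating system.

```python
# Model of the SH[1:0] field, bits [9:8] of an ARMv8-A stage 1
# block or page descriptor. All names here are illustrative only.
SH_SHIFT = 8
SH_MASK = 0b11 << SH_SHIFT

SHAREABILITY = {
    "non-shareable": 0b00,
    "outer-shareable": 0b10,
    "inner-shareable": 0b11,
}  # encoding 0b01 is reserved


def set_shareability(descriptor: int, domain: str) -> int:
    """Return the descriptor with its SH field replaced by the given domain."""
    return (descriptor & ~SH_MASK) | (SHAREABILITY[domain] << SH_SHIFT)


def get_shareability(descriptor: int) -> str:
    """Decode the SH field of a descriptor back to a domain label."""
    bits = (descriptor & SH_MASK) >> SH_SHIFT
    for name, value in SHAREABILITY.items():
        if value == bits:
            return name
    raise ValueError("reserved SH encoding 0b01")
```

For example, `set_shareability(0, "inner-shareable")` yields a descriptor whose only set bits are the SH field, and `get_shareability` decodes it back to the same label.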
In a multi-core system it is not possible to know whether a specific core has a line covering a particular address in one of its caches (especially where the interconnect features caches, such as CCN-50x).
Maintenance might therefore need to be broadcast to the interconnect. This means that software on one core can issue a cache clean or invalidate operation for an address whose data might currently be held in the data cache of a different core. When a maintenance operation is broadcast as shown in Figure 14.3, the operation is performed by all the cores in a particular shareability domain.
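The difference between a local and a broadcast operation can be sketched with a toy model. This is not how the hardware is implemented; it assumes each core's data cache can be represented as a set of line addresses, and the `Core` class, the function names, and the address are all hypothetical.

```python
# Toy model: each core's data cache is a set of cached line addresses.
class Core:
    def __init__(self, name):
        self.name = name
        self.dcache = set()


def invalidate_local(core, line_addr):
    """Invalidate a line only in the issuing core's own cache."""
    core.dcache.discard(line_addr)


def invalidate_broadcast(domain, line_addr):
    """Invalidate a line in every core of the shareability domain."""
    for core in domain:
        core.dcache.discard(line_addr)


core0, core1 = Core("core0"), Core("core1")
core1.dcache.add(0x8000_0040)  # only core1 holds the line

# core0 issues a non-broadcast invalidate: core1 keeps its copy.
invalidate_local(core0, 0x8000_0040)
cached_after_local = 0x8000_0040 in core1.dcache

# The same operation broadcast to the domain removes the line everywhere.
invalidate_broadcast([core0, core1], 0x8000_0040)
cached_after_broadcast = 0x8000_0040 in core1.dcache
```

In this model the line survives the local invalidate issued by `core0` and is removed only once the operation reaches every core in the domain.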
SMP operating systems typically rely on being able to broadcast cache and TLB maintenance operations. Consider the situation where an external DMA engine is able to modify the contents of external memory.
The SMP operating system running on a particular core does not know which core has which data. It simply requires an address range to be invalidated wherever it is in the cluster. If operations are not broadcast, the operating system must issue the clean or invalidate operations locally on each core. A DSB barrier instruction makes a core wait for the broadcast operation it has issued to complete. The barrier does not force operations received by broadcast to complete. For more information about barrier instructions, see Chapter 13 Memory Ordering.
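Maintenance by address acts on one cache line at a time, so invalidating an address range means walking it in line-sized steps. The sketch below computes those steps in Python. It relies on the CTR_EL0.DminLine field, bits [19:16], which holds log2 of the number of 4-byte words in the smallest data cache line. The function names are hypothetical, and on real hardware each address would be passed to a by-address operation such as DC IVAC, with a DSB after the loop.

```python
def dcache_line_bytes(ctr_el0: int) -> int:
    """Smallest data cache line size in bytes, from a CTR_EL0 value."""
    dminline = (ctr_el0 >> 16) & 0xF  # log2(words per line)
    return 4 << dminline              # 4 bytes per word


def lines_in_range(start: int, size: int, line_bytes: int) -> list:
    """Line-aligned addresses covering [start, start + size)."""
    if size <= 0:
        return []
    first = start & ~(line_bytes - 1)  # round down to a line boundary
    return list(range(first, start + size, line_bytes))
```

A CTR_EL0 value with DminLine equal to 4 gives a 64-byte line, and a buffer that starts partway through a line still causes that whole line to be visited.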
|Description|Instruction|Broadcast?|
|I-cache invalidate all to Point of Unification, Inner Shareable|IC IALLUIS|Yes (inner only)|
|I-cache invalidate all to Point of Unification|IC IALLU|No[a]|
|I-cache invalidate by address to Point of Unification|IC IVAU|Maybe[b]|
|D-cache zero by address|DC ZVA|No|
|D-cache invalidate by address to Point of Coherency|DC IVAC|Yes|
|D-cache invalidate by Set/Way|DC ISW|No|
|D-cache clean by address to Point of Coherency|DC CVAC|Maybe[b]|
|D-cache clean by Set/Way|DC CSW|No|
|D-cache clean by address to Point of Unification|DC CVAU|Maybe[b]|
|D-cache clean and invalidate by address to Point of Coherency|DC CIVAC|Yes|
|D-cache clean and invalidate by Set/Way|DC CISW|No|
[a] Broadcast in Non-secure EL1 if the HCR/HCR_EL2 FB bit is set, overriding normal behavior. This bit causes the following instructions to be broadcast within the Inner Shareable domain when executed from Non-secure EL1: TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, TLBI VALE1, TLBI VAALE1, IC IALLU.
[b] Broadcast determined by the shareability of the memory region.
For the IC instruction, that is, the instruction cache maintenance operation, the IS suffix indicates that the function applies to all instruction caches within the Inner Shareable domain.
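The broadcast rules in the table and its footnotes can be captured as a small lookup, shown here as a Python sketch keyed by the A64 instruction that corresponds to each table row. The function `is_broadcast` and its parameters are hypothetical. The sketch also assumes that a "Maybe" row is broadcast whenever the target region is marked Inner or Outer Shareable, which is one reading of footnote [b].

```python
# Rule per instruction: "yes", "no", "shareable" (footnote [b]),
# or "fb" (footnote [a], controlled by the HCR/HCR_EL2 FB bit).
BROADCAST_RULES = {
    "IC IALLUIS": "yes",
    "IC IALLU": "fb",
    "IC IVAU": "shareable",
    "DC ZVA": "no",
    "DC IVAC": "yes",
    "DC ISW": "no",
    "DC CVAC": "shareable",
    "DC CSW": "no",
    "DC CVAU": "shareable",
    "DC CIVAC": "yes",
    "DC CISW": "no",
}


def is_broadcast(op, region_shareable=False, hcr_fb=False, nonsecure_el1=False):
    """Return True if the operation would be broadcast under the table's rules."""
    rule = BROADCAST_RULES[op]
    if rule == "yes":
        return True
    if rule == "no":
        return False
    if rule == "shareable":
        # Assumption: broadcast only for Inner or Outer Shareable regions.
        return region_shareable
    # rule == "fb": broadcast only from Non-secure EL1 with FB set.
    return hcr_fb and nonsecure_el1
```

Set/Way operations always return False in this lookup, which matches the table: they act only on the caches of the core that executes them.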