Getting Started

Hardware-managed cache coherency is a fundamental technology for scaling system performance and enabling heterogeneous processing. It is required for big.LITTLE processing, which lets the operating system choose the right processor for each job. The Arm CoreLink CCI family of products has been designed into many applications, including mobile, tablet, digital TV, set-top box, automotive, and low-cost infrastructure. In addition to full coherency for processor clusters with AMBA 4 ACE, the CCI family also provides high-performance I/O coherency for accelerators and interfaces using AMBA 4 ACE-Lite.

CoreLink CCI-550

  • Full coherency with up to six clusters including big.LITTLE and coherent accelerators
  • Fully coherent GPU support to enable heterogeneous processing with shared virtual memory
  • Higher performance and power efficiency with integrated snoop filter

CoreLink CCI-500

  • Full coherency with up to four clusters including big.LITTLE and coherent accelerators
  • Higher performance and power efficiency with integrated snoop filter



CoreLink CCI-400

  • Full coherency with up to two clusters, essential for big.LITTLE
  • Smallest cache coherent interconnect




Product Comparison


| Feature | CoreLink CCI-550 | CoreLink CCI-500 | CoreLink CCI-400 |
|---|---|---|---|
| Summary | Highly configurable ACE interconnect supporting up to 6 clusters | Highly configurable ACE interconnect supporting up to 4 clusters | Smallest area, 2 cluster coherent interconnect |
| Fully coherent ACE slave interfaces | 1-6 | 1-4 | 2 |
| Processors | Up to 24 cores | Up to 16 cores | Up to 8 cores |
| IO coherent ACE-Lite slave interfaces | 0-6 (maximum 7 ACE and ACE-Lite slave ports) | 0-6 (maximum 7 ACE and ACE-Lite slave ports) | 1-3 |
| System and DMC master interfaces | 1-6 memory interfaces, 1-3 system | 1-4 memory interfaces, 1-2 system | 1-2 memory interfaces, 1 system |
| Memory map | 32-48 bit physical address, 40/44/48 bit DVM | 32-44 bit physical address, 40/44/48 bit DVM | 40 bit physical address, 44 bit DVM |
| Coherency and snoop filter | Integrated snoop filter maintains directory of processor cache contents and reduces CPU snoops | Integrated snoop filter maintains directory of processor cache contents and reduces CPU snoops | Broadcast snoop coherency |
| QoS | Integrated QoS mechanisms | Integrated QoS mechanisms | Integrated QoS mechanisms |
| Power, area, frequency | Contact Arm for more information | Contact Arm for more information | Contact Arm for more information |
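
The "Coherency and Snoop Filter" entries contrast a directory-based snoop filter with broadcast snooping. The difference can be illustrated with a toy count of snoop messages. This is only a sketch under simplifying assumptions: the function name `count_snoops`, the access trace, and the rule that a filtered miss snoops every listed holder are all hypothetical, and none of it models the real CCI hardware.

```python
def count_snoops(accesses, num_clusters, use_filter):
    """Count snoops sent to clusters for a list of (cluster, addr) reads.

    Hypothetical toy model: every access is a shared read, and lines are
    never evicted or written.
    """
    # addr -> set of cluster ids that hold the line. With use_filter=True
    # this plays the role of the snoop filter directory. With
    # use_filter=False it only models what the caches contain; the
    # broadcast interconnect never consults it.
    directory = {}
    snoops = 0
    for cluster, addr in accesses:
        holders = directory.setdefault(addr, set())
        if cluster in holders:
            continue  # hit in the requester's own cache, no snoop needed
        if use_filter:
            snoops += len(holders)       # snoop only the listed holders
        else:
            snoops += num_clusters - 1   # snoop every other cluster
        holders.add(cluster)
    return snoops


# Hypothetical trace of (cluster id, address) reads on a 4-cluster system.
trace = [(0, 0xA), (1, 0xA), (2, 0xB), (0, 0xA), (3, 0xB), (1, 0xC)]
broadcast = count_snoops(trace, num_clusters=4, use_filter=False)
filtered = count_snoops(trace, num_clusters=4, use_filter=True)
print(broadcast, filtered)  # prints "15 2"
```

In this trace the filtered case sends a snoop only when another cluster is known to hold the line, which is the kind of reduction in CPU snoops the table describes.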

Hardware Cache Coherency Introduction

Coherency means ensuring that all processors and bus masters in the system see the same view of memory. For example, if a processor creates a data structure and then passes it to a DMA engine to move, both the processor and the DMA engine must see the same data. If that data is still cached in the CPU and the DMA engine reads from external DDR, the DMA engine reads stale data.
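
The stale-read scenario can be reproduced with a small toy model. This is a sketch only: the classes `Dram` and `WriteBackCache`, the function `dma_read`, and the address `0x1000` are hypothetical names chosen for illustration, not part of any Arm software or hardware interface.

```python
class Dram:
    """External memory, modelled as a dictionary of address -> value."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


class WriteBackCache:
    """A CPU cache that holds writes locally until they are written back."""

    def __init__(self, dram):
        self.dram = dram
        self.lines = {}  # addr -> (value, dirty)

    def write(self, addr, value):
        # Write-back policy: update only the cache and mark the line dirty.
        self.lines[addr] = (value, True)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.dram.read(addr), False)
        return self.lines[addr][0]


def dma_read(dram, addr):
    # A non-coherent DMA engine bypasses the CPU cache entirely.
    return dram.read(addr)


dram = Dram()
dram.write(0x1000, 1)            # old value already in memory
cpu_cache = WriteBackCache(dram)
cpu_cache.write(0x1000, 2)       # CPU updates the data structure

cpu_view = cpu_cache.read(0x1000)
dma_view = dma_read(dram, 0x1000)
print(f"CPU sees {cpu_view}, DMA sees {dma_view}")  # CPU sees 2, DMA sees 1
```

The two masters disagree because the new value exists only in the CPU cache. The mechanisms listed next are different ways of closing that gap.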

There are three mechanisms to maintain coherency:

  • Disabling caching is the simplest mechanism, but it can cost significant CPU performance. To reach the highest performance, processors are pipelined to run at high frequency and to run from caches that offer very low latency. Caching data that is accessed multiple times increases performance significantly and reduces DRAM accesses and power. Marking data as “non-cached” gives up those benefits and can hurt both performance and power.

  • Software-managed coherency is the traditional solution to the data sharing problem. The software, usually a device driver, must clean or flush dirty data from the caches and invalidate old data to enable sharing with other processors or masters in the system. This takes processor cycles, bus bandwidth, and power.

  • Hardware-managed coherency offers an alternative that simplifies software. With this solution, any cached data marked "shared" is automatically kept up to date. All processors and bus masters in that sharing domain see exactly the same value.
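
The clean and invalidate steps of software-managed coherency can be sketched with a toy cache model. Everything here is an assumption made for illustration: the `Cache` class, its `clean` and `invalidate` methods, and the addresses are hypothetical and do not correspond to a real driver API or to Arm cache maintenance instructions.

```python
class Cache:
    """Toy write-back cache in front of a dictionary that stands for DRAM."""

    def __init__(self, dram):
        self.dram = dram      # backing memory: addr -> value
        self.lines = {}       # addr -> [value, dirty]

    def write(self, addr, value):
        self.lines[addr] = [value, True]

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.dram.get(addr, 0), False]
        return self.lines[addr][0]

    def clean(self, addr):
        """Write a dirty line back to memory so other masters can see it."""
        line = self.lines.get(addr)
        if line and line[1]:
            self.dram[addr] = line[0]
            line[1] = False

    def invalidate(self, addr):
        """Drop the cached copy so the next read refetches from memory."""
        self.lines.pop(addr, None)


dram = {}
cache = Cache(dram)

# CPU to device: the driver cleans the line before the device reads it.
cache.write(0x2000, 42)
stale = dram.get(0x2000, 0)   # what a DMA read would return before the clean
cache.clean(0x2000)
fresh = dram.get(0x2000, 0)   # what a DMA read returns after the clean

# Device to CPU: the driver invalidates the line after the device writes.
cache.read(0x3000)            # CPU caches the old value (0)
dram[0x3000] = 7              # DMA engine writes new data to memory
before = cache.read(0x3000)   # still the old cached value
cache.invalidate(0x3000)
after = cache.read(0x3000)    # refetched from memory

print(stale, fresh, before, after)  # prints "0 42 0 7"
```

Each `clean` and `invalidate` call stands for work the driver must remember to do at the right moment. With hardware-managed coherency, that bookkeeping moves out of the software.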

Extending hardware coherency to the system requires a coherent bus protocol. In 2011, Arm released the AMBA 4 ACE specification, which introduced the AXI Coherency Extensions on top of the popular AXI protocol. The full ACE interface allows hardware coherency between processor clusters and allows an SMP operating system to extend to more cores. With two clusters, for example, any shared access to memory can snoop into the other cluster’s caches to see if the data is already on chip. If not, it is fetched from external memory (DDR).

The AMBA 4 ACE-Lite interface is designed for IO-coherent (one-way coherent) system masters such as DMA engines, network interfaces, and GPUs. These devices might not have any caches of their own, but they can read shared data from the ACE processors. Alternatively, they might have caches but not cache shareable data.

While hardware coherency can add some complexity to the interconnect and processors, it greatly simplifies the software and enables applications that would not be possible with software coherency, such as big.LITTLE Global Task Scheduling.
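
The two-cluster snoop described above can be sketched as a lookup order: the requester's own cache, then the other cluster's cache, then DDR. This is a toy model and not the ACE protocol. It has no cache line states or write handling, and the names `Cluster`, `Interconnect`, and `shared_read` are hypothetical.

```python
class Cluster:
    """A processor cluster with a cache modelled as addr -> value."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class Interconnect:
    """Toy coherent interconnect that snoops other clusters before DDR."""

    def __init__(self, clusters, ddr):
        self.clusters = clusters
        self.ddr = ddr  # external memory: addr -> value

    def shared_read(self, requester, addr):
        """Return (value, source) for a shared read issued by `requester`."""
        if addr in requester.cache:
            return requester.cache[addr], "own cache"
        # Snoop the other clusters to see if the data is already on chip.
        for other in self.clusters:
            if other is not requester and addr in other.cache:
                value = other.cache[addr]
                requester.cache[addr] = value
                return value, f"snoop hit in {other.name}"
        # No cluster holds the line, so fetch it from external memory.
        value = self.ddr.get(addr, 0)
        requester.cache[addr] = value
        return value, "DDR"


big = Cluster("big")
little = Cluster("LITTLE")
ddr = {0x100: 5, 0x200: 9}
big.cache[0x100] = 6  # big holds a newer value than DDR

cci = Interconnect([big, little], ddr)
first = cci.shared_read(little, 0x100)   # (6, "snoop hit in big")
second = cci.shared_read(little, 0x200)  # (9, "DDR")
third = cci.shared_read(little, 0x100)   # (6, "own cache")
print(first, second, third)
```

The first read returns the newer value from the other cluster and not the older one in DDR, without any software cache maintenance.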

