Coherency means ensuring that all processors or bus masters within a system have the same view of shared memory. It means that changes to data held in the cache of one core are visible to the other cores, making it impossible for cores to see stale or old copies of data. This can be handled by simply not caching, that is disabling caches for shared memory locations, but this typically has a high performance cost.
- Software managed coherency
Software managed coherency is a more common way to handle data sharing. Data is cached, but software, usually device drivers, must clean dirty data from the caches or invalidate stale data held in them. This takes time, adds to software complexity, and can reduce performance when rates of sharing are high.
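As an illustration, a driver managing a DMA transfer must clean dirty lines before the device reads memory, and invalidate stale lines before the CPU reads data the device wrote. The sketch below shows that sequence. The helper names `clean_dcache_range()` and `invalidate_dcache_range()` are hypothetical stand-ins for the platform's cache maintenance routines (on ARMv8-A these would typically be built from `DC CVAC` and `DC IVAC` instructions plus barriers); here they are stubs so the sketch runs anywhere.

```c
/* Minimal sketch of software-managed coherency around a DMA transfer.
 * clean_dcache_range() and invalidate_dcache_range() are HYPOTHETICAL
 * placeholders for real platform cache-maintenance routines; they are
 * stubbed out here so the example is self-contained. */
#include <stddef.h>

static int last_op; /* records the last maintenance operation, for illustration */

static void clean_dcache_range(void *addr, size_t len)
{
    (void)addr; (void)len;
    last_op = 1; /* real version: write dirty lines back to memory */
}

static void invalidate_dcache_range(void *addr, size_t len)
{
    (void)addr; (void)len;
    last_op = 2; /* real version: discard cached copies of the range */
}

static char dma_buf[256];

/* Memory-to-device transfer: make dirty cached data visible to the device. */
void dma_to_device(void *buf, size_t len)
{
    clean_dcache_range(buf, len); /* push dirty lines out to main memory */
    /* ... then start the DMA engine ... */
}

/* Device-to-memory transfer: discard stale cached copies before the CPU reads. */
void dma_from_device(void *buf, size_t len)
{
    /* ... wait for the DMA engine to complete ... */
    invalidate_dcache_range(buf, len); /* next CPU read fetches fresh data */
}
```

The key point is the ordering: clean *before* the device reads, invalidate *before* the CPU reads. Getting either wrong produces exactly the stale-data problem that hardware coherency avoids.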
- Hardware managed coherency
Hardware maintains coherency between level 1 data caches within a cluster. A core automatically participates in the coherency scheme when it is powered up, has its D-cache and MMU enabled, and an address is marked as coherent. However, this cache coherency logic does NOT maintain coherency between data and instruction caches.
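Because the instruction cache is outside the coherency scheme, code that writes instructions through the data side (a JIT compiler or a program loader, for example) must clean the D-cache and invalidate the I-cache itself before executing the new code. On ARMv8-A this is a `DC CVAU` / `IC IVAU` sequence with barriers; GCC and Clang wrap it portably as `__builtin___clear_cache()`. A minimal sketch, with `code_buf` and `publish_code` as illustrative names:

```c
/* Sketch: publishing newly written instructions. The hardware
 * coherency logic does NOT keep the I-cache coherent with the
 * D-cache, so the software must do the maintenance explicitly. */
#include <stdint.h>
#include <string.h>

static uint32_t code_buf[64]; /* region that will hold generated instructions */

void publish_code(const uint32_t *insns, size_t count)
{
    memcpy(code_buf, insns, count * sizeof insns[0]); /* written via the D-cache */

    /* Clean the D-cache and invalidate the I-cache for the region,
     * so that subsequent instruction fetches see the new code. */
    __builtin___clear_cache((char *)code_buf,
                            (char *)(code_buf + count));
}
```

On architectures where the caches are already unified at the point of fetch, the builtin compiles to nothing, so the sketch is portable.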
In the ARMv8-A architecture and associated implementations, there are likely to be hardware managed coherent schemes. These ensure that any data marked as shareable in a hardware coherent system has the same value seen by all cores and bus masters in that shareability domain. This adds some hardware complexity to the interconnect and to clusters, but greatly simplifies the software and enables applications that would otherwise not be possible using only software coherency.
There are a number of standard ways in which cache coherency schemes can operate. The ARMv8 processors use the MOESI protocol. ARMv8 processors can also be connected to AMBA 5 CHI interconnects, for which the cache coherency protocol is similar to (but not identical to) MOESI.
Depending on which protocol is in use, the Snoop Control Unit (SCU) marks each line in the cache with one of the following attributes: M (Modified), O (Owned), E (Exclusive), S (Shared), or I (Invalid). These are described below:
- Modified (M): The most up-to-date version of the cache line is within this cache. No other copies of the memory location exist within other caches. The contents of the cache line are no longer coherent with main memory.
- Owned (O): This describes a line that is dirty and possibly in more than one cache. A cache line in the Owned state holds the most recent, correct copy of the data. Only one core can hold the data in the Owned state. The other cores can hold the data in the Shared state.
- Exclusive (E): The cache line is present in this cache and coherent with main memory. No other copies of the memory location exist within other caches.
- Shared (S): The cache line is present in this cache and is not necessarily coherent with memory, given that the definition of Owned allows a dirty line to be duplicated into shared lines. It will, however, have the most recent version of the data. Copies of it can also exist in other caches in the coherency scheme.
- Invalid (I): The cache line is invalid.
The following rules apply for the standard implementation of the protocol:
- A write can only be performed if the cache line is in a Modified or Exclusive state. If it is in a Shared state, all other cached copies must be invalidated first. A write moves the line into a Modified state.
- A cache can discard a Shared line at any time, changing it to an Invalid state. A Modified line is written back first.
- If a cache holds a line in a Modified state, reads from other caches in the system receive the updated data from that cache. Conventionally, this is achieved by first writing the data to main memory and then changing the cache line to a Shared state, before performing the read.
- A cache that has a line in an Exclusive state must move the line to a Shared state when another cache reads that line.
- A Shared state might not be precise. If one cache discards a Shared line, another cache might not be aware that it can now move the line to an Exclusive state.
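These rules can be illustrated with a toy model of one cache line held by two caches. This is a simplified sketch for illustration only, not a description of the exact protocol any real interconnect implements:

```c
/* Toy model of the MOESI rules above for one line and two caches.
 * Illustrative only; real protocols have more transitions. */
typedef enum { I_ST, S_ST, E_ST, O_ST, M_ST } state_t;

typedef struct { state_t c[2]; } line_t;

/* Read by cache 'who'. On a miss: if the other cache holds the line
 * Modified or Exclusive, both end up Shared (the Modified data is
 * written back first, per the conventional scheme); if no other copy
 * exists, the reader gets the line Exclusive. */
void read_line(line_t *l, int who)
{
    int other = 1 - who;
    if (l->c[who] != I_ST)
        return;                     /* read hit: no state change */
    if (l->c[other] == M_ST || l->c[other] == E_ST) {
        l->c[other] = S_ST;         /* M written back, then shared */
        l->c[who]   = S_ST;
    } else if (l->c[other] == O_ST || l->c[other] == S_ST) {
        l->c[who]   = S_ST;         /* other copy supplies the data */
    } else {
        l->c[who]   = E_ST;         /* no other copy: Exclusive */
    }
}

/* Write by cache 'who'. A write requires Modified or Exclusive, so
 * any other copy is invalidated first; the line becomes Modified. */
void write_line(line_t *l, int who)
{
    l->c[1 - who] = I_ST;           /* invalidate the other copy */
    l->c[who]     = M_ST;
}
```

Starting from both caches Invalid, a read by core 0 yields Exclusive; a subsequent read by core 1 moves both copies to Shared; a write by core 0 then invalidates core 1's copy and leaves core 0 in the Modified state.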
The processor cluster contains a Snoop Control Unit (SCU) that contains duplicate copies of the tags stored in the individual L1 Data Caches. The cache coherency logic therefore:
- Maintains coherency between L1 data caches.
- Arbitrates accesses to the L2 interfaces, for both instructions and data.
- Has duplicated Tag RAMs to keep track of what data is allocated in each core's data cache.
Each core in Figure 14.4 has its own data and instruction cache. The cache coherency logic contains a local copy of the tags from the D-caches. However, the instruction caches do not take part in coherency. There is two-way communication between each data cache and the coherency logic.
ARM multi-core processors also implement optimizations that can copy clean data and move dirty data directly between participating L1 caches, without having to access and wait for external memory. This activity is handled in multi-core systems by the SCU.
Key facets of the multi-core technology are: