Memory system implementation
This section describes the implementation of the L1 memory system.
Limited Order Regions
The Cortex®-A55 core supports a single limited order range that includes the entire memory space.
Atomic instructions
The Cortex-A55 core supports the atomic instructions added in the Arm®v8.1‑A architecture.
Atomic instructions to cacheable memory can be performed as either near atomics or far atomics, depending on where the cache line containing the data resides. If the instruction hits in the L1 data cache in a unique state, it is performed as a near atomic in the L1 memory system. If the atomic operation misses in the L1 data cache, or the line is shared with another core, the atomic is sent as a far atomic to the L3 cache. If the operation misses everywhere within the cluster, the master interface is configured as CHI, and the interconnect supports far atomics, the atomic is passed on to the interconnect, which performs the operation. If the operation hits anywhere inside the cluster, or the interconnect does not support atomics, the L3 memory system performs the atomic operation and allocates the line into the L3 cache if it is not already there.
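The placement rules above can be summarized as a small decision function. This is only a sketch of the prose: the type and function names are hypothetical, not part of any Arm interface, and it assumes that a master interface that is not CHI is treated like an interconnect without atomic support.

```c
#include <stdbool.h>

/* Hypothetical model of the text above; not an Arm API. */
typedef enum {
    ATOMIC_NEAR_L1,          /* performed in the L1 memory system */
    ATOMIC_FAR_L3,           /* performed by the L3 memory system */
    ATOMIC_FAR_INTERCONNECT  /* passed on to the interconnect     */
} atomic_site;

typedef struct {
    bool l1_hit_unique;            /* hits in the L1 data cache in a unique state   */
    bool hit_in_cluster;           /* line is present somewhere inside the cluster  */
    bool master_is_chi;            /* master interface is configured as CHI         */
    bool interconnect_far_atomics; /* interconnect supports far atomics             */
} atomic_ctx;

/* Return where a cacheable atomic would be performed, following the prose. */
atomic_site atomic_execution_site(const atomic_ctx *c)
{
    if (c->l1_hit_unique) {
        return ATOMIC_NEAR_L1;
    }
    /* L1 miss, or line shared with another core: the atomic goes far. */
    if (!c->hit_in_cluster && c->master_is_chi && c->interconnect_far_atomics) {
        return ATOMIC_FAR_INTERCONNECT;
    }
    /* Hit inside the cluster, or no atomic support beyond the cluster
     * (assumption: a non-CHI interface counts as "no support"). */
    return ATOMIC_FAR_L3;
}
```

In this model, a line held in a shared state falls through to the L3 case because it is still present inside the cluster.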
The Cortex-A55 core supports atomics to device or non-cacheable memory, but only if the interconnect also supports atomics. If such an atomic instruction is executed when the interconnect does not support atomics, it results in a synchronous Data Abort (for load atomics) or an asynchronous Data Abort (for store atomics). The behavior of the atomic instructions can be modified by the CPUECTLR register settings.
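The outcome for atomics to device or non-cacheable memory can be sketched in the same way. The names below are hypothetical, and the sketch deliberately ignores any change in behavior selected through CPUECTLR, because the text does not say which settings apply. Here "load atomic" means a form that returns the old value (for example LDADD) and "store atomic" means a form that does not (for example STADD).

```c
#include <stdbool.h>

/* Hypothetical model of the text above; not an Arm API. */
typedef enum { ATOMIC_LOAD, ATOMIC_STORE } atomic_kind;

typedef enum {
    OUTCOME_PERFORMED,          /* interconnect performs the atomic */
    OUTCOME_SYNC_DATA_ABORT,    /* synchronous Data Abort           */
    OUTCOME_ASYNC_DATA_ABORT    /* asynchronous Data Abort          */
} atomic_outcome;

/* Outcome of an atomic to device or non-cacheable memory. */
atomic_outcome uncached_atomic_outcome(atomic_kind kind,
                                       bool interconnect_supports_atomics)
{
    if (interconnect_supports_atomics) {
        return OUTCOME_PERFORMED;
    }
    /* Without interconnect support, the abort type depends on the form. */
    return (kind == ATOMIC_LOAD) ? OUTCOME_SYNC_DATA_ABORT
                                 : OUTCOME_ASYNC_DATA_ABORT;
}
```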
LDAPR instructions
The core supports Load-Acquire instructions that adhere to the RCpc consistency semantic introduced in the Armv8.3‑A extensions. This is reflected in register ID_AA64ISAR1_EL1, where bits[23:20] are set to 0b0001 to indicate that the core supports the LDAPRB, LDAPRH, and LDAPR instructions implemented in AArch64.
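As an illustration, the sketch below extracts bits[23:20] from a value of ID_AA64ISAR1_EL1 that the caller has already read. The helper names are hypothetical. Reading the register itself requires an MRS instruction at a suitable privilege level and is left out so that the example stays portable. It relies on the Arm ID register convention that any nonzero value in this field indicates that the LDAPR instructions are implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Position and width of the field in bits[23:20] of ID_AA64ISAR1_EL1. */
#define ISAR1_LRCPC_SHIFT 20u
#define ISAR1_LRCPC_MASK  0xFu

/* Extract the 4-bit field from a previously read register value. */
static inline unsigned isar1_lrcpc(uint64_t isar1)
{
    return (unsigned)((isar1 >> ISAR1_LRCPC_SHIFT) & ISAR1_LRCPC_MASK);
}

/* Nonzero means the LDAPR, LDAPRB, and LDAPRH instructions are implemented. */
static inline bool has_ldapr(uint64_t isar1)
{
    return isar1_lrcpc(isar1) != 0u;
}
```

With a compiler that targets this feature, a C11 `memory_order_acquire` load may be lowered to LDAPR instead of LDAR. That behavior depends on the compiler and its flags.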
Transient memory region
The core has a specific behavior for memory regions that are marked as Write-Back cacheable and transient, as defined in the Armv8‑A architecture.
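In the Armv8‑A architecture, a Normal memory region is marked as Write-Back transient through its 8-bit MAIR_ELx attribute, in which each 4-bit half uses the encoding 0b01RW for Write-Back Transient and 0b11RW for Write-Back Non-transient (R is the Read-Allocate hint, W is the Write-Allocate hint). The helper below is a minimal sketch with a hypothetical name. It assumes that the same policy is used for the inner and outer halves.

```c
#include <stdbool.h>

/*
 * Build an 8-bit MAIR_ELx attribute for Normal Write-Back memory with the
 * same inner and outer policy. Returns -1 for the transient case with no
 * allocation hint, because the nibble 0b0100 encodes Normal Non-cacheable
 * rather than Write-Back Transient.
 */
int mair_wb_attr(bool transient, bool read_alloc, bool write_alloc)
{
    unsigned hints = (read_alloc ? 0x2u : 0x0u) | (write_alloc ? 0x1u : 0x0u);

    if (transient && hints == 0x0u) {
        return -1;
    }

    unsigned nibble = (transient ? 0x4u : 0xCu) | hints;
    return (int)((nibble << 4) | nibble);
}
```

For example, `mair_wb_attr(true, true, true)` yields 0x77, and the non-transient equivalent `mair_wb_attr(false, true, true)` yields 0xFF.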
For any load that is targeted at a memory region that is marked as transient, the following occurs:
- If the memory access misses in the L1 data cache, the returned cache line is allocated in the L1 data cache but is marked as transient.
- On eviction, if the line is clean and marked as transient, it is not allocated into the L2 cache but is marked as invalid.
For streams of contiguous stores that are targeted at a memory region that is marked as transient, the following occurs:
- If a full cache line is written and misses in the L1 data cache, the complete cache line is streamed to the external memory system without being allocated into the L1 data cache, the L2 cache, or the L3 cache, if present.
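The load and store behavior described for transient regions can be modeled as follows. This is a sketch with hypothetical types and names. The text does not describe dirty transient lines or partial-line stores, so the model assumes that those cases follow the default path.

```c
#include <stdbool.h>

/* Hypothetical L1 line state; not an Arm-defined structure. */
typedef struct {
    bool valid;
    bool dirty;
    bool transient;
} l1_line;

typedef enum { EVICT_INVALIDATE, EVICT_DEFAULT_PATH } evict_action;
typedef enum { STORE_STREAM_TO_MEMORY, STORE_DEFAULT_PATH } store_action;

/* Load miss: the line is allocated in L1 and tagged if the region is transient. */
l1_line fill_on_load_miss(bool region_transient)
{
    return (l1_line){ .valid = true, .dirty = false, .transient = region_transient };
}

/* Eviction: a clean transient line is invalidated instead of going to L2. */
evict_action on_evict(const l1_line *line)
{
    if (line->transient && !line->dirty) {
        return EVICT_INVALIDATE;
    }
    return EVICT_DEFAULT_PATH; /* assumption: all other cases behave normally */
}

/* Store: a full-line write that misses in L1 bypasses L1, L2, and L3. */
store_action on_store(bool region_transient, bool full_line, bool l1_hit)
{
    if (region_transient && full_line && !l1_hit) {
        return STORE_STREAM_TO_MEMORY;
    }
    return STORE_DEFAULT_PATH; /* assumption: not described in the text */
}
```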
Non-temporal loads
Non-temporal loads indicate to the caches that the data is likely to be used for only short periods, for example, when streaming single-use read data that is then discarded. In addition to non-temporal loads, there are also prefetch-memory (PRFM) instructions with the STRM qualifier.
Non-temporal loads cause allocation into the L1 data cache, with the same performance as normal loads. However, when a later linefill is allocated into the cache, the cache line marked as non-temporal has a higher priority for replacement. To prevent pollution of the L2 cache, a non-temporal line that is evicted from L1 is not allocated to L2, as would happen for a normal line.
Note: The line is only marked as non-temporal in the cache if the core has the line in a unique state. If shared with other cores, the line is treated normally.
Non-temporal stores are treated as if write streaming mode was active. They are not allocated into any cache in the cluster unless they hit in the cache.
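From C, a streaming hint for single-use read data can be expressed with the GCC and Clang builtin `__builtin_prefetch`. The sketch below passes a locality of 0 ("no temporal locality"), which AArch64 compilers typically lower to a PRFM streaming prefetch. That lowering depends on the compiler and is not guaranteed. The function name, the 64-byte line size, and the prefetch distance of four lines are assumptions chosen for illustration, not tuned values.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 64-byte cache lines, prefetch four lines ahead. */
#define ELEMS_PER_LINE    (64u / sizeof(uint32_t))
#define PREFETCH_DISTANCE (4u * ELEMS_PER_LINE)

/* Sum a buffer that is read once and then discarded. */
uint64_t sum_streaming(const uint32_t *data, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Once per line, hint that a later line is read-only, single-use data. */
        if ((i % ELEMS_PER_LINE) == 0 && i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same whether or not the hardware acts on it.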