The cluster consists of:
- One to eight cores.
- The DynamIQ™ Shared Unit (DSU), which connects the cores to an external memory system.
For more information, see the Arm® DynamIQ™ Shared Unit Technical Reference Manual.
The following figure includes a top-level functional diagram of a core.
Figure A2-1 Core block diagram
Instruction Fetch Unit (IFU)
The IFU fetches instructions from the instruction cache or from external memory and predicts the outcome of branches in the instruction stream. It passes the instructions to the Data Processing Unit (DPU) for processing.
Data Processing Unit (DPU)
The DPU decodes and executes instructions. It executes instructions that require data transfer to or from the memory system by interfacing to the Data Cache Unit (DCU). The DPU includes the PMU, the Advanced SIMD and floating-point support, and the Cryptographic Extension.
- PMU
The PMU provides six performance monitors that can be configured to gather statistics on the operation of each core and the memory system. The information can be used for debug and code profiling.
- Advanced SIMD and floating-point support
Advanced SIMD is a media and signal processing architecture that adds instructions primarily for audio, video, 3D graphics, image and speech processing. The floating-point architecture provides support for single-precision and double-precision floating-point operations.
All scalar floating-point instructions are available in the A64 instruction set.
All VFP instructions are available in the A32 and T32 instruction sets.
The A64 instruction set offers additional Advanced SIMD instructions, including double-precision floating-point vector operations.
Note: The Advanced SIMD architecture, its associated implementations, and supporting software, are also referred to as NEON™ technology.
- Cryptographic Extension
The optional Cortex®-A55 core Cryptographic Extension supports the Armv8‑A Cryptographic Extension. It is a configuration option that can be set when configuring and integrating the core into a system and applies to all cores.
The Cryptographic Extension adds new instructions to Advanced SIMD that accelerate:
- Advanced Encryption Standard (AES) encryption and decryption.
- The Secure Hash Algorithm (SHA) functions SHA-1, SHA-224, and SHA-256.
- Finite field arithmetic used in algorithms such as Galois/Counter Mode and Elliptic Curve Cryptography.
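The finite field arithmetic mentioned in the list above is built on polynomial (carry-less) multiplication over GF(2), followed by reduction modulo a fixed polynomial. The following is a minimal software sketch of that arithmetic, not a description of how the core implements it. The function names are hypothetical, and the 8-bit AES field polynomial 0x11B is used only because it makes a small, well-known example; Galois/Counter Mode works in a 128-bit field.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply: treat a and b as polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR, so there are no carries
        a <<= 1
        b >>= 1
    return result


def gf_reduce(value: int, modulus: int) -> int:
    """Reduce a polynomial modulo the given field polynomial."""
    mod_degree = modulus.bit_length() - 1
    while value.bit_length() > mod_degree:
        shift = value.bit_length() - 1 - mod_degree
        value ^= modulus << shift   # cancel the current top term
    return value


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Multiply two field elements: carry-less multiply, then reduce."""
    return gf_reduce(clmul(a, b), modulus)


if __name__ == "__main__":
    AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
    print(hex(gf_mul(0x57, 0x83, AES_POLY)))
```

In this model the multiply and the reduction are separate steps, which is why the intermediate product can be wider than either operand.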
Memory Management Unit (MMU)
The MMU provides fine-grained memory system control through a set of virtual-to-physical address mappings and memory attributes that are held in translation tables. These mappings and attributes are saved into the Translation Lookaside Buffer (TLB) when an address is translated. The TLB entries include a global indicator and Address Space Identifiers (ASIDs), so that the TLB does not need to be flushed on a context switch. They also include Virtual Machine Identifiers (VMIDs), so that the hypervisor does not need to flush the TLB on a virtual machine switch.
- L1 TLB
The first level of caching for the translation table information is an L1 TLB. It is implemented on each of the instruction and data sides. All TLB-related maintenance operations result in flushing both the instruction and data L1 TLBs.
- L2 TLB
A unified L2 TLB handles the misses from the L1 TLBs.
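To illustrate why tagging TLB entries with an ASID and a VMID avoids flushes, the following toy model keeps entries for several address spaces at once and filters them at lookup time. It is a simplified sketch: the class and field names are hypothetical, the structure is fully associative with no replacement policy, and the real matching rules are defined by the architecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TlbEntry:
    vpn: int          # virtual page number
    ppn: int          # physical page number
    vmid: int         # virtual machine that owns the mapping
    asid: int         # address space that owns the mapping
    is_global: bool   # global entries match any ASID


class ToyTlb:
    def __init__(self) -> None:
        self.entries: List[TlbEntry] = []

    def insert(self, entry: TlbEntry) -> None:
        self.entries.append(entry)

    def lookup(self, vpn: int, vmid: int, asid: int) -> Optional[int]:
        """Return the physical page number on a hit, or None on a miss."""
        for entry in self.entries:
            if entry.vpn != vpn or entry.vmid != vmid:
                continue
            if entry.is_global or entry.asid == asid:
                return entry.ppn
        return None


if __name__ == "__main__":
    tlb = ToyTlb()
    tlb.insert(TlbEntry(vpn=0x10, ppn=0xA0, vmid=1, asid=1, is_global=False))
    tlb.insert(TlbEntry(vpn=0x10, ppn=0xB0, vmid=1, asid=2, is_global=False))
    # Switching from ASID 1 to ASID 2 needs no flush: the tags select the entry.
    print(tlb.lookup(0x10, vmid=1, asid=1), tlb.lookup(0x10, vmid=1, asid=2))
```

Because both mappings for the same virtual page coexist, a context switch only changes the ASID presented at lookup.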
In implementations with core cache protection, parity bits protect the TLB RAMs by enabling the detection of any single-bit error. If an error is detected, the entry is invalidated and fetched again.
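The detect, invalidate, and refetch behavior described above can be sketched in software as follows. This is an illustrative model only: it assumes one even-parity bit per stored word, and the dictionary layout and the `refetch` callback are hypothetical stand-ins for the RAM entry and the translation table walk.

```python
def parity_bit(word: int) -> int:
    """Return 1 if word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def write_entry(word: int) -> dict:
    """Store a word together with its parity bit."""
    return {"data": word, "parity": parity_bit(word), "valid": True}


def read_entry(entry: dict, refetch) -> int:
    """Return the stored word, refetching it if the parity check fails."""
    if entry["valid"] and parity_bit(entry["data"]) == entry["parity"]:
        return entry["data"]
    entry["valid"] = False            # invalidate the corrupted entry
    fresh = refetch()                 # fetch the entry again from its source
    entry.update(write_entry(fresh))
    return fresh


if __name__ == "__main__":
    entry = write_entry(0b1011)
    entry["data"] ^= 0b0100           # inject a single-bit error
    print(bin(read_entry(entry, lambda: 0b1011)))
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number, which is why the text refers to single-bit errors.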
L1 memory system
The L1 memory system includes the DCU, the Store Buffer (STB), and the Bus Interface Unit (BIU).
- DCU
The DCU manages all load and store operations.
The L1 data cache RAMs are protected using Error Correction Codes (ECC). The ECC scheme is Single Error Correct Double Error Detect (SECDED). The DCU includes a combined local and global exclusive monitor that is used by Load-Exclusive and Store-Exclusive instructions.
- STB
The STB holds store operations when they have left the load/store pipeline in the DCU and have been committed by the DPU. The STB can request access to the L1 data cache, initiate linefills, or write to L2 and L3 memory systems.
The STB is also used to queue maintenance operations before they are broadcast to other cores in the cluster.
- BIU
The BIU contains the interface to the L2 memory system and buffers to decouple the interface from the L1 data cache and STB.
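The SECDED scheme named for the L1 data cache RAMs can be illustrated with a small extended Hamming code. The sketch below protects only 4 data bits with 4 check bits, an assumption chosen to keep the example readable. The text above does not state the ECC word width or the code that the core uses, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    code = [0] * 8                     # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1        # makes the total number of set bits even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None for an uncorrectable double error."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos            # nonzero syndrome is the failing position
    overall_odd = sum(code) & 1
    fixed = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        fixed[syndrome] ^= 1           # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return "double_error", None    # two flips: detectable, not correctable
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, data


if __name__ == "__main__":
    word = encode(0b1010)
    word[6] ^= 1                       # inject a single-bit error
    print(decode(word))
```

The overall parity bit is what separates the two cases: an odd number of flipped bits is treated as a correctable single error, while a nonzero syndrome with even overall parity signals a double error.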
L2 memory system
The L2 memory system contains the L2 cache. The L2 cache is optional and private to each core. The L2 cache is 4-way set associative, supports 64-byte cache lines, and has a configurable cache RAM size between 64KB and 256KB. The L2 memory system is connected to the DynamIQ Shared Unit through an optional asynchronous bridge.
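The cache parameters above determine the number of sets: cache size divided by the product of associativity and line size. The following sketch works through that arithmetic for the smallest and largest sizes. It assumes conventional power-of-two indexing, where the low address bits select the byte within a line and the next bits select the set. The text above does not state which address bits the core actually uses, and the function name is hypothetical.

```python
def l2_geometry(size_kb: int, ways: int = 4, line_bytes: int = 64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    size_bytes = size_kb * 1024
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    for size_kb in (64, 256):
        sets, offset_bits, index_bits = l2_geometry(size_kb)
        print(f"{size_kb}KB: {sets} sets, "
              f"{offset_bits} offset bits, {index_bits} index bits")
```

Under these assumptions, a 64KB L2 cache has 256 sets and a 256KB L2 cache has 1024 sets, with 6 offset bits in both cases.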
GIC CPU interface
The GIC CPU Interface, when integrated with an external distributor component, is a resource for supporting and managing interrupts in a cluster system.
DynamIQ™ Shared Unit
The DynamIQ Shared Unit contains the L3 cache and the logic required to maintain coherence between the cores in the cluster. For more information, see the Arm® DynamIQ™ Shared Unit Technical Reference Manual.
Debug and trace components
The Cortex-A55 core supports a range of debug, test, and trace options including:
- Six performance event counters, provided by the PMU, and one cycle counter.
- Six hardware breakpoints and four watchpoints.
- A per-core ETM that supports instruction trace only.
- Per-core support for an ELA-500.
- AMBA 4 APB interfaces between the cluster and the Debug Block.
Details of the core-specific debug elements can be found in this document. For information on the cluster debug and trace components supported by the Cortex-A55 core, see the Arm® DynamIQ™ Shared Unit Technical Reference Manual.