Cortex-R82 is the highest-performance real-time processor from Arm and the first to implement the Armv8-R AArch64 architecture.
The Cortex-R82 processor supports Arm Neon technology, an advanced Single Instruction Multiple Data (SIMD) architecture extension. Neon can accelerate signal processing algorithms and functions to speed up applications, such as machine learning.
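As a rough illustration of the kind of signal-processing kernel Neon can accelerate, the sketch below computes a dot product with Advanced SIMD intrinsics from `arm_neon.h` when built for an AArch64 target with Neon, and falls back to plain scalar C elsewhere. The function name `dot_f32` is hypothetical and chosen for this example only. The sketch is not tuned for Cortex-R82 and makes no claim about the speed-up achieved.

```c
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
float dot_f32(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;

#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Process four floats per iteration in one 128-bit vector register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        acc = vfmaq_f32(acc, va, vb);   /* acc += va * vb, lane by lane */
    }
    sum = vaddvq_f32(acc);              /* add the four lanes together */
#endif

    /* Scalar tail; this is also the whole loop on targets without Neon. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Calling `dot_f32(a, b, n)` gives the same interface on every target, so the same source can be checked on a host machine before it is built for the processor. Floating-point rounding may differ slightly between the vector and scalar paths because the additions happen in a different order.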
Cortex-R82 is an in-order, superscalar multicore processor that can address up to 1TB of DRAM volatile memory to fulfil the requirements of emerging memory technologies. The optional Memory Management Unit (MMU) enables rich OSes, such as Linux, supported by a wide and vibrant ecosystem offering a software stack and development tools.
The Cortex-R82 processor delivers the higher compute performance needed for complex data storage applications, including Computational Storage Drives (CSDs). Implementing the Cortex-R82 processor in a Solid State Drive (SSD) based on the NVMe or NVMe-oF specifications, or on the CXL architecture, enables efficient parallel computation where the data resides.
Arm has a suite of technologies and tools to support and optimize development of Cortex-R82 based storage controllers. Arm Development Studio and Fast Models enable early hardware and software co-development, and Cycle Models allow custom benchmarking and performance optimization ahead of silicon availability. Arm training and design review services, together with Cortex-R82 POP IP and libraries, accelerate time-to-market and reduce risk.
|Architecture||Armv8-R AArch64. Compliant with Armv8.4-A extensions.|
|Instruction Set||A64 instruction set|
|Microarchitecture||Eight-stage, in-order, superscalar pipeline with direct and indirect branch prediction.|
|Cache controllers||Separate L1 data cache and L1 instruction cache private to each core.
An optional unified (instructions and data) L2 cache shared between all cores.
Partial L2 cache power-down support.|
|Tightly-Coupled Memories (TCM)||Two optional TCMs private to each core: an ITCM for instructions and literal pool data and a DTCM for data.|
|Cache protection||Reliability, Availability, and Serviceability (RAS) extension.
Optional Error Correcting Code (ECC) protection, either Single Error Correct Double Error Detect (SECDED) or Double Error Detect (DED), for all instantiated cache tag and data RAMs, the TCM RAMs, and the TLB RAMs.|
|Interrupt interface||Standard interrupt inputs (IRQ and FIQ) are provided, together with an interface to an external GICv3.2-compliant Generic Interrupt Controller (GIC) that supports complex priority-based interrupt handling. The processor includes low-latency interrupt technology that allows long multicycle instructions to be interrupted and restarted. In certain circumstances, lengthy memory accesses are also deferred.|
|Memory Protection Unit (MPU)||Two optional, programmable MPUs, controlled from EL1 and EL2 respectively.
Each MPU supports attributes for up to 32 regions. Regions cannot overlap.|
|Memory Management Unit (MMU)||Optional EL1 MMU for fine-grained memory system control through virtual-to-physical address mappings and memory attributes held in translation tables.|
|Floating Point Unit (FPU) and Advanced SIMD (Neon)||Optional FPU implementing the Arm Vector and Floating Point architecture VFPv4 with 32 x 128-bit registers, compliant with IEEE754. There is support for:
|Master bus||Shared Main Master (MM) port implemented as AXI5 256-bit providing access for instructions, data, and peripherals. This interface can optionally be a 256-bit CHI-E interface.|
|Slave bus||128-bit shared AXI-S port used for two purposes:
|Low Latency RAM Port (LLRAM)||Optional AXI5 256-bit shared LLRAM port providing low-latency access for instructions and data. The port is designed to connect to local memory. This local memory provides many of the benefits of TCM, but it can be slower and lower power, and it is easily shared between the up to eight processor cores.|
|Shared Peripheral Port (SPP)||Optional AXI5 64-bit SPP for providing access to peripherals.|
|Low Latency Peripheral Port (LLPP)||An optional per core dedicated 32-bit AXI5 port to integrate latency-sensitive peripherals tightly with a specific core within the processor.|
|Main Accelerator Coherency Port (MACP)||ACE5-Lite 128-bit shared slave MACP for external access to MM address ranges. MACP enables I/O coherency for external agents with the per-core L1 data cache and shared L2 cache.|
|Up to eight cores||With in-cluster hardware coherency.|
|Debug||A Debug Access Port is provided. Its functionality can be extended using CoreSight Debug and Trace.|
|Trace||Cortex-R82 includes one CoreSight Embedded Trace Module per core.|
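The MPU row in the table above states two constraints that configuration software has to respect: at most 32 regions per MPU, and no overlapping regions. The following is a minimal sketch of a host-side check for a region table before it is programmed. It assumes each region is described by an inclusive base and limit address; the type `mpu_region_t` and the function `mpu_regions_valid` are hypothetical names for this example and are not part of any Arm API. Alignment rules and the actual register programming are out of scope here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_MAX_REGIONS 32u   /* from the table: up to 32 regions per MPU */

/* Hypothetical region descriptor: both addresses are inclusive. */
typedef struct {
    uint64_t base;    /* first byte covered by the region */
    uint64_t limit;   /* last byte covered by the region  */
} mpu_region_t;

/* Returns true if the table fits in one MPU and no two regions overlap. */
bool mpu_regions_valid(const mpu_region_t *regions, size_t count)
{
    if (count > MPU_MAX_REGIONS)
        return false;

    for (size_t i = 0; i < count; ++i) {
        /* A region whose base lies above its limit is malformed. */
        if (regions[i].base > regions[i].limit)
            return false;

        /* Two inclusive ranges overlap when each starts before the other ends. */
        for (size_t j = i + 1; j < count; ++j) {
            if (regions[i].base <= regions[j].limit &&
                regions[j].base <= regions[i].limit)
                return false;
        }
    }
    return true;
}
```

The pairwise comparison is quadratic in the number of regions, which is acceptable for a table capped at 32 entries and keeps the check independent of region ordering.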
Processor area, frequency, and power consumption are highly dependent on process, libraries, and optimizations. The following characteristics table gives estimates for a typical four-core cluster implementation of the Cortex-R82 processor on a mainstream low-power process technology (5nm) with standard-performance cell libraries. Each core is configured with a 32KB L1 instruction cache, a 32KB L1 data cache, 32KB of ITCM, 32KB of DTCM, and a full Advanced SIMD and floating-point engine. The processor cluster is configured with an integrated 1MB shared L2 cache.
|Cortex-R82 four-core cluster||5nm|
|Maximum clock frequency|| |
|Dhrystone performance||3.41 / 4.32 / 8.67 DMIPS/MHz*|
|Total area (including Cluster, Cores, RAM and routing)|| |
|Power efficiency||From 30 DMIPS/mW***|
*Benchmark built with GCC 9.2. The first result abides by all of the 'ground rules' laid out in the Dhrystone documentation, the second permits inlining of functions (not just the permitted C string libraries), and the third additionally permits link-time optimizations. All results use version 2.1 of Dhrystone and ANSI-C-style function declarations.
**Benchmark built with Green Hills Software Compiler 2020.1.4 using “-Ospeed -Omax -OI -OB -OV”, among other options.
*** Preliminary estimates, subject to refinement once the product is released.
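For readers unfamiliar with the DMIPS/MHz unit used above, the following sketch shows the usual arithmetic. It assumes the conventional Dhrystone reference of 1757 Dhrystones per second for 1 DMIPS (the VAX 11/780 baseline). The function names and the 1000 MHz clock used in the usage note are hypothetical; the page does not state a clock frequency here, so no absolute DMIPS figure for Cortex-R82 should be read into it.

```c
/* Conventional reference: 1 DMIPS corresponds to 1757 Dhrystones per second. */
#define DHRYSTONES_PER_SEC_PER_DMIPS 1757.0

/* Convert a measured Dhrystone rate and clock into DMIPS/MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    double dmips = dhrystones_per_sec / DHRYSTONES_PER_SEC_PER_DMIPS;
    return dmips / clock_mhz;
}

/* Scale a DMIPS/MHz figure to absolute DMIPS at a given clock. */
double dmips_at_clock(double dmips_per_mhz_value, double clock_mhz)
{
    return dmips_per_mhz_value * clock_mhz;
}
```

For example, `dmips_at_clock(3.41, 1000.0)` scales the ground-rules figure to a hypothetical 1000 MHz clock, giving roughly 3410 DMIPS per core. The figure is only an illustration of the unit conversion.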