A-profile overview

The Arm Application-profile (A-profile) architecture targets high-performance markets, such as PC, mobile, gaming, and enterprise. The latest versions of the A-profile architecture are Armv9-A and Armv8-A. The following table compares the features of Armv9-A and Armv8-A.

Feature
Architecture versions
Description
More information
AArch64
Armv8.0-A
Armv9.0-A

AArch64 is the 64-bit execution environment for the Arm architecture.

AArch64 provides:

  • Large physical and virtual address spaces
  • Large register file of 64-bit registers
  • Automatic signalling of events for power-efficient, high-performance spinlocks
  • Efficient cache management
  • Load-Acquire and Store-Release instructions designed for the C++11, C11, and Java memory models
Learn the architecture: Guides for A-profile
AArch32
Armv8.0-A
Armv9.0-A (EL0 only)
The 32-bit execution environment for the Arm architecture. Provides compatibility with Armv7-A and earlier.
Learn the architecture: Guides for A-profile
Virtualization
Armv8.0-A
Armv9.0-A
Support for hypervisors and virtualization
Learn the architecture: AArch64 Virtualization
TrustZone
Armv8.0-A
Armv9.0-A
TrustZone offers an efficient, system-wide approach to security with hardware-enforced isolation built into the CPU.
Learn the architecture: TrustZone for AArch64
Realm Management Extension (RME)
Armv9-A
The Realm Management Extension (RME) builds on TrustZone, with the following features:
  • Two additional security states
  • Two additional physical address spaces 
  • The ability to dynamically move resources between security states
These features enable the Arm Confidential Compute Architecture (Arm CCA) and Dynamic TrustZone.
Arm Confidential Compute Architecture
Hardware-accelerated cryptography
Armv8.0-8.2-A
Armv9.0-A
Provides 3× to 10× better software encryption performance. This is useful for small-granule encryption and decryption, where the data is too small to offload efficiently to a hardware accelerator, for example in HTTPS.
Learn the architecture: AArch64 Instruction Set Architecture
Neon
Armv8.0-A
Armv9.0-A
Neon technology is a packed SIMD architecture. Neon registers are treated as vectors of elements of the same data type, and Neon instructions operate on multiple elements simultaneously. The technology supports multiple data types, including floating-point and integer types.
Neon programmer's guides for Armv8-A
Virtualization Host Extension (VHE)
Armv8.1-A
Armv9.0-A
These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
Learn the architecture: AArch64 Virtualization
Privileged Access Never (PAN)
Armv8.1-A
Armv9.0-A
PAN allows a kernel to prevent privileged access to unprivileged (user) memory locations, providing increased robustness.
Learn the architecture: AArch64 memory model
Statistical Profiling Extension (SPE)
Armv8.2-A
Armv9.0-A
SPE selects operations to sample on an instruction or micro-operation basis, at regular intervals. Each sample gathers its associated context into a profiling record, and only one record is compiled at any given time. When sampling continuously on systems that run large workloads over extended periods, analyzing large sets of samples can provide considerable insight into software execution and its performance.
Statistical Profiling Extension for Armv8-A
Scalable Vector Extension (SVE)
Armv8.2-A
SVE provides support for SIMD with variable vector lengths. SVE enables a vector-length-agnostic coding style, where code does not need to be rewritten or recompiled because it adapts dynamically to the implemented vector length. The SVE architecture allows implementations with a vector length of up to 2048 bits, where the vector length must be a multiple of 128 bits. SVE also supports code written for a fixed vector length.
SVE programming examples
Pointer authentication
Armv8.3-A
Armv9.0-A
Computer attacks are becoming more sophisticated. Examples include exploit mechanisms that use gadgets, such as Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). To mitigate such exploits, Armv8.3-A introduces a feature that authenticates the contents of a register before it is used as the address for an indirect branch or data reference. For address authentication, the feature uses the upper bits of a 64-bit address value, which are normally associated with sign extension of the address space. This allows a Pointer Authentication Code (PAC) to be introduced as a new field within the upper bits of the value.
Code reuse attacks: the compiler story
Nested Virtualization
Armv8.3-A  
Armv8.4-A
Armv9.0-A
An increasingly common cloud computing use case is a user renting a virtual machine from an Infrastructure as a Service (IaaS) provider. Nested virtualization is attractive when the workload intended to run on this virtual machine itself includes a hypervisor.
Learn the architecture: AArch64 Virtualization
Memory Tagging Extension (MTE)
Armv8.5-A
Armv9.0-A
Memory tagging enables developers to identify memory safety violations in their programs.
Memory Tagging Extension: Enhancing memory safety through architecture
Memory Tagging Extension Whitepaper
Branch Target Identification (BTI)
Armv8.5-A
Armv9.0-A
BTI allows software to identify valid targets for indirect branches. BTI complements the support for pointer authentication, providing a defence against JOP techniques.
Code reuse attacks: the compiler story
GEneral Matrix Multiply (GEMM)
Armv8.6-A
Armv9.1-A
Adds new Advanced SIMD (Neon) and SVE instructions to accelerate matrix operations, greatly reducing the number of memory accesses required.
Developments in the Arm A-Profile Architecture: Armv8.6-A
BFloat16
Armv8.6-A
Armv9.1-A
Support in Advanced SIMD (Neon) and SVE for the BFloat16 (BF16) data type. BF16 has recently emerged as a format tailored specifically to high-performance processing of neural networks.
BFloat16 processing for Neural Networks on Armv8-A
High precision timers
Armv8.6-A
Armv9.1-A
The Generic Timer frequency is increased to a new standard of 1 GHz.
Arm A-Profile architecture developments 2018: Armv8.5-A
64-byte loads and stores
Armv8.7-A
Armv9.2-A
A growing trend in enterprise systems is the introduction of accelerators that can be accessed using 64-byte atomic loads or stores. These are used to add items to queues and can, in some cases, signal success or failure of the enqueue operation.
Arm A-Profile architecture developments 2020
Scalable Vector Extension v2 (SVE2)
Armv9.0-A
SVE2 is a superset of the Armv8-A SVE, with expanded functionality. The SVE2 instruction set adds thorough fixed-point arithmetic support.
Arm A-Profile architecture developments 2020
Transactional Memory Extension (TME)
Armv9.0-A
The Transactional Memory Extension brings Hardware Transactional Memory (HTM) support to the Arm architecture. Transactional memory addresses the difficulty of writing highly concurrent, multi-threaded programs. By reducing serialization due to lock contention, it allows coarse-grained, thread-level parallelism to scale better with the number of CPUs.
New technologies for the Arm A-profile architecture
Branch Record Buffer Extension (BRBE)
Armv9.2-A
The Branch Record Buffer Extension (BRBE) captures a recent sequence of branches in an easily consumable format. This information can be used for debugging or fed into profiling tools for hot-spot analysis and AutoFDO.
Available Q3 - Q4 2021
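
The pointer authentication entry in the table above describes a PAC stored in the upper bits of a 64-bit address. The following is a minimal Python model of that idea, not of the hardware: it assumes a 48-bit virtual address with all 16 upper bits available for the PAC, and it uses a truncated HMAC-SHA256 as a stand-in for the real PAC algorithm. The function names, the key, and the modifier value are hypothetical and exist only for illustration.

```python
import hashlib
import hmac

VA_BITS = 48                     # assumed virtual address width for this model
PAC_BITS = 64 - VA_BITS          # upper bits repurposed to hold the PAC
ADDR_MASK = (1 << VA_BITS) - 1


def _toy_pac(address: int, modifier: int, key: bytes) -> int:
    """Derive a PAC_BITS-wide code from an address and a context modifier.

    Truncated HMAC-SHA256 is a stand-in, NOT the architecture's algorithm.
    """
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << PAC_BITS) - 1)


def sign_pointer(pointer: int, modifier: int, key: bytes) -> int:
    """Place the PAC in the upper bits, above the address bits."""
    address = pointer & ADDR_MASK
    return (_toy_pac(address, modifier, key) << VA_BITS) | address


def authenticate_pointer(signed: int, modifier: int, key: bytes) -> int:
    """Recompute the PAC and return the plain address only if it matches."""
    address = signed & ADDR_MASK
    stored_pac = signed >> VA_BITS
    if stored_pac != _toy_pac(address, modifier, key):
        raise ValueError("pointer authentication failed")
    return address


if __name__ == "__main__":
    key = b"hypothetical-key"
    return_address = 0x0000_7FFF_1234_5678
    signed = sign_pointer(return_address, modifier=0x1000, key=key)
    print(hex(authenticate_pointer(signed, 0x1000, key)))
```

Raising an exception is this model's own way of reporting a mismatch. The point of the sketch is only that a value whose upper bits were not produced with the right key and modifier does not authenticate, which is what makes an overwritten return address or function pointer detectable before it is used.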

For information on Arm IP implementation of the architecture features, view Arm Cortex-A processors.
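
To make the BFloat16 entry in the table more concrete: BF16 keeps the sign bit and 8-bit exponent of IEEE 754 single precision but only 7 fraction bits, so a BF16 value corresponds to the upper 16 bits of a float32 encoding. The following Python sketch shows that relationship. It converts by simple truncation, which is an assumption made for brevity; hardware conversion instructions may round instead. The function names are hypothetical.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    """Return a 16-bit BF16 pattern by truncating the float32 encoding of value."""
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return f32_bits >> 16        # keep sign, 8 exponent bits, top 7 fraction bits


def bf16_bits_to_float(bits: int) -> float:
    """Widen a BF16 pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    for x in (1.0, -2.5, 3.14159):
        bits = float_to_bf16_bits(x)
        print(f"{x!r:>10} -> 0x{bits:04X} -> {bf16_bits_to_float(bits)!r}")
```

Because the exponent field is unchanged, BF16 covers the same dynamic range as float32 and gives up only precision, which is the trade-off that suits neural network workloads.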

Armv9-A architecture

The Armv9-A architecture builds on and is backwards compatible with the Armv8-A architecture. The Armv9-A architecture forms the foundation for the Arm Base System Architecture – a specification outlining a standard that ensures hardware and firmware compatibility across a wide range of applications at the system level.

The Armv9-A architecture introduces some major new features:

  • SVE2: extending the benefit of scalable vectors to many more use cases
  • Realm Management Extension (RME): extending Confidential Compute on Arm platforms to all developers. Read more about Confidential Compute and Arm architecture security features
  • BRBE: providing branch information for profiling tools, such as AutoFDO
  • Embedded Trace Extension (ETE) and Trace Buffer Extension (TRBE): enhanced trace capabilities for Armv9
  • TME: hardware transactional memory support for the Arm architecture
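
SVE2 keeps the vector-length-agnostic model that SVE introduced. The following sketch models that style in plain Python rather than with real SVE instructions: the same loop is run for every vector length the table permits (multiples of 128 bits up to 2048 bits) and always produces the same result. The per-lane predicate that masks off the tail of the array mirrors how SVE code typically handles leftover elements. The function names and the choice of 32-bit elements are assumptions made for illustration.

```python
from typing import List


def valid_vector_lengths() -> List[int]:
    """Vector lengths in bits: multiples of 128 up to 2048."""
    return list(range(128, 2048 + 1, 128))


def vla_add(a: List[float], b: List[float], vl_bits: int) -> List[float]:
    """Add two arrays of 32-bit elements without hard-coding the vector length."""
    if vl_bits not in valid_vector_lengths():
        raise ValueError(f"unsupported vector length: {vl_bits}")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vl_bits // 32        # number of 32-bit elements per vector
    out = [0.0] * len(a)
    for base in range(0, len(a), lanes):
        # Predicate: a lane is active only if it maps to a real element.
        predicate = [base + lane < len(a) for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[base + lane] = a[base + lane] + b[base + lane]
    return out


if __name__ == "__main__":
    a = [float(i) for i in range(37)]
    b = [2.0 * i for i in range(37)]
    results = {vl: vla_add(a, b, vl) for vl in valid_vector_lengths()}
    print(all(r == results[128] for r in results.values()))
```

Only the number of loop iterations changes with the vector length; the source of the loop does not, which is why such code does not need to be rewritten or recompiled for a different implementation.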

Armv8-A architecture

The Armv8-A architecture introduces the ability to use 64-bit and 32-bit Execution states, known as AArch64 and AArch32 respectively. The AArch64 Execution state supports the A64 instruction set. It holds addresses in 64-bit registers and allows instructions in the base instruction set to use 64-bit registers for their processing. The AArch32 Execution state is a 32-bit Execution state that preserves backwards compatibility with the Armv7-A architecture, enhancing that profile so that it can support some features included in the AArch64 state. It supports the T32 and A32 instruction sets.

The Armv8-A architecture allows different levels of AArch64 and AArch32 support, for example:

  • AArch64-only designs
  • AArch64 designs that also support AArch32 operating systems and virtual machines
  • AArch64 support with AArch32 at (unprivileged) application level only

Armv7-A architecture

The Armv7-A architecture introduces the concept of architecture profiles, a concept that continues in Armv8-A and Armv9-A. The Armv7-A architecture:

  • Implements a traditional Arm architecture with multiple modes
  • Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management Unit (MMU)
  • Supports the Arm (A32) and Thumb (T32) instruction sets

This architecture also supports multiple extensions:

  • Security Extensions
  • Multiprocessing Extensions
  • Large Physical Address Extension
  • Virtualization Extensions
  • Generic Timer Extension
  • Performance Monitors Extension

All of these extensions are optional and most of the functionality they provide is included in the Armv8-A architecture.
