A-Profile architectures

Arm produces a whole family of processors that share common instruction sets and programmer’s models and have some degree of backward compatibility. The Applications (‘A’) profile targets high performance markets such as mobile and enterprise.

Armv8-A Architecture

The Armv8-A architecture is the latest generation Arm architecture targeted at the Applications ('A') profile. 

It introduces the ability to use 64-bit and 32-bit Execution states, known as AArch64 and AArch32 respectively. The AArch64 Execution state supports the A64 instruction set, holds addresses in 64-bit registers and allows instructions in the base instruction set to use 64-bit registers for their processing. The AArch32 Execution state is a 32-bit Execution state that preserves backwards compatibility with the Armv7-A architecture and enhances that profile so that it can support some features included in the AArch64 state. It supports the T32 and A32 instruction sets.

Armv8-A is the only profile that supports AArch64 execution. The interaction between the AArch64 and AArch32 Execution states is known as interprocessing. In addition, the Armv8-A architecture allows different levels of AArch64 and AArch32 support, for example:
  • AArch64 only designs.
  • AArch64 designs that also support AArch32 operating systems/virtual machines.
  • AArch64 support with AArch32 at (unprivileged) application level only.

What's new for Engineers in Armv8-A?

The Armv8-A architecture introduces a number of changes that enable significantly higher-performance processor implementations.

Large physical address

This enables the processor to access beyond 4GB of physical memory.
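As a quick arithmetic check of what the address width means, the sketch below converts a number of address bits into the amount of byte-addressable memory. The 32-bit and 40-bit widths come from this page (the 4GB limit and the Armv7-A Large Physical Address Extension); the 48-bit width is included as an assumed example of a larger physical address size and is not stated here. The function names are hypothetical.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


def describe(address_bits):
    """Format the addressable range using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    size = addressable_bytes(address_bits)
    index = 0
    # Divide by 1024 while the value stays a whole number of the next unit.
    while size >= 1024 and size % 1024 == 0 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


if __name__ == "__main__":
    # 48 bits is an assumed example width, not a figure taken from this page.
    for bits in (32, 40, 48):
        print(f"{bits}-bit addresses reach {describe(bits)}")
```

A 32-bit address reaches exactly 4 GiB, which is the limit referred to above; each additional address bit doubles the reachable memory.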

64-bit virtual addressing

This enables virtual memory beyond the 4GB limit. This is important for modern desktop and server software that uses memory-mapped file I/O or sparse addressing.

Automatic event signaling

This enables power-efficient, high-performance spinlocks.

Larger register files

Thirty-one 64-bit general-purpose registers increase performance and reduce stack use.

Efficient 64-bit immediate generation

There is less need for literal pools.
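One way to see why literal pools are needed less often: A64 can build any 64-bit constant in a register from up to four 16-bit pieces, using one MOVZ followed by MOVK instructions. The sketch below is a simplified illustration of that decomposition, not a description of what any particular assembler or compiler emits. Real toolchains also use other encodings (for example MOVN or logical immediates) that this sketch ignores, and the function name is hypothetical.

```python
def movz_movk_sequence(value, reg="x0"):
    """Return A64 assembly lines that build `value` from 16-bit chunks.

    Simplified: uses only MOVZ (write chunk, zero the rest) and
    MOVK (write chunk, keep the rest), skipping chunks that are zero.
    """
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")

    # Split the constant into four 16-bit chunks, lowest chunk first.
    chunks = [((value >> shift) & 0xFFFF, shift) for shift in (0, 16, 32, 48)]
    nonzero = [(chunk, shift) for chunk, shift in chunks if chunk != 0]

    if not nonzero:
        return [f"movz {reg}, #0x0"]

    lines = []
    for i, (chunk, shift) in enumerate(nonzero):
        op = "movz" if i == 0 else "movk"
        suffix = f", lsl #{shift}" if shift else ""
        lines.append(f"{op} {reg}, #0x{chunk:x}{suffix}")
    return lines


if __name__ == "__main__":
    for line in movz_movk_sequence(0x123456789ABCDEF0):
        print(line)
```

In the worst case this takes four instructions and no memory access, whereas a literal pool needs a load from a data word placed near the code.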

Large PC-relative addressing range

A64 provides a +/-4GB PC-relative addressing range for efficient data addressing within shared libraries and position-independent executables.
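The +/-4GB figure can be reproduced from the A64 ADRP instruction, which adds a signed 21-bit page count to the 4KB-aligned address of the current instruction. The sketch below models that calculation under those assumptions; the function names are hypothetical, and the low 12 bits of the final address would normally be supplied by a following ADD or load/store offset, which is not modeled here.

```python
PAGE_SIZE = 4096   # ADRP works in units of 4KB pages
IMM_BITS = 21      # signed page-count immediate


def adrp_target(pc, imm21):
    """Page address produced by ADRP at address `pc` with page offset `imm21`."""
    limit = 1 << (IMM_BITS - 1)
    if not -limit <= imm21 < limit:
        raise ValueError("page offset does not fit in a signed 21-bit field")
    # Clear the low 12 bits of the PC, then step by whole pages.
    return (pc & ~(PAGE_SIZE - 1)) + imm21 * PAGE_SIZE


def adrp_reach():
    """Lowest and highest byte offsets (relative to the PC's page) ADRP can form."""
    limit = 1 << (IMM_BITS - 1)
    return -limit * PAGE_SIZE, (limit - 1) * PAGE_SIZE


if __name__ == "__main__":
    low, high = adrp_reach()
    print(f"reach: {low} .. {high} bytes")
    print(hex(adrp_target(0x400123, 1)))
```

Because 2^20 pages of 4KB each is 2^32 bytes, the reachable window extends 4GB below the current page and just under 4GB above it.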

Additional 16KB and 64KB translation granules

This reduces Translation Lookaside Buffer (TLB) miss rates and depth of page walks.
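The effect of the granule size on page-walk depth and TLB coverage can be estimated with some simple arithmetic. The sketch below assumes 8-byte translation table descriptors, one table per granule-sized page, and no concatenated tables; the 48-bit virtual address size and the 512-entry TLB are illustrative assumptions, and the function names are hypothetical.

```python
def lookup_levels(va_bits, granule_bytes):
    """Estimate the number of translation table levels for a VA size and granule."""
    offset_bits = granule_bytes.bit_length() - 1   # log2(granule): in-page offset
    index_bits = offset_bits - 3                   # 8-byte descriptors per table
    remaining = va_bits - offset_bits              # bits the tables must resolve
    return -(-remaining // index_bits)             # integer ceiling division


def tlb_reach(entries, granule_bytes):
    """Bytes of memory covered by a TLB holding `entries` page-sized mappings."""
    return entries * granule_bytes


if __name__ == "__main__":
    for name, granule in (("4KB", 4096), ("16KB", 16384), ("64KB", 65536)):
        levels = lookup_levels(48, granule)           # 48-bit VA is an assumption
        reach_mib = tlb_reach(512, granule) // 2**20  # 512 entries is an assumption
        print(f"{name}: {levels} levels, {reach_mib} MiB reach")
```

Under these assumptions a 64KB granule resolves a 48-bit address in three levels instead of four, and every TLB entry covers sixteen times as much memory as with a 4KB granule.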

New exception model

This reduces OS and hypervisor software complexity.

Efficient cache management

User space cache operations improve dynamic code generation efficiency. A Data Cache Zero instruction provides a fast way to clear data in the cache.

Hardware-accelerated cryptography

This provides 3× to 10× better software encryption performance. It is useful for small-granule encryption and decryption work that is too small to offload to a hardware accelerator efficiently, for example, HTTPS.

Load-Acquire, Store-Release instructions

These instructions are designed for the C++11, C11, and Java memory models. They improve the performance of thread-safe code by eliminating explicit memory barrier instructions.

NEON double-precision floating-point advanced SIMD

This enables SIMD vectorization to be applied to a much wider set of algorithms, for example, scientific computing, High Performance Computing (HPC) and supercomputers.

Armv7-A Architecture

The Armv7-A architecture introduced the concept of architecture profiles, which has been carried forward into the Armv8 architectures. It implements a traditional Arm architecture with multiple modes, supports a Virtual Memory System Architecture (VMSA) based on a Memory Management Unit (MMU), and supports the Arm (A32) and Thumb (T32) instruction sets.

Architectural extensions

This architecture also supports multiple extensions. These are:

  • Security Extensions. These are an optional set of extensions that provide a set of security features that facilitate the development of secure applications.
  • Multiprocessing Extensions. These are an optional set of extensions that provide a set of features that enhance multiprocessing functionality.
  • Large Physical Address Extension. This is an optional extension that provides an address translation system supporting physical addresses of up to 40 bits at a fine grain of translation. It requires implementation of the Multiprocessing Extensions.
  • Virtualization Extensions. These are an optional set of extensions that provide hardware support for a virtual machine monitor, called a Hypervisor, to switch between Guest operating systems. They require implementation of the Security Extensions and the Large Physical Address Extension.
  • Generic Timer Extension. This is an optional extension that provides a system timer and a low-latency register interface to it. It is included as part of the Large Physical Address Extension or the Virtualization Extensions, but can also be implemented with earlier versions of the Armv7-A architecture.
  • Performance Monitors Extension. This extension defines a recommended performance monitors implementation and reserves register space for performance monitors.

Most of the functionality provided by these extensions is included in the Armv8-A architecture, though the Performance Monitors Extension remains optional.