Translation tables in ARMv8-A
ARMv8-A supports three different translation table formats:
- The ARMv8-A AArch64 Long Descriptor format.
- The ARMv7-A Long Descriptor format, introduced by the Large Physical Address Extension (LPAE) to the ARMv7-A architecture and found in, for example, the ARM Cortex-A15 processor.
- The ARMv7-A Short Descriptor format.
In AArch32 state, the existing ARMv7-A long and short descriptor formats can be used to run existing guest operating systems and existing application code without modification. The ARMv7-A short descriptors can only be used for EL0 and EL1 stage 1 translations. They cannot be used by hypervisors or Secure monitor code.
The ARMv8-A long descriptor format is always used in AArch64 Execution state. This is similar to the ARMv7-A long descriptor format with Large Physical Address Extensions. It uses the same 64-bit long-descriptor format, but with some changes. It introduces a Level 0 table index, which uses the same descriptor format as the level 1 table. There is added support for up to 48-bit input and output addresses. The input virtual address is now 64-bit.
However, as the architecture does not support full 64-bit addressing, bits [63:48] of the address must all be the same, that is, all 0s or all 1s, or the top 8 bits can be used for VA tagging.
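The rule for the upper address bits can be sketched as a small check. This is a minimal model, assuming a 48-bit implemented VA range; the function name and the `tagged` parameter are illustrative, not part of the architecture.

```python
VA_BITS = 48  # assumed implemented virtual address width


def is_canonical(va: int, tagged: bool = False) -> bool:
    """Return True if the upper bits of a 64-bit VA are all 0s or all 1s.

    With tagged=True the top byte [63:56] is ignored, modelling VA tagging,
    so only bits [55:48] are checked.
    """
    top = 56 if tagged else 64
    mask = (1 << (top - VA_BITS)) - 1
    upper = (va >> VA_BITS) & mask
    return upper == 0 or upper == mask


# Low and high halves of the address space are valid.
print(is_canonical(0x0000_7FFF_FFFF_FFFF))
print(is_canonical(0xFFFF_8000_0000_0000))
# A tag in the top byte is only acceptable when tagging is enabled.
print(is_canonical(0xAB00_7FFF_FFFF_F000, tagged=True))
```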
AArch64 supports three different translation granules. These define the block size at the lowest level of translation table and control the size of translation tables in use. Larger granule sizes reduce the number of levels of translation table required and this can become an important consideration in systems using a hypervisor to provide virtualization.
The supported granule sizes are 4KB, 16KB, and 64KB, and it is IMPLEMENTATION DEFINED which of the three are supported. Code that creates translation tables can read the Memory Model Feature Register 0 system register (ID_AA64MMFR0_EL1) to find out which sizes are supported. The granule size is configured separately for each translation table base in the Translation Control Register (TCR_EL1).
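As an illustration of that discovery step, the following sketch decodes a raw ID_AA64MMFR0_EL1 value that has already been read into an integer. The field positions (TGran4 at [31:28], TGran64 at [27:24], TGran16 at [23:20]) and the TCR_EL1.TG0 encodings are taken from the register descriptions as I understand them and should be checked against the Architecture Reference Manual; the function names are hypothetical.

```python
def supported_granules(id_aa64mmfr0: int) -> set:
    """Decode the granule support fields of a raw ID_AA64MMFR0_EL1 value."""
    tgran4 = (id_aa64mmfr0 >> 28) & 0xF
    tgran64 = (id_aa64mmfr0 >> 24) & 0xF
    tgran16 = (id_aa64mmfr0 >> 20) & 0xF
    sizes = set()
    if tgran4 != 0xF:    # 0b1111 means "not supported" (assumed encoding)
        sizes.add("4KB")
    if tgran64 != 0xF:   # 0b1111 means "not supported" (assumed encoding)
        sizes.add("64KB")
    if tgran16 != 0x0:   # 0b0000 means "not supported" (assumed encoding)
        sizes.add("16KB")
    return sizes


# Assumed TCR_EL1.TG0 encodings, bits [15:14], for the TTBR0_EL1 region.
TG0_ENCODING = {"4KB": 0b00, "64KB": 0b01, "16KB": 0b10}


def tcr_tg0_field(granule: str) -> int:
    """Return the TG0 field value already shifted into position."""
    return TG0_ENCODING[granule] << 14
```

Note that TCR_EL1.TG1, which covers the TTBR1_EL1 region, uses a different encoding from TG0, so the table above must not be reused for it.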
AArch64 descriptor format
The AArch64 descriptor format is used in all levels of table, from Level 0 to Level 3. Level 0 descriptors can only output the address of a Level 1 table. Level 3 descriptors cannot point to another table and can only output block addresses. The format of the table is therefore slightly different for Level 3.
The following figure shows that the descriptor type is identified by bits [1:0] of the entry. An entry can be one of the following:
- The address of a next level table, in which case memory can be further subdivided into smaller blocks.
- The address of a variable sized block of memory.
- A Fault (Invalid) entry, which does not refer to any address.
For clarity, this diagram does not specify the width of bit fields.
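The decoding of bits [1:0] can be sketched as follows. This is a simplified model of the rules described above, assuming that block entries are not permitted at Level 0 and that the 0b01 encoding is reserved at Level 3; the function name is illustrative.

```python
def descriptor_type(desc: int, level: int) -> str:
    """Classify a 64-bit descriptor from its two low bits and its table level."""
    low = desc & 0b11
    if low & 0b01 == 0:
        return "invalid"            # bit 0 clear: Fault entry
    if level == 3:
        # Level 3 cannot point to another table; 0b11 is a page entry.
        return "page" if low == 0b11 else "invalid"
    if low == 0b11:
        return "table"              # address of the next-level table
    # low == 0b01: a block entry, assumed not to be allowed at Level 0.
    return "block" if level in (1, 2) else "invalid"
```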
Effect of granule sizes on translation tables
The three different granule sizes can affect the number and size of translation tables required.
- In all cases, the first level of table is omitted if the VA input range is restricted to 39 bits.
- Depending on the size of the possible VA range, there can be even fewer levels. With a 4KB granule, for example, if TCR_EL1 is set so that low addresses span only 1GB, then levels 0 and 1 are not required and the translation starts at level 2, going down to level 3 for 4KB pages.
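The relationship between VA range, granule size, and starting level can be worked out arithmetically. The sketch below assumes 8-byte descriptors, so each full table resolves (granule bits - 3) address bits, and assumes the last level is always level 3. In AArch64 the VA width is programmed through TCR_EL1.T0SZ as 64 minus the width; the helper names here are hypothetical.

```python
GRANULE_SHIFT = {"4KB": 12, "16KB": 14, "64KB": 16}


def levels_needed(va_bits: int, granule: str) -> int:
    """Number of table levels required to translate va_bits of address."""
    page_shift = GRANULE_SHIFT[granule]
    bits_per_level = page_shift - 3        # 8-byte descriptors per table
    to_resolve = va_bits - page_shift      # bits not covered by the page offset
    return -(-to_resolve // bits_per_level)  # ceiling division


def start_level(va_bits: int, granule: str) -> int:
    """Level at which the walk starts, given that it always ends at level 3."""
    return 4 - levels_needed(va_bits, granule)


for granule, va_bits in [("4KB", 48), ("4KB", 39), ("4KB", 30), ("64KB", 48)]:
    print(granule, va_bits, "-> starts at level", start_level(va_bits, granule))
```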
In the case of a 4kB granule, the hardware can use a 4-level look up process. The 48-bit address has nine address bits for each level translated (that is, 512 entries each), with the final 12 bits selecting a byte within the 4kB coming directly from the original address.
Bits [47:39] of the virtual address index into the 512 entry L0 table. Each of these table entries spans a 512GB range and points to an L1 table. Within that 512 entry L1 table, bits [38:30] are used as index to select an entry and each entry points to either a 1GB block or an L2 table.
Bits [29:21] index into a 512 entry L2 table and each entry points to a 2MB block or next table level. At the last level, bits [20:12] index into a 512 entry L3 table and each entry points to a 4kB block.
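The 4kB split described above amounts to four 9-bit fields and a 12-bit offset. A minimal sketch, with an illustrative function name:

```python
def split_va_4k(va: int) -> dict:
    """Split a 48-bit VA into its table indices for the 4kB granule."""
    return {
        "l0": (va >> 39) & 0x1FF,   # bits [47:39]
        "l1": (va >> 30) & 0x1FF,   # bits [38:30]
        "l2": (va >> 21) & 0x1FF,   # bits [29:21]
        "l3": (va >> 12) & 0x1FF,   # bits [20:12]
        "offset": va & 0xFFF,       # bits [11:0], byte within the 4kB page
    }


example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_va_4k(example))
```

If the walk ends early at a 1GB or 2MB block, the remaining low-order bits (30 or 21 of them respectively) are used directly as the offset within that block.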
In the case of a 16kB granule, the hardware can use a 4-level look up process.
The 48-bit address has 11 address bits for each level translated (that is, 2048 entries each), with the final 14 bits selecting a byte within the 16kB coming directly from the original address.
The level 0 table contains only two entries. Bit [47] of the virtual address selects a descriptor from the two entry L0 table. Each of these table entries spans a 128TB range and points to an L1 table. Within that 2048 entry L1 table, bits [46:36] are used as an index to select an entry and each entry points to an L2 table. Bits [35:25] index into a 2048 entry L2 table and each entry points to a 32MB block or next table level.
At the final translation stage, bits [24:14] index into a 2048 entry L3 table and each entry points to a 16kB block.
In the case of a 64kB granule, the hardware can use a 3-level look up process. The level 1 table contains only 64 entries. Bits [47:42] of the virtual address select a descriptor from the 64 entry L1 table. Each of these table entries spans a 4TB range and points to an L2 table. Within that 8192 entry L2 table, bits [41:29] are used as an index to select an entry and each entry points to either a 512MB block or an L3 table. At the final translation stage, bits [28:16] index into an 8192 entry L3 table and each entry points to a 64kB block.
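The figures quoted for the three granules follow from one rule: each table holds (granule size / 8) descriptors, and the top-level table is truncated to whatever bits remain below bit 48. The sketch below derives the index bit ranges, entry counts, and per-entry spans under that assumption; `table_geometry` is an illustrative name.

```python
def table_geometry(granule: str, va_bits: int = 48) -> list:
    """Return (level, high_bit, low_bit, entries, bytes_per_entry) per level,
    ordered from the first level of the walk down to level 3."""
    page_shift = {"4KB": 12, "16KB": 14, "64KB": 16}[granule]
    bits = page_shift - 3              # index bits resolved by a full table
    rows = []
    low, level = page_shift, 3
    while low < va_bits:
        high = min(low + bits, va_bits) - 1
        rows.append((level, high, low, 1 << (high - low + 1), 1 << low))
        low += bits
        level -= 1
    return rows[::-1]


for granule in ("4KB", "16KB", "64KB"):
    for level, high, low, entries, span in table_geometry(granule):
        print(f"{granule:>4} L{level}: bits [{high}:{low}], "
              f"{entries} entries, {span} bytes per entry")
```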
The MMU uses translation tables and translation registers to control which memory locations are Cacheable. The MMU controls the cache policy, memory attributes, and access permissions, and provides virtual-to-physical address translation.
Software configuration is performed by system registers.
In some designs, the external memory system might contain further implementation-specific caches of external memories.
The memory type is not directly encoded in the translation table entry. Instead, each block entry specifies a 3-bit index into a table of memory types. This table is stored in the Memory Attribute Indirection Register MAIR_ELn. This table has eight entries and each of those entries has 8 bits, as shown in the following figure.
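The indirection can be modelled as packing eight bytes into one 64-bit value and selecting one of them with the 3-bit index. The attribute byte values below are commonly used encodings that I believe are correct but should be verified against the MAIR_ELn description; the constant and function names are illustrative.

```python
# Assumed attribute encodings (verify against the architecture manual).
DEVICE_nGnRnE = 0x00
DEVICE_nGnRE = 0x04
NORMAL_NC = 0x44       # Normal memory, Inner and Outer Non-cacheable
NORMAL_WB = 0xFF       # Normal memory, Inner and Outer Write-Back cacheable


def make_mair(attrs: list) -> int:
    """Pack up to eight 8-bit attributes into a MAIR_ELn value; entry 0 is lowest."""
    if len(attrs) > 8:
        raise ValueError("MAIR_ELn holds at most eight attributes")
    value = 0
    for index, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * index)
    return value


def mair_lookup(mair: int, attr_index: int) -> int:
    """Return the 8-bit attribute selected by a 3-bit index."""
    return (mair >> (8 * (attr_index & 0x7))) & 0xFF


mair = make_mair([DEVICE_nGnRnE, DEVICE_nGnRE, NORMAL_NC, NORMAL_WB])
print(hex(mair), hex(mair_lookup(mair, 3)))
```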
Although the translation table block entry itself does not directly contain the memory type encoding, the TLB entry inside the processor usually stores this information for a specific entry. Therefore, changes to MAIR_ELn might not be observed until after both an ISB instruction barrier and a TLB invalidate operation.
The MMU translation tables also define the cache policy for each block within the memory system. Memory regions that are defined as Normal might be marked as Cacheable or Non-cacheable. Bits [4:2] from the translation table entry refer to one of the eight memory attribute encodings in the MAIR_ELn. The memory attribute encodings then specify the cache policies to use when accessing that memory. These are hints to the processor and it is IMPLEMENTATION DEFINED whether all cache policies are supported in a particular implementation and which cache data is regarded as coherent. A memory region can be defined in terms of its shareability property.
The following figure shows how memory attributes are specified in a stage 1 block entry. The block entry in the translation table defines the attributes for each memory region. Stage 2 entries have a different layout.
- Unprivileged eXecute Never (UXN) and Privileged eXecute Never (PXN) are execution permissions.
- AF is the access flag.
- SH is the shareable attribute.
- AP is the access permission.
- NS is the security bit, which is only valid at EL3 and Secure EL1.
- Indx is the index into the MAIR_ELn.
For clarity, not all bits are shown in the figure.
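To make the field list concrete, the following sketch assembles a stage 1 block entry from the attributes named above. The bit positions are my reading of the stage 1 layout (Indx at [4:2], NS at [5], AP at [7:6], SH at [9:8], AF at [10], PXN at [53], UXN at [54]) and should be checked against the figure; the function name and parameters are hypothetical.

```python
def make_block_descriptor(pa: int, attr_index: int, ap: int, sh: int,
                          af: bool = True, ns: bool = False,
                          pxn: bool = False, uxn: bool = False) -> int:
    """Assemble a stage 1 block entry (assumed layout, 48-bit output address)."""
    desc = 0b01                              # bits [1:0]: block entry
    desc |= (attr_index & 0x7) << 2          # Indx: index into MAIR_ELn
    desc |= int(ns) << 5                     # NS: security bit
    desc |= (ap & 0x3) << 6                  # AP: access permission
    desc |= (sh & 0x3) << 8                  # SH: shareable attribute
    desc |= int(af) << 10                    # AF: access flag
    desc |= pa & 0x0000_FFFF_FFFF_F000       # output address; caller aligns to block size
    desc |= int(pxn) << 53                   # PXN: Privileged eXecute Never
    desc |= int(uxn) << 54                   # UXN: Unprivileged eXecute Never
    return desc


# A block at 0x4000_0000 using MAIR entry 3, with AP = 0b00 and SH = 0b11.
print(hex(make_block_descriptor(0x4000_0000, attr_index=3, ap=0b00, sh=0b11)))
```

Setting AF in a statically built table avoids taking an Access flag fault on first use, which is why it defaults to True in this sketch.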
The descriptor format supports hierarchical attributes, so that an attribute set at one level can be inherited by lower levels. It means that a table entry in an L0, L1, or L2 table can override one or more attributes that are specified in the table that it points to. This can be used for access permissions, security, and execution permissions. For example, an entry in the L1 table that has NSTable = 1 means that the NS bits in the L2 and L3 tables that it points to are ignored and all the entries are treated as having NS = 1. This feature only restricts subsequent levels of look-up for the same stage of translation.
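The restricting behaviour can be modelled by accumulating the table-level bits along the walk. This is a simplified sketch covering only NS and the execute-never bits, not the access permission override; the bit positions (NSTable at [63], UXNTable at [60], PXNTable at [59] in table entries, and NS at [5], PXN at [53], UXN at [54] in the final entry) are assumptions to verify against the descriptor figures, and `effective_attrs` is an illustrative name.

```python
def effective_attrs(table_descs: list, leaf_desc: int) -> dict:
    """Combine table-entry overrides with the attributes of the final entry.

    table_descs holds the table entries visited during the walk, in order.
    A restriction set at any level applies to everything beneath it.
    """
    ns = bool((leaf_desc >> 5) & 1)
    pxn = bool((leaf_desc >> 53) & 1)
    uxn = bool((leaf_desc >> 54) & 1)
    for table in table_descs:
        ns = ns or bool((table >> 63) & 1)     # NSTable forces NS = 1 below
        uxn = uxn or bool((table >> 60) & 1)   # UXNTable forces UXN = 1 below
        pxn = pxn or bool((table >> 59) & 1)   # PXNTable forces PXN = 1 below
    return {"ns": ns, "pxn": pxn, "uxn": uxn}


l1_entry = (1 << 63) | 0x8000_0003   # table entry with NSTable = 1
leaf = 0x4000_070D                   # final entry with NS = 0
print(effective_attrs([l1_entry], leaf))
```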