Single-level table at EL3
The first example covers the simplest scenario: a single level of translation in the EL3 translation regime. The example flat maps the virtual address space, which means that the input virtual address and the output physical address are the same for every translation. The MMU is used only to control attributes and permissions.
In the examples package, the files are in:
Specify the location of the translation table
The code for the example is in startup.S. Looking at this file, the MMU code starts at line 159. Here you see the first interesting piece of code:
  // Set the Base address
  // ---------------------
  LDR      x0, =tt_l1_base                  // Get address of level 1 for TTBR0_EL3
  MSR      TTBR0_EL3, x0                    // Set TTBR0_EL3 (NOTE: There is no TTBR1 at EL3)
This code loads the address of the memory that is allocated for the translation table, and then writes that address into the Translation Table Base Register (TTBR0_ELx). This register tells the processor where the first level table is located when a table walk is required.
The symbol name indicates that the register points to a level 1 table. The section Configure the translation regime, later in this example, shows how the starting level of translation is configured.
The memory for the table is allocated at the end of the file, as you see here:
  .section TT,"ax"
  .align 12

  .global tt_l1_base
tt_l1_base:
  .fill 4096 , 1 , 0
The code defines a sensible label (tt_l1_base) to let us refer to the allocated memory. The fill directive then allocates a 4KB block that is pre-filled with zeros. This is useful because a value of 0 in a translation table entry means Fault.
Translation tables must be size aligned. In this example, we have a full level 1 table. With a 4KB granule, a full level 1 table includes 512 entries. Each entry is 8 bytes. This means that the table is 4KB in size, and must start on a 4KB boundary. The align directive sets the alignment as a power of 2. In this case, the alignment is 2^12, or 4096 bytes.
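The size and alignment arithmetic can be checked with a short calculation. The following Python sketch is a host-side illustration only and is not part of the examples package. The function names are invented for this illustration, and the base address 0x80002000 is an assumption taken from the table walk trace shown later in this example.

```python
# Host-side sketch: size and alignment of a full level 1 table (4KB granule).
ENTRIES_PER_TABLE = 512      # entries in a full table with a 4KB granule
BYTES_PER_ENTRY = 8          # each descriptor is 64 bits wide

def table_size_bytes(entries=ENTRIES_PER_TABLE, entry_size=BYTES_PER_ENTRY):
    """Return the size in bytes of a translation table."""
    return entries * entry_size

def is_size_aligned(base, size):
    """True if base is a multiple of size (size must be a power of 2)."""
    return base & (size - 1) == 0

size = table_size_bytes()
print(size)                                 # 4096, the count passed to .fill
print(size == 1 << 12)                      # True, matches .align 12
print(is_size_aligned(0x80002000, size))    # True  (assumed table address)
print(is_size_aligned(0x80002008, size))    # False (not on a 4KB boundary)
```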
Initialize the MAIR
Going back to the code, let’s look at the next step, which you see here:
  // Set up memory attributes
  // -------------------------
  // This equates to:
  // 0 = b01000100 = Normal, Inner/Outer Non-Cacheable
  // 1 = b11111111 = Normal, Inner/Outer WB/WA/RA
  // 2 = b00000000 = Device-nGnRnE
  MOV      x0, #0x000000000000FF44
  MSR      MAIR_EL3, x0
We learned in the Memory model guide that the Type, either Normal or Device, is not directly encoded within the translation table entries for stage 1 tables. Instead, the table entries contain an index into the Memory Attribute Indirection Register (MAIR_ELx). Each 8-bit entry in the register is set by software to specify a different memory Type. The example populates only the first three entries within the register:
- Index 0 = Normal, Inner and Outer Non-cacheable
- Index 1 = Normal, Inner and Outer Cacheable, with write-back and read/write allocation
- Index 2 = Device-nGnRnE
For this simple example, these three types are enough. We do not use the other index values.
Which Type is specified in which MAIR index is important later when we create the translation table entries.
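To see how the three attribute encodings combine into the value written to MAIR_EL3, here is a small host-side Python sketch. It is an illustration only, not code from the example, and the helper names pack_mair and mair_entry are hypothetical.

```python
# Host-side sketch: pack 8-bit attribute encodings into a MAIR value.
NORMAL_NC = 0x44         # index 0: Normal, Inner/Outer Non-cacheable
NORMAL_WBWA = 0xFF       # index 1: Normal, Inner/Outer WB/WA/RA
DEVICE_nGnRnE = 0x00     # index 2: Device-nGnRnE

def pack_mair(attrs):
    """Place attrs[i] in bits [8*i+7:8*i] of the 64-bit MAIR value."""
    if len(attrs) > 8:
        raise ValueError("MAIR holds at most 8 attribute entries")
    value = 0
    for index, attr in enumerate(attrs):
        if not 0 <= attr <= 0xFF:
            raise ValueError("each attribute encoding is 8 bits")
        value |= attr << (8 * index)
    return value

def mair_entry(mair, index):
    """Return the 8-bit attribute encoding stored at the given index."""
    return (mair >> (8 * index)) & 0xFF

mair = pack_mair([NORMAL_NC, NORMAL_WBWA, DEVICE_nGnRnE])
print(hex(mair))                   # 0xff44, the value moved into x0
print(hex(mair_entry(mair, 1)))    # 0xff
```

A translation table entry that selects index 1 therefore picks up the Normal, Write-Back encoding.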
Configure the translation regime
The next step is to configure the translation regime, as you see here:
  // Set up TCR_EL3
  // ---------------
  MOV      x0, #0x19                        // T0SZ=0b011001 Limits VA space to 39 bits,
                                            //               translation starts @ l1
  ORR      x0, x0, #(0x1 << 8)              // IRGN0=0b01    Walks to TTBR0 are Inner WB/WA
  ORR      x0, x0, #(0x1 << 10)             // ORGN0=0b01    Walks to TTBR0 are Outer WB/WA
  ORR      x0, x0, #(0x3 << 12)             // SH0=0b11      Inner Shareable
                                            // TBI0=0b0      Top byte not ignored
                                            // TG0=0b00      4KB granule
                                            // IPS=0         32-bit PA space
  MSR      TCR_EL3, x0
The Translation Control Register (TCR_ELx) configures many aspects of the translation regime, including:
- The size of the virtual address space that is being described
- The granule, which is the smallest describable block, for the translation regime
- The cacheability and shareability that the MMU should use for table walks
- Top Byte Ignore (TBI). Setting this bit causes the processor to ignore the top 8 bits of the virtual address when performing virtual to physical translation, which allows software to store something else in those bits. In this exercise, we do not use this feature, so we leave it disabled.
Note: For an example that shows when the TBI feature is used, see the description of Memory Tagging in the Providing protection for complex software guide.
The selected granule (TG0) for all the examples in this guide is 4KB. As described in the Memory management guide, the granule determines the different page and block sizes that are used. With a 4KB granule, the options are:
- L0 table: 512GB per entry
- L1 tables: Each table covers 512GB, 1GB per entry
- L2 tables: Each table covers 1GB, 2MB per entry
- L3 tables: Each table covers 2MB, 4KB per entry
The size of the virtual address space is configured as 64 - TnSZ. In this example, 64 - 0x19 gives 39 bits of virtual address space. This equates to 512GB (2^39), which means that the entire virtual address space is covered by a single L1 table. Therefore, our starting level of translation is level 1.
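The relationship between T0SZ, the size of the address space, and the starting level can be worked through numerically. The following host-side Python sketch is an illustration under simplifying assumptions: it only models the 4KB granule for stage 1 (12 offset bits, 9 bits resolved per level) and does not check architectural limits on T0SZ. The function names are hypothetical.

```python
# Host-side sketch: rebuild the TCR_EL3 value and derive the VA layout.
T0SZ = 0x19               # bits [5:0]
IRGN0_WBWA = 0x1 << 8     # table walks are Inner WB/WA
ORGN0_WBWA = 0x1 << 10    # table walks are Outer WB/WA
SH0_INNER = 0x3 << 12     # table walks are Inner Shareable

def va_bits(t0sz):
    """Size of the virtual address space in bits."""
    return 64 - t0sz

def start_level_4k(bits):
    """Starting level for a 4KB granule: 12 offset bits, 9 bits per level."""
    levels = -(-(bits - 12) // 9)     # ceiling division
    return 4 - levels

def entry_span_4k(level):
    """Bytes of address space covered by one entry at the given level."""
    return 1 << (12 + 9 * (3 - level))

tcr = T0SZ | IRGN0_WBWA | ORGN0_WBWA | SH0_INNER
print(hex(tcr))                           # 0x3519
print(va_bits(T0SZ))                      # 39
print(start_level_4k(va_bits(T0SZ)))      # 1
print(entry_span_4k(1) == 1 << 30)        # True: 1GB per L1 entry
print(512 * entry_span_4k(1) == 1 << 39)  # True: one L1 table covers 512GB
```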
The next part of the example is shown here:
  // Invalidate TLBs
  // ----------------
  TLBI     ALLE3
  DSB      SY
  ISB
The state of the Translation Lookaside Buffers (TLBs) is not guaranteed at reset. Therefore, the example invalidates the TLBs before enabling the MMU. The instruction (TLBI ALLE3) invalidates all cached translations for the EL3 translation regime, which is the translation regime that the example is configuring.
Generate the translation tables
The next step is to generate the tables in memory. This example creates a minimal set of entries, as you see in the following code:
  LDR      x1, =tt_l1_base                  // Address of L1 table

  // [0]: 0x0000,0000 - 0x3FFF,FFFF
  LDR      x0, =TT_S1_DEVICE_nGnRnE         // Entry template
                                            // AP=0, RW
                                            // Don't need to OR in address, as it is 0
  STR      x0, [x1]

  // [1]: 0x4000,0000 - 0x7FFF,FFFF
  LDR      x0, =TT_S1_DEVICE_nGnRnE         // Entry template
                                            // AP=0, RW
  ORR      x0, x0, #0x40000000              // 'OR' template with base physical address
  STR      x0, [x1, #8]

  // [2]: 0x8000,0000 - 0xBFFF,FFFF (DRAM on the VE and Base Platform)
  LDR      x0, =TT_S1_NORMAL_WBWA           // Entry template
  ORR      x0, x0, #TT_S1_INNER_SHARED      // 'OR' with inner-shareable attribute
                                            // AP=0, RW
  ORR      x0, x0, #0x80000000              // 'OR' template with base physical address
  STR      x0, [x1, #16]                    // Store entry 2 (offset 16 = 2 x 8 bytes)
As described in the previous section Configure the translation regime, L1 is the first level of translation in this example. With a 4KB granule, this means that each entry in the table covers 1GB of address space. The example only populates the first three entries, covering the first 3GB of the virtual address space.
In the previous section Specify the location of the translation table, we showed how to allocate the memory for the translation table. A 4KB region that is prefilled with zeros was allocated with a fill directive. A value of zero corresponds to a Fault in the translation table. Therefore, all the entries that are not written are faulting entries.
Note: In a real system, software would typically fill the table with zeros at run-time, instead of relying on allocating them in the source. However, pre-allocating the zeros can speed up some simulations or emulations.
Understand how an entry is formed
The code uses symbols that are defined as templates at the start of the file. For example, TT_S1_NORMAL_WBWA is a template for a Normal, Write-back, Read/Write-allocate entry. The definition of this template is shown in this code:
  .equ TT_S1_NORMAL_WBWA, 0x0000000000000405
The following diagram shows the format of a stage 1 level 1 table entry:
The TT_S1_NORMAL_WBWA template gives:
Indx= b01, take Type information from entry 1 in the MAIR
NS= b0, output physical addresses are Secure
AP= b00, address is readable and writeable
SH= b00, Non-shareable
AF= b1, Access Flag is pre-set. No Access Flag Fault is generated on access.
nG=Not used at EL3
Contig = b0, the entry is not part of a contiguous block
PXN= b0, block is executable. This attribute is called
UXN=Not used at EL3
In the template, we see why knowing the configuration of the MAIR is important. The template relies on the MAIR having entry 1 pre-set to Normal/Cacheable.
We want the region to be Inner-shareable, not Non-shareable as defined within the template. To fix this, the example combines the TT_S1_NORMAL_WBWA template with another template, TT_S1_INNER_SHARED. This second template sets the correct value in the SH field.
Check your knowledge: Look at the other templates defined within the example. How would you modify the preceding example to map to a Non-secure physical address?
Answer: Mapping to a Non-secure physical address requires setting the NS bit to 1. The example has a template for this, TT_S1_NS, which could be ORed in, as we did for the Shareability attribute.
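The way a template, an attribute, and an output address combine into one entry can be reproduced on a host machine. The following Python sketch is an illustration only. The value used for TT_S1_INNER_SHARED (SH field, bits [9:8], set to b11) and the table base address 0x80002000 are assumptions: the first follows from the descriptor format, and the second is inferred from the table walk trace shown later in this example. The function names are hypothetical.

```python
# Host-side sketch: form the level 1 block entry for 0x8000_0000.
TT_S1_NORMAL_WBWA = 0x405         # template value from startup.S
TT_S1_INNER_SHARED = 0x3 << 8     # assumed value: SH field, bits [9:8] = b11
TABLE_BASE = 0x80002000           # assumed address of tt_l1_base

def l1_block_entry(template, attrs, phys_base):
    """OR a template, extra attribute bits, and a 1GB-aligned output address."""
    if phys_base & ((1 << 30) - 1):
        raise ValueError("a level 1 block must be 1GB aligned")
    return template | attrs | phys_base

def l1_entry_address(table_base, va):
    """Each level 1 entry covers 1GB of VA space and occupies 8 bytes."""
    index = (va >> 30) & 0x1FF
    return table_base + index * 8

entry = l1_block_entry(TT_S1_NORMAL_WBWA, TT_S1_INNER_SHARED, 0x80000000)
print(hex(entry))                                      # 0x80000705
print(hex(l1_entry_address(TABLE_BASE, 0x80000000)))   # 0x80002010
```

Both results match the descriptor value and fetch address reported by the table walk trace.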
Overview of the configured virtual address space
With this set of translation table entries, the virtual address space looks like what you see in the following diagram:
Enable the MMU
At this point, the MMU is configured and the translation tables are created in memory. The next step is to enable the MMU, as you see in the following code:
  // Enable MMU
  // -----------
  MOV      x0, #(1 << 0)                    // M=1           Enable the stage 1 MMU
  ORR      x0, x0, #(1 << 2)                // C=1           Enable data and unified caches
  ORR      x0, x0, #(1 << 12)               // I=1           Enable instruction fetches to allocate
                                            //               into unified caches
                                            // A=0           Strict alignment checking disabled
                                            // SA=0          Stack alignment checking disabled
                                            // WXN=0         Write permission does not imply XN
                                            // EE=0          EL3 data accesses are little endian
  MSR      SCTLR_EL3, x0
  ISB
The example sets the M, C, and I bits in the System Control Register (SCTLR_ELx). Setting these bits enables the MMU and caches. The ISB after the write to the SCTLR ensures that the effect of enabling the MMU is visible to the next instruction.
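As a quick check of the value that the code writes, the three enable bits can be combined on a host machine. This Python sketch is an illustration only, and the constant and function names are hypothetical.

```python
# Host-side sketch: the SCTLR_EL3 value built by the MOV/ORR sequence.
SCTLR_M = 1 << 0      # M: stage 1 MMU enable
SCTLR_C = 1 << 2      # C: data and unified cache enable
SCTLR_I = 1 << 12     # I: instruction cache enable

def sctlr_fields(value):
    """Extract the three enable bits used in this example."""
    return {
        "M": (value >> 0) & 1,
        "C": (value >> 2) & 1,
        "I": (value >> 12) & 1,
    }

sctlr = SCTLR_M | SCTLR_C | SCTLR_I
print(hex(sctlr))             # 0x1005
print(sctlr_fields(sctlr))    # {'M': 1, 'C': 1, 'I': 1}
```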
The table walk
The examples in this exercise are developed to run on the Base Platform Fixed Virtual Platform (FVP). The Base Platform FVP is a model that is provided by Arm. FVP models trace the simulation, and provide detailed information on the execution of the simulated processor. The resulting trace is in the TARMAC format. Here is more information on TARMAC.
Tracing the entire example produces hundreds of lines of trace data. Instead, let's begin the trace at the point where the MMU is enabled, as you see here:
75 clk IT (75) 8000012c d51e1000 O EL3h_s : MSR SCTLR_EL3,x0
75 clk R SCTLR_EL3 00000000:00001005
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1dcache LINE 0100 ALLOC 0x000080002000
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0800 ALLOC 0x000080002000
75 clk TTW ITLB LPAE 1:1 000080002010 0000000080000705 : BLOCK ATTRIDX=1 NS=0 AP=0 SH=3 AF=1 nG=0 16E=0 PXN=0 XN=0 ADDR=0x0000000080000000
75 clk TLB FILL FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.UTLB 1G 0x80000000 EL3_s, nG asid=0:0x0080000000 Normal InnerShareable Inner=WriteBackWriteAllocate Outer=WriteBackWriteAllocate xn=0 pxn=0 ContiguousHint=0
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1icache LINE 0008 ALLOC 0x000080000100
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0040 ALLOC 0x000080000100
76 clk IT (76) 80000130 d5033fdf O EL3h_s : ISB
The trace is dense, so let's look at it one section at a time. Here is the first section:
75 clk IT (75) 8000012c d51e1000 O EL3h_s : MSR SCTLR_EL3,x0
75 clk R SCTLR_EL3 00000000:00001005
The first line shows the execution of the MSR, which enables the MMU. 0x8000_012C is the address of the instruction and 0xD51E_1000 is the opcode. The second line shows the value the instruction wrote to the register.
Note: By default, the trace shows the value that is written to the register, not the new value of the register. In many cases, but not all cases, the new value is the written value. For example, if the register includes read-only fields, the new value is not the written value.
Because this instruction enabled the MMU, the processor needs to perform a table walk for the page containing the next instruction. First, the trace shows these lines:
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1dcache LINE 0100 ALLOC 0x000080002000
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0800 ALLOC 0x000080002000
When we configured TCR_EL3, we configured the MMU to use cacheable accesses for the table walk. The preceding two lines show the cache line that contains the required table entry being fetched into the cache. Once the line is returned from the memory, the descriptor can be interpreted by the MMU. This interpretation is shown in the next line of the trace, as you can see here:
75 clk TTW ITLB LPAE 1:1 000080002010 0000000080000705 : BLOCK ATTRIDX=1 NS=0 AP=0 SH=3 AF=1 nG=0 16E=0 PXN=0 XN=0 ADDR=0x0000000080000000
The preceding trace entry shows the MMU processing the table entry. This entry shows us the following things:
TTW= Table walk
ITLB= Table walk for the instruction interface. The I is for instruction.
1:1= Stage 1, level 1 table entry
0x80002010= Address the entry was fetched from
0x0000000080000705= Entry returned from memory system
BLOCK= The entry is a Block entry
ATTRIDX=1= Uses MAIR entry 1.
NS= Output physical address is Secure
AP= Access permission bits
SH= Shareability bits
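The fields that the trace reports can be recovered from the raw descriptor value by hand. The following host-side Python sketch decodes the lower attribute fields of a stage 1, level 1 entry. It is an illustration only: the function name is hypothetical, and the sketch handles only the fields discussed here, for a 4KB granule with a 1GB block.

```python
# Host-side sketch: decode the descriptor returned by the table walk.
def decode_l1_descriptor(desc):
    """Split a stage 1, level 1 descriptor into the fields shown in the trace."""
    return {
        "valid": desc & 1,                                # bit 0
        "type": "TABLE" if (desc >> 1) & 1 else "BLOCK",  # bit 1
        "ATTRIDX": (desc >> 2) & 0x7,                     # bits [4:2], MAIR index
        "NS": (desc >> 5) & 1,                            # bit 5
        "AP": (desc >> 6) & 0x3,                          # bits [7:6]
        "SH": (desc >> 8) & 0x3,                          # bits [9:8]
        "AF": (desc >> 10) & 1,                           # bit 10
        "ADDR": desc & 0x0000FFFFC0000000,                # bits [47:30], 1GB block
    }

fields = decode_l1_descriptor(0x0000000080000705)
print(fields["type"], fields["ATTRIDX"], fields["SH"])    # BLOCK 1 3
print(hex(fields["ADDR"]))                                # 0x80000000
```

The decoded values agree with the BLOCK, ATTRIDX=1, NS=0, AP=0, SH=3, AF=1, and ADDR fields printed in the TTW line.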
Finally, the trace shows the TLB record that is being generated, as you see in this code:
75 clk TLB FILL FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.UTLB 1G 0x80000000 EL3_s, nG asid=0:0x0080000000 Normal InnerShareable Inner=WriteBackWriteAllocate Outer=WriteBackWriteAllocate xn=0 pxn=0 ContiguousHint=0
The trace shows that the TLB entry is created as follows:
- 1GB block
- VA: 0x8000_0000, with ASID 0, although ASIDs are not used at EL3
- Translation regime: EL3
- Normal, Inner Shareable, Write-Back, Write-Allocate
Check your knowledge: Look at the preceding code and find where all the settings that are shown in the trace come from.