Single-level table at EL3

The first example covers the simplest scenario: A single level of translation in the EL3 translation regime. We are going to flat map the virtual addresses. This means that the input virtual address and output physical address are the same for all translations. The MMU is only being used to control attributes and permissions.

In the examples package, the files are in: <example dir>\el3_stage1_l1only\

Specify the location of the translation table

The code for the example is in startup.S. Looking at this file, the MMU code starts at line 159. Here you see the first interesting piece of code:

// Set the Base address
// ---------------------
LDR   x0, =tt_l1_base  // Get address of level 1 for TTBR0_EL3
MSR   TTBR0_EL3, x0    // Set TTBR0_EL3 (NOTE: There is no TTBR1 at EL3)

This code loads the address of the memory that is allocated for the translation table, and then writes that address into the Translation Table Base Register (TTBR0_ELx). This register tells the processor where the first level table is located when a table walk is required.

The symbol name indicates that the register points to a level 1 table. The section Configure the translation regime, later in this guide, shows how the starting level of translation is configured.

The memory for the table is allocated at the end of the file, as you see here:

.section TT,"ax"
.align 12
.global tt_l1_base
tt_l1_base:
.fill 4096 , 1 , 0

The code defines a label (tt_l1_base) that lets us refer to the allocated memory. The fill directive then allocates a 4KB block that is pre-filled with zeros. This is useful because a descriptor with the value 0 is an invalid entry, so any access that translates through it generates a Fault.

Translation tables must be size aligned. In this example, we have a full level 1 table. With a 4KB granule, a full level 1 table includes 512 entries. Each entry is 8 bytes. This means that the table is 4KB in size, and must start on a 4KB boundary. The align directive sets the alignment as a power of 2. In this case, the alignment is 2^12=4096.
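The sizing arithmetic in the preceding paragraph can be checked with a short host-side sketch. This is only an illustration in Python, not part of the example package. The helper name is_size_aligned and the two sample addresses are hypothetical.

```python
# Sizing a full level 1 table with a 4KB granule, using the figures from the text.
ENTRIES_PER_TABLE = 512   # entries in a full table with a 4KB granule
ENTRY_SIZE_BYTES = 8      # each descriptor is 64 bits
ALIGN_EXPONENT = 12       # argument given to the .align directive

table_size = ENTRIES_PER_TABLE * ENTRY_SIZE_BYTES
alignment = 1 << ALIGN_EXPONENT


def is_size_aligned(base, size):
    """Return True if base is a multiple of size (size must be a power of 2)."""
    return base & (size - 1) == 0


print(table_size, alignment)                    # 4096 4096
print(is_size_aligned(0x80002000, table_size))  # True  (sample address)
print(is_size_aligned(0x80002010, table_size))  # False (sample address)
```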

Initialize the MAIR

Going back to the code, let’s look at the next step, which you see here:

// Set up memory attributes
 // -------------------------
 // This equates to:
 // 0 = b01000100 = Normal, Inner/Outer Non-Cacheable
 // 1 = b11111111 = Normal, Inner/Outer WB/WA/RA
 // 2 = b00000000 = Device-nGnRnE
 MOV   x0, #0x000000000000FF44
 MSR   MAIR_EL3, x0

We learned in the Memory model guide that the Type, either Normal or Device, is not directly encoded with the translation table entries for stage 1 tables. Instead, the table entries contain an index into the Memory Attribute Indirection Register (MAIR_ELx). Each 8-bit entry is set by software to specify a different memory Type. The example populates only the first three entries within the MAIR:

  • [0] = Normal, Inner and Outer Non-cacheable
  • [1] = Normal, Inner and Outer Cacheable, with write-back and read/write allocation
  • [2] = Device-nGnRnE

For this simple example, these three types are enough. We do not use the other index values.

Which Type is specified in which MAIR index is important later when we create the translation table entries.
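To see how the three 8-bit encodings become the single value written to MAIR_EL3, the following Python sketch packs them by index. The function names pack_mair and mair_attr are hypothetical helpers for illustration. The attribute encodings are the ones given in the comments of the example.

```python
def pack_mair(attrs):
    """Pack up to eight 8-bit attribute encodings into a 64-bit MAIR value.

    attrs[i] is placed in byte i, which is the byte that index i selects.
    """
    if len(attrs) > 8:
        raise ValueError("MAIR holds at most eight attribute entries")
    value = 0
    for index, attr in enumerate(attrs):
        if not 0 <= attr <= 0xFF:
            raise ValueError("each attribute encoding is 8 bits")
        value |= attr << (8 * index)
    return value


def mair_attr(mair, index):
    """Return the 8-bit attribute encoding stored at the given MAIR index."""
    return (mair >> (8 * index)) & 0xFF


NORMAL_NON_CACHEABLE = 0b01000100  # index 0
NORMAL_WBWA = 0b11111111           # index 1
DEVICE_nGnRnE = 0b00000000         # index 2

mair = pack_mair([NORMAL_NON_CACHEABLE, NORMAL_WBWA, DEVICE_nGnRnE])
print(hex(mair))                # 0xff44
print(hex(mair_attr(mair, 1)))  # 0xff
```

The packed value matches the immediate that the example moves into x0 before the MSR.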

Configure the translation regime

The next step is to configure the translation regime, as you see here:

 // Set up TCR_EL3
 // ---------------
 MOV   x0, #0x19            // T0SZ=0b011001 Limits VA space to 39 bits,
                            // translation starts @ l1
 ORR   x0, x0, #(0x1 << 8)  // IRGN0=0b01  Walks to TTBR0 are Inner WB/WA
 ORR   x0, x0, #(0x1 << 10) // ORGN0=0b01  Walks to TTBR0 are Outer WB/WA
 ORR   x0, x0, #(0x3 << 12) // SH0=0b11   Inner Shareable
                            // TBI0=0b0   Top byte not ignored
                            // TG0=0b00   4KB granule
                            // PS=0b000  32-bit PA space
 MSR   TCR_EL3, x0

The Translation Control Register (TCR_ELx) configures many aspects of the translation regime, including:

TnSZ
Controls the size of the virtual address space that is being described
TGn
Sets the granule, which is the smallest describable block, for the translation regime
IRGNn/ORGNn/SHn
Specifies the cacheability and shareability that the MMU should use for table walks
TBIn
Top byte ignore. Setting this bit causes the processor to ignore the top 8 bits of the virtual address when performing virtual to physical translation, which allows software to store other information in those bits. In this exercise, we do not use this feature, so we leave it disabled.

Note: For an example that shows when the TBI feature is used, see the description of Memory Tagging in the Providing protection for complex software guide.
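As a cross-check of the value that the listing writes to TCR_EL3, the following Python sketch assembles the same fields. The function name tcr_el3_value is hypothetical. The bit positions for T0SZ, IRGN0, ORGN0, and SH0 come from the shifts in the listing, while the positions used for TG0, PS, and TBI are taken from the architectural register layout and are all left at 0 here, as in the example.

```python
def tcr_el3_value(t0sz, irgn0, orgn0, sh0, tg0=0, ps=0, tbi=0):
    """Combine the TCR_EL3 fields that the example discusses into one value."""
    return (
        (t0sz & 0x3F)            # T0SZ,  bits [5:0]
        | ((irgn0 & 0x3) << 8)   # IRGN0, bits [9:8]
        | ((orgn0 & 0x3) << 10)  # ORGN0, bits [11:10]
        | ((sh0 & 0x3) << 12)    # SH0,   bits [13:12]
        | ((tg0 & 0x3) << 14)    # TG0,   bits [15:14]
        | ((ps & 0x7) << 16)     # PS,    bits [18:16]
        | ((tbi & 0x1) << 20)    # TBI,   bit 20
    )


# Same settings as the listing: 39-bit VA space, WB/WA walks, Inner Shareable.
tcr = tcr_el3_value(t0sz=0x19, irgn0=0b01, orgn0=0b01, sh0=0b11)
print(hex(tcr))  # 0x3519
```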

The selected granule (TG0) for all the examples in this guide is 4KB. As described in the Memory management guide, the granule determines the different page and block sizes that are used. With a 4KB granule, the options are:

  • L0 table: 512GB per entry
  • L1 tables: Each table covers 512GB, 1GB per entry
  • L2 tables: Each table covers 1GB, 2MB per entry
  • L3 tables: Each table covers 2MB, 4KB per entry

The size of the virtual address space is configured as 64 – TnSZ. In this example, 64 – 0x19 gives 39 bits of virtual address space. This equates to 512GB (2^39), which means that the entire virtual address space is covered by a single L1 table. Therefore, our starting level of translation is level 1.
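The relationship between T0SZ, the size of the address space, and the starting level can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes a 4KB granule and the basic T0SZ range of 16 to 39, where each level resolves 9 bits of the virtual address.

```python
def va_bits(t0sz):
    """Number of virtual address bits described by a given T0SZ."""
    return 64 - t0sz


def start_level_4k(t0sz):
    """Starting lookup level for a 4KB granule (assumes 16 <= T0SZ <= 39)."""
    if not 16 <= t0sz <= 39:
        raise ValueError("T0SZ outside the range handled by this sketch")
    bits = va_bits(t0sz)
    if bits > 39:   # needs VA bits [47:39], resolved by a level 0 table
        return 0
    if bits > 30:   # needs VA bits [38:30], resolved by a level 1 table
        return 1
    return 2        # remaining sizes start at level 2


T0SZ = 0x19
print(va_bits(T0SZ))                    # 39
print((1 << va_bits(T0SZ)) // (1 << 30))  # 512 (size of the space in GB)
print(start_level_4k(T0SZ))             # 1
```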

The next part of the example is shown here:

 // Invalidate TLBs
 // ----------------
 TLBI  ALLE3
 DSB   SY
 ISB

The state of the Translation Lookaside Buffers (TLBs) is not guaranteed at reset. Therefore, the example invalidates the TLBs before enabling the MMU. The instruction TLBI ALLE3 invalidates all cached translations for the EL3 translation regime, which is the translation regime that the example is configuring.

Generate the translation tables

The next step is to generate the tables in memory. This example creates a minimal set of entries, as you see in the following code:

LDR   x1, =tt_l1_base              // Address of L1 table

 // [0]: 0x0000,0000 - 0x3FFF,FFFF
 LDR   x0, =TT_S1_DEVICE_nGnRnE     // Entry template
                                    // AP=0, RW
                                    // Don't need to OR in address, as it is 0
 STR   x0, [x1]
 
 // [1]: 0x4000,0000 - 0x7FFF,FFFF
 LDR   x0, =TT_S1_DEVICE_nGnRnE     // Entry template
                                    // AP=0, RW
 ORR   x0, x0, #0x40000000          // 'OR' template with base physical address
 STR   x0, [x1, #8]

 // [2]: 0x8000,0000 - 0xBFFF,FFFF (DRAM on the VE and Base Platform)
 LDR   x0, =TT_S1_NORMAL_WBWA       // Entry template
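 ORR   x0, x0, #TT_S1_INNER_SHARED  // 'OR' with inner-shareable attribute
                                    // AP=0, RW
 ORR   x0, x0, #0x80000000          // 'OR' template with base physical address
 STR   x0, [x1, #16]                // Write entry [2] (offset 2 x 8 bytes)
 DSB   SY                           // Ensure the table writes complete before the MMU is enabled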

As described in the previous section Configure the translation regime, L1 is the first level of translation in this example. With a 4KB granule, this means that each entry in the table covers 1GB of address space. The example populates only the first three entries, which cover the first 3GB of the virtual address space.
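The store offsets used for the three entries follow directly from the 1GB span of each level 1 entry. The following Python sketch, with hypothetical helper names, shows how a virtual address selects an entry and where that entry sits in the table.

```python
L1_ENTRY_SPAN = 1 << 30   # each level 1 entry covers 1GB with a 4KB granule
ENTRY_SIZE_BYTES = 8      # each descriptor is 64 bits


def l1_index(va):
    """Index of the level 1 entry that covers va (VA bits [38:30])."""
    return (va >> 30) & 0x1FF


def l1_entry_offset(va):
    """Byte offset of that entry from the start of the level 1 table."""
    return l1_index(va) * ENTRY_SIZE_BYTES


for base in (0x00000000, 0x40000000, 0x80000000):
    last = base + L1_ENTRY_SPAN - 1
    print(f"[{l1_index(base)}] {base:#010x} - {last:#010x} at offset {l1_entry_offset(base)}")
```

The offsets 0, 8, and 16 correspond to the addresses that the stores for entries [0], [1], and [2] need to target.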

In the previous section Specify the location of the translation table, we showed how to allocate the memory for the translation table. A 4KB region that is prefilled with zeros was allocated with a fill directive. A value of zero corresponds to a Fault in the translation table. Therefore, all the entries that are not written are faulting entries.

Note: In a real system, software would typically fill the table with zeros at run-time, instead of relying on allocating them in the source. However, pre-allocating the zeros can speed up some simulations or emulations.

Understand how an entry is formed

The code uses symbols that are defined as templates at the start of the file. For example, TT_S1_NORMAL_WBWA is a template for a Normal, Write-back, Read/Write-allocate entry. The definition of this template is shown in this code:

.equ TT_S1_NORMAL_WBWA,   0x0000000000000405

The following diagram shows the format of a stage 1 level 1 table entry:

Decoding the TT_S1_NORMAL_WBWA template gives:

  • Indx= b001, take Type information from entry [1] in the MAIR
  • NS= b0, output physical addresses are Secure
  • AP= b00, address is readable and writeable
  • SH= b00, Non-shareable
  • AF= b1, Access Flag is pre-set. No Access Flag Fault is generated on access.
  • nG= Not used at EL3
  • Contig = b0, the entry is not part of a contiguous block
  • PXN= Not used at EL3
  • UXN= b0, block is executable. This attribute is called XN at EL3.

In the template, we see why knowing the configuration of the MAIR is important. The template relies on MAIR having entry [1] pre-set to Normal/Cacheable.
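The decoding of the template can be reproduced with a small Python sketch. The function name decode_s1_block is hypothetical, and the bit positions are those of the stage 1 block descriptor format. Bits 53 and 54 are reported by position only, because their names depend on the translation regime.

```python
def decode_s1_block(desc):
    """Split a stage 1 block descriptor into its lower and upper attribute fields."""
    return {
        "valid":     desc & 0x1,          # bit 0: 1 = valid entry
        "type_bit":  (desc >> 1) & 0x1,   # bit 1: 0 = Block at level 1 or 2
        "attr_indx": (desc >> 2) & 0x7,   # bits [4:2]: index into the MAIR
        "ns":        (desc >> 5) & 0x1,   # bit 5
        "ap":        (desc >> 6) & 0x3,   # bits [7:6]
        "sh":        (desc >> 8) & 0x3,   # bits [9:8]
        "af":        (desc >> 10) & 0x1,  # bit 10
        "ng":        (desc >> 11) & 0x1,  # bit 11
        "contig":    (desc >> 52) & 0x1,  # bit 52
        "bit53":     (desc >> 53) & 0x1,  # execute-never control, regime dependent
        "bit54":     (desc >> 54) & 0x1,  # execute-never control, regime dependent
    }


TT_S1_NORMAL_WBWA = 0x405  # value of the template in the example
fields = decode_s1_block(TT_S1_NORMAL_WBWA)
for name, value in fields.items():
    print(f"{name:10} = {value}")
```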

We want the region to be Inner Shareable, not Non-shareable as defined within the template. To achieve this, the example combines the TT_S1_NORMAL_WBWA template with another template, TT_S1_INNER_SHARED. This second template sets the required value in the SH field.
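The effect of combining the templates can be sketched in Python. The value used here for the shareability template, 0x3 shifted into the SH field, is an assumption: the guide does not show its definition, but this value is consistent with the SH=3 and the descriptor value reported for this entry in the trace later in this guide.

```python
TT_S1_NORMAL_WBWA = 0x405       # template value shown in the example
TT_S1_INNER_SHARED = 0x3 << 8   # assumed definition: SH = 0b11 in bits [9:8]
DRAM_BASE = 0x80000000          # base physical address of entry [2]

# The same sequence of OR operations that the assembly performs on x0.
entry = TT_S1_NORMAL_WBWA
entry |= TT_S1_INNER_SHARED
entry |= DRAM_BASE

print(hex(entry))             # 0x80000705
print((entry >> 8) & 0x3)     # 3, the SH field is now Inner Shareable
```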

Check your knowledge: Look at the other templates defined within the example. How would you modify the preceding example to map to a Non-secure physical address?

Answer: Mapping to a Non-secure physical address requires setting the NS bit to 1. The example has a template for this, TT_S1_NS, which could be ORed in like we did for the Shareability attribute.

Overview of the configured virtual address space

With this set of translation table entries, the virtual address space looks like what you see in the following diagram:

Enable the MMU

At this point, the MMU is configured and the translation tables are created in memory. The next step is to enable the MMU, as you see in the following code:

 // Enable MMU
 // -----------
 MOV   x0, #(1 << 0)      // M=1  Enable the stage 1 MMU
 ORR   x0, x0, #(1 << 2)  // C=1  Enable data and unified caches
 ORR   x0, x0, #(1 << 12) // I=1  Enable instruction fetches to allocate
                         //    into unified caches
                         // A=0  Strict alignment checking disabled
                         // SA=0 Stack alignment checking disabled
                         // WXN=0 Write permission does not imply XN
                         // EE=0 EL3 data accesses are little endian
 MSR   SCTLR_EL3, x0
 ISB

The example sets the M, C, and I bits in the System Control Register (SCTLR_ELx). Setting these bits enables the MMU and caches. The ISB after the write to the SCTLR ensures that the effect of enabling the MMU is visible to the next instruction.
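The value written to SCTLR_EL3 is just the three enable bits combined. A minimal Python sketch, using hypothetical constant names, reproduces it:

```python
SCTLR_M = 1 << 0    # M: enable the stage 1 MMU
SCTLR_C = 1 << 2    # C: enable data and unified caches
SCTLR_I = 1 << 12   # I: enable instruction caching

sctlr = SCTLR_M | SCTLR_C | SCTLR_I
print(hex(sctlr))  # 0x1005
```

This is the same value that the trace in the next section reports as written to SCTLR_EL3.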

The table walk

The examples in this exercise are developed to run on the Base Platform Fixed Virtual Platform (FVP). The Base Platform FVP is a model that is provided by Arm. FVP models trace the simulation, and provide detailed information on the execution of the simulated processor. The resulting trace is in the TARMAC format. Here is more information on TARMAC.

Tracing the entire example produces hundreds of lines of trace data. Instead, let's begin the trace at the point where the MMU is enabled, as you see here:

75 clk IT (75) 8000012c d51e1000 O EL3h_s : MSR   SCTLR_EL3,x0
75 clk R SCTLR_EL3 00000000:00001005
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1dcache LINE 0100 ALLOC 0x000080002000
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0800 ALLOC 0x000080002000
75 clk TTW ITLB LPAE 1:1 000080002010 0000000080000705 : BLOCK ATTRIDX=1 NS=0 AP=0 SH=3 AF=1 nG=0 16E=0 PXN=0 XN=0 ADDR=0x0000000080000000
75 clk TLB FILL FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.UTLB 1G 0x80000000 EL3_s, nG asid=0:0x0080000000 Normal InnerShareable Inner=WriteBackWriteAllocate Outer=WriteBackWriteAllocate xn=0 pxn=0 ContiguousHint=0
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1icache LINE 0008 ALLOC 0x000080000100
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0040 ALLOC 0x000080000100
76 clk IT (76) 80000130 d5033fdf O EL3h_s : ISB

The trace is dense, so let’s look at it one line at a time. This code shows the first section:

75 clk IT (75) 8000012c d51e1000 O EL3h_s : MSR   SCTLR_EL3,x0
75 clk R SCTLR_EL3 00000000:00001005

This shows the execution of the MSR instruction that enables the MMU. 0x8000_012C is the address of the instruction, and 0xD51E_1000 is the opcode. The second line shows the value that the instruction wrote to the register.

Note: By default, the trace shows the value that is written to the register, not the new value of the register. In many cases, but not all cases, the new value is the written value. For example, if the register includes read-only fields, the new value is not the written value.

Because this instruction enabled the MMU, the processor needs to perform a table walk for the address of the next instruction. First, the trace shows this code:

75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.l1dcache LINE 0100 ALLOC 0x000080002000
75 clk CACHE FVP_Base_AEMv8A_AEMv8A.cluster0.l2_cache LINE 0800 ALLOC 0x000080002000

When we configured TCR_EL3, we configured the MMU to use cacheable accesses for the table walk. The preceding two lines show the cache line that contains the required table entry being fetched into the cache. Once the line is returned from the memory, the descriptor can be interpreted by the MMU. This interpretation is shown in the next line of code, as you can see here:

75 clk TTW ITLB LPAE 1:1 000080002010 0000000080000705 : BLOCK ATTRIDX=1 NS=0 AP=0 SH=3 AF=1 nG=0 16E=0 PXN=0 XN=0 ADDR=0x0000000080000000

The preceding trace entry shows the MMU processing the table entry. This entry shows us the following things:

  • TTW= Table walk
  • ITLB= Table walk for the instruction interface. The I is for instruction.
  • 1:1= Stage 1, level 1 table entry
  • 0x80002010= Address the entry was fetched from
  • 0x0000000080000705= Entry returned from memory system
  • BLOCK= The entry is a Block entry
  • ATTRIDX=1= Uses MAIR entry 1.
  • NS= Output physical address is Secure
  • AP= Access permission bits
  • SH= Shareability bits
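The addresses in this trace line can be reproduced by hand. The following Python sketch uses hypothetical function names and assumes a table base of 0x80002000, which is inferred from the descriptor address in the trace (entry [2] sits 16 bytes into the table). It computes where the walk reads the descriptor for the ISB at 0x80000130 and the physical address that results.

```python
TABLE_BASE = 0x80002000          # inferred: descriptor address minus 2 x 8 bytes
NEXT_INSTRUCTION_VA = 0x80000130 # address of the ISB in the trace
DESCRIPTOR = 0x0000000080000705  # entry returned from memory in the trace


def l1_descriptor_address(table_base, va):
    """Address read by the walk: table base plus 8 bytes per level 1 index."""
    index = (va >> 30) & 0x1FF   # VA bits [38:30]
    return table_base + index * 8


def block_output_address(descriptor, va):
    """Physical address for a level 1 block: block base plus VA bits [29:0]."""
    block_base = descriptor & 0x0000FFFFC0000000  # output address bits [47:30]
    return block_base | (va & 0x3FFFFFFF)


print(hex(l1_descriptor_address(TABLE_BASE, NEXT_INSTRUCTION_VA)))  # 0x80002010
print(hex(block_output_address(DESCRIPTOR, NEXT_INSTRUCTION_VA)))   # 0x80000130
```

The output address equals the input address, which is what the flat mapping is expected to produce.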

Finally, the trace shows the TLB record that is being generated, as you see in this code:

75 clk TLB FILL FVP_Base_AEMv8A_AEMv8A.cluster0.cpu0.UTLB 1G 0x80000000 EL3_s, nG asid=0:0x0080000000 Normal InnerShareable Inner=WriteBackWriteAllocate Outer=WriteBackWriteAllocate xn=0 pxn=0 ContiguousHint=0

The trace shows that the TLB entry is created as follows:

  • 1GB block
  • PA:0x8000_0000
  • VA:0x8000_0000, with ASID 0, although ASIDs are not used at EL3
  • Translation regime: EL3
  • Normal, Inner Shareable, Write-Back, Write-Allocate
  • Executable

Check your knowledge: Look at the preceding code and find where all the settings that are shown in the trace come from.
