In TrustZone in the processor and System architecture, we explored TrustZone support in hardware, both the Arm processor and wider memory system. This topic looks at the software architecture that is found in TrustZone systems.
Top-level software architecture
The following diagram shows a typical software stack for a TrustZone enabled system:
Note: For simplicity, the diagram does not include a hypervisor, although they might be present.
The Trusted kernel in Secure state hosts services, like key management or DRM. Software running in Non-secure state needs to have controlled accesses to those services.
A user-space application is unlikely to be directly aware of TrustZone. Instead it would use a high-level API that is provided by a user-space library. That library handles communication with the Trusted service. This is similar to how, for example, a graphics API provides abstraction from the underlying GPU.
Communication between the service library and the Trusted service is typically handled using message queues or mailboxes in memory. The term World Shared Memory (WSM) is sometimes used to describe memory that is used for this communication. These queues must be in memory that both sets of software can see, which means Non-secure memory. This is because Non-secure state can only see Non-secure memory.
The service library places a request, or requests, in the mailbox and then invokes a driver in kernel space. The driver is responsible for low-level interactions with the Trusted Execution Environment (TEE), which could include allocating the memory for the message queues and registering them with the TEE. Remember that the two worlds are operating in different virtual address spaces, therefore they cannot use virtual addresses for communication.
The driver would call Secure state, typically using an SMC. Control would pass through the EL3 Secure Monitor to the Trusted Kernel in the TEE. The kernel invokes the requested service, which can then read the request from the queue.
- Trusting the message
In the flow that we have just described, the requests sit in a queue that is located in Non-secure memory. What if:
- The application that made the initial request is malicious?
- Other malicious software substituted the messages in the queue?
The TEE must assume that any request or data that is provided from Non-secure state might be malicious or in some other way invalid. This means that authenticating the request, or requestor, needs to be done in Secure state.
What this looks like will depend on the Trusted service being provided and its security requirements. There is no one single answer.
In a TrustZone system there are two software stacks, one for Non-secure state and another for Secure state. A processor core can only be in one state at a time. Who decides when each world is allowed to run?
Explicit calls to the EL3 firmware, like power management requests using Power State Coordination Interface (PSCI), are typically blocking. This means that control will only be returned to Non-secure state when the requested operation is complete. However, these calls tend to be short and infrequent.
The TEE typically runs under the control of the Non-secure state OS scheduler. A possible design is to have a daemon running under the OS as a place holder for the TEE. When the daemon is scheduled by the OS, the daemon hands control to the TEE through an SMC. The TEE then runs, processing outstanding requests, until the next scheduler tick or interrupt. Then control returns to the Non-secure state OS.
This might seem odd, because this approach gives the untrusted software control over when Trusted software can execute, which could enable denial of service attacks. However, because the TEE provides services to Non-secure state, preventing it from running only prevents those services from being available. For example, an attacker could prevent a user from playing a DRM-protected video. Such an attack does not cause any information to be leaked. This type of design can ensure confidentiality but not availability.
We could design the software stack to also give availability. The GIC allows Secure interrupts to be made higher priority than Non-secure interrupts, preventing Non-secure state from being able to block the taking of a Secure interrupt.
There are many Trusted kernels, both commercial and open source. One example is OP-TEE, originally developed by ST-Ericsson, but now an open-source project hosted by Linaro. OP-TEE provides a fully featured Trusted Execution Environment, and you can find a detailed description on the OP-TEE project website.
The structure of OP-TEE is shown in the following diagram:
The OP-TEE kernel runs in S.EL1, hosting Trusted applications in S.EL0. The Trusted applications communicate with the OP-TEE kernel through the TEE Internal API. TheTEE Internal API is a standard API developed by the GlobalPlatform group. GlobalPlatform work to develop standard APIs, which are supported by many different TEEs, not just OP-TEE.Note: In the preceding diagram, the Trusted applications are not shown as OP-TEE components. This is because they are not part of the core OP-TEE OS. The OP-TEE project does provide some example Trusted Applications for people to experiment with.
In Non-secure state, there is a low-level OP-TEE driver in kernel space. This is responsible for handling the low-level communication with the OP-TEE kernel.
In the Non-secure user space (EL0), there is a user-space library implementing another GlobalPlatform API. The TEE Client API is what applications use to access a Trusted application or service. In most cases, we would not expect an application to use the TEE Client API directly. Rather there would be another service-specific library providing a higher-level interface.
OP-TEE also includes a component that is called the tee-supplicant. The tee-supplicant handles services that are supported by OP-TEE and require some level of rich OS interaction. An example is secure storage.
Interacting with Non-secure virtualization
In the examples that we have covered so far, we have ignored the possible presence of a hypervisor in Non-secure state. When a hypervisor is present, much of the communication between a VM and Secure state will be through the hypervisor.
For example, in a virtualized environment SMCs are used to access both firmware functions and Trusted services. The firmware functions include things like power management, which a hypervisor would typically not wish to allow a VM to have direct access to.
The hypervisor can trap SMCs from EL1, which allows the hypervisor to check whether the request is for a firmware service or a Trusted service. If the request is for a firmware service, the hypervisor can emulate the interfaces rather than passing on call. The hypervisor can forward Trusted service requests to EL3. You can see this in the following diagram:
Boot and the chain of trust
Boot is a critical part of any TrustZone system. A software component can only be trusted if we trust all the software components that ran before it in the boot flow. This is often referred to as the chain of trust. A simplified chain of trust is shown in the following diagram:
In our example, the first code that runs is the boot ROM. We must implicitly trust the boot ROM, because there are no earlier stages of boot to verify its contents. Being in ROM protects the initial boot code from being rewritten. Keeping the initial boot code on-chip prevents it from being replaced, so we can implicitly trust it. The boot ROM code is typically small and simple. Its main function is to load and verify the second stage boot code from flash.
The second stage boot code performs system initialization of the platform, like setting up the memory controller for off-chip DRAM. This code is also responsible for loading and verifying the images that will run in Secure and Non-secure state. Examples include loading a TEE in Secure state and higher-level firmware like UEFI in Non-secure state.
Earlier we introduced the System Control Processor (SCP). An SCP is a microcontroller that performs low-level system control in many modern SoCs. Where an SCP, or similar, is present it also forms part of the chain of trust. The following diagram shows this:
In a Trusted boot system, each component verifies the next component before it loads, forming a chain of trust. Let’s look now at what happens when verification fails.
There is no one answer for this situation. It depends on the security needs of the system and which stage of the boot processor the failure occurs at. Consider the example of an SoC in a mobile device. If the verification failed at:
- Second stage boot image
The second stage boot image is required for initialization of the SoC and processor. If verification fails at this stage, we might not be sure that the device can boot safely and function correctly. Therefore, if verification fails at this stage it is usually fatal and the device cannot boot.
The TEE provides services, like key management. The device can still function, perhaps at a limited level, without the TEE being present. Therefore, we could choose to not load the TEE, but still allow the Non-secure state software to load.
- Non-secure state firmware or Rich OS image
The Non-secure state software is already at a lower level of trust. We might choose to allow it to boot, but block accesses to advanced features provided via the TEE. For example, a TrustZone-enabled DRM might not be available with an untrusted OS image.
These are just examples. Each system needs to make its own decisions based on its security requirements.
Trusted Board Boot Requirements
Earlier we introduced the Trusted Base System Architecture (TBSA), which is guidance for system designers. The Trusted Board Boot Requirements (TBBR) are a similar set of guidelines for software developers. TBBR gives guidance on how to construct a Trusted boot flow in a TrustZone-enabled system.
Trusted Firmware is an open-source reference implementation of Secure world software for Armv8-A devices. Trusted Firmware provides SoC developers and OEMs with a reference Trusted code base that complies with the relevant Arm specifications, including TBBR and SMCC.
The following diagram shows the structure of the Trusted Firmware:
The SMC dispatcher handles incoming
SMCs. The SMC dispatcher identifies which
SMCs should be dealt with at EL3, by Trusted Firmware, and which
SMCs should be forwarded the Trusted Execution Environment.
The Trusted Firmware provides code for dealing with Arm system IP, like interconnects. Silicon providers need to provide code for handling custom or third-party IP. This includes SoC-specific power management.