Detecting memory safety violations

Some classes of vulnerability that are related to memory usage can be difficult to detect and test for. Two examples of this are:

  • Use after free - Applications continue to use allocated memory after releasing it, or after it is out of scope. This is a violation of temporal memory safety.
  • Buffer overrun, or overflow - Going beyond the bounds of an allocated structure or buffer, usually because of insufficient bounds checking. This is a violation of spatial memory safety.

Armv8.5-A introduces the Memory Tagging Extension (MTE), also called memory coloring. Memory tagging makes detecting memory safety violations easier and more efficient.

Note: One of the first Internet-spread computer worms was the Internet Worm in 1988, which exploited a buffer overrun. More than thirty years later, we are still seeing attacks that exploit this type of programming bug.

Memory tagging

Regions of address space are allocated a tag, or lock. The upper bits of a virtual address are also used to store a tag, or key. On a memory access, the processor compares the key in the issued address with the lock that is assigned to that physical location. Here is an example:

In the preceding diagram, two regions have been allocated, using tags 9 and 2.

For the first two pointers, the tag matches that of the accessed location. You can think of this as the key fitting the lock. Accesses using these pointers would succeed as normal.

However, for the final pointer the tag does not match that of the accessed location. This will be captured as a tag check failure. We will look at what happens in this case later in this guide.
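To make the lock-and-key comparison concrete, here is a minimal Python sketch that models it in software. It is not how the hardware implements the check. The dictionary used as tag storage, the helper names, and the addresses are all hypothetical, chosen only to mirror the example of two regions tagged 9 and 2.

```python
TAG_SHIFT = 56                 # the key occupies pointer bits [59:56]
TAG_MASK = 0xF                 # tags are four bits wide
ADDR_MASK = (1 << 56) - 1      # bits [55:0] hold the address itself
GRANULE = 16                   # each lock covers one 16-byte block

def key_of(ptr: int) -> int:
    """Return the key (logical tag) carried in the pointer."""
    return (ptr >> TAG_SHIFT) & TAG_MASK

def tag_check(ptr: int, locks: dict[int, int]) -> bool:
    """Return True if the pointer's key matches the lock of the block it addresses."""
    block = (ptr & ADDR_MASK) // GRANULE
    return locks.get(block, 0) == key_of(ptr)   # blocks never tagged default to lock 0

# Hypothetical layout: a 32-byte region locked with tag 9, then a 16-byte region with tag 2
locks: dict[int, int] = {}
for addr in range(0x1000, 0x1020, GRANULE):
    locks[addr // GRANULE] = 9
locks[0x1020 // GRANULE] = 2

p1 = (9 << TAG_SHIFT) | 0x1010   # key 9, points into the tag-9 region
p2 = (2 << TAG_SHIFT) | 0x1020   # key 2, points into the tag-2 region
p3 = (9 << TAG_SHIFT) | 0x1020   # key 9, but this location is locked with tag 2

for name, ptr in (("p1", p1), ("p2", p2), ("p3", p3)):
    print(name, "pass" if tag_check(ptr, locks) else "tag check failure")
```

The first two pointers pass and the third is reported as a tag check failure, matching the key-fits-the-lock description above.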

Let’s apply this mechanism to the problems that we identified earlier, starting with buffer overruns, as you can see in this diagram:

On the call to malloc(), the C library will allocate the memory and assign a tag to the buffer. The returned pointer will include the allocated tag. If software using the pointer goes beyond the limits of the buffer, the tag comparison check will fail. This failure will allow us to detect the overrun.
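The following Python sketch models a tagging allocator catching an overrun. The class `ToyTaggedHeap` and its methods are hypothetical and are not the C library's real design. As one possible policy, it gives each allocation a random non-zero tag that differs from the previous allocation's tag, so running off the end of one buffer into the next always meets a different lock.

```python
import random
from typing import Optional

GRANULE = 16
TAG_SHIFT = 56
ADDR_MASK = (1 << 56) - 1

class ToyTaggedHeap:
    """Hypothetical bump allocator that tags every allocation (a model, not a real C library)."""

    def __init__(self, base: int = 0x10000, seed: Optional[int] = None):
        self.next_addr = base
        self.locks: dict[int, int] = {}      # 16-byte block index -> 4-bit lock
        self.rng = random.Random(seed)
        self.prev_tag = 0

    def malloc(self, size: int) -> int:
        size = -(-size // GRANULE) * GRANULE     # round up to whole 16-byte blocks
        # Choose a non-zero tag that differs from the previous (neighboring) allocation,
        # so a linear overrun into the next buffer always meets a different lock.
        tag = self.rng.choice([t for t in range(1, 16) if t != self.prev_tag])
        for addr in range(self.next_addr, self.next_addr + size, GRANULE):
            self.locks[addr // GRANULE] = tag
        ptr = (tag << TAG_SHIFT) | self.next_addr
        self.next_addr += size
        self.prev_tag = tag
        return ptr

    def access_ok(self, ptr: int, offset: int) -> bool:
        """Model a load or store at ptr + offset; False means a tag check failure."""
        target = ptr + offset                    # the key travels with the pointer
        key = (target >> TAG_SHIFT) & 0xF
        lock = self.locks.get((target & ADDR_MASK) // GRANULE, 0)   # unallocated: lock 0
        return key == lock

heap = ToyTaggedHeap(seed=1)
buf = heap.malloc(32)
other = heap.malloc(32)
print(heap.access_ok(buf, 31))   # True: last byte of buf
print(heap.access_ok(buf, 32))   # False: one byte past the end lands in `other`
```

Because locks cover whole 16-byte blocks, an overrun that stays inside the rounding padding is not caught: in this model, malloc(20) reserves 32 bytes, so an access at offset 25 still meets the buffer's own lock.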

Similarly, for use-after-free, on the call to malloc() the buffer gets allocated in memory and assigned a tag value. The pointer that is returned by malloc() includes this tag. Later the buffer is released. The C library might change the tag when the memory is released or might wait until the memory is reused for some other purpose. If software continues to use the old pointer, it will have the old tag value and the tag check will catch it.
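Here is a similar Python sketch for use-after-free. It assumes one of the two policies described above, re-tagging the buffer as soon as it is released. The helper names and the buffer address are hypothetical.

```python
import random

GRANULE = 16
TAG_SHIFT = 56
ADDR_MASK = (1 << 56) - 1

def set_lock(locks: dict[int, int], addr: int, size: int, tag: int) -> None:
    """Assign `tag` as the lock of every 16-byte block in [addr, addr + size)."""
    for a in range(addr, addr + size, GRANULE):
        locks[a // GRANULE] = tag

def access_ok(locks: dict[int, int], ptr: int) -> bool:
    """Return True if ptr's key matches the lock of the block it addresses."""
    key = (ptr >> TAG_SHIFT) & 0xF
    return locks.get((ptr & ADDR_MASK) // GRANULE, 0) == key

locks: dict[int, int] = {}
buf_addr, size = 0x20000, 64           # hypothetical buffer placement

# malloc(): tag the buffer and hand out a pointer carrying the same tag
old_tag = random.randrange(1, 16)
set_lock(locks, buf_addr, size, old_tag)
ptr = (old_tag << TAG_SHIFT) | buf_addr

# free(): this sketch re-tags immediately (an allocator might instead wait until reuse)
new_tag = random.choice([t for t in range(1, 16) if t != old_tag])
set_lock(locks, buf_addr, size, new_tag)

# A later use of the stale pointer still carries old_tag, so the check fails
print(access_ok(locks, ptr + 8))
```

If the allocator waited until reuse instead, the stale pointer would keep passing until the memory was handed out again with a different tag.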

Note: The total number of possible tags is small. Therefore, the same tag value might be used for several different regions over time, or at the same time. However, with careful tag allocation, sequential overruns or underruns can be detected. Wild accesses are statistically likely to be caught.
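The difference between "detected" and "statistically likely to be caught" can be worked out with a small calculation. This Python sketch assumes that, for a wild access, the key and the lock it happens to hit are independent and uniformly distributed over all 16 values. Real allocators that reserve some tag values would shift these numbers slightly.

```python
from fractions import Fraction

TAG_BITS = 4
NUM_TAGS = 2 ** TAG_BITS        # only 16 distinct tag values

def p_caught_random(num_tags: int = NUM_TAGS) -> Fraction:
    """Chance a wild access is caught, assuming key and lock are independent and
    uniformly random: the access escapes only if the two happen to match."""
    return 1 - Fraction(1, num_tags)

def p_caught_adjacent_overrun(neighbors_differ: bool, num_tags: int = NUM_TAGS) -> Fraction:
    """Chance a linear overrun into the next buffer is caught. If the allocator
    guarantees that neighboring buffers get different tags, detection is certain."""
    return Fraction(1) if neighbors_differ else p_caught_random(num_tags)

print(p_caught_random(), float(p_caught_random()))
print(p_caught_adjacent_overrun(True), p_caught_adjacent_overrun(False))
```

Under these assumptions, a wild access is caught 15 times out of 16 (93.75%), while careful allocation makes an overrun into the neighboring buffer certain to be caught.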

Tags

To work with tags, the architecture gains several new instructions, including:

  • IRG - Generates a random tag value and inserts it into a pointer
  • STG - Sets the tag value for a block of memory
  • STZG - Sets the tag value for a block of memory, and zeros the corresponding memory locations
    • If the allocator is going to zero the allocated memory, STZG offers better performance than separate zeroing and tagging.
  • LDG - Reads the tag value for a block of memory
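To show what each instruction does to a pointer and to memory, here is a Python model of the four instructions. It is not real instruction encoding or an intrinsic API: `ToyMemory`, the function names, and the `exclude` parameter (a loose stand-in for the tag-exclusion controls that IRG supports) are all hypothetical.

```python
import random

GRANULE = 16
TAG_SHIFT = 56
TAG_FIELD = 0xF << TAG_SHIFT    # pointer bits [59:56]
ADDR_MASK = (1 << 56) - 1

class ToyMemory:
    """Hypothetical model: data bytes plus one 4-bit lock per 16-byte block."""
    def __init__(self, size: int):
        self.data = bytearray(b"\xaa" * size)   # non-zero filler so STZG's zeroing is visible
        self.locks = [0] * (size // GRANULE)

def _block(ptr: int) -> int:
    return (ptr & ADDR_MASK) // GRANULE

def irg(ptr, exclude=frozenset(), rng=None):
    """Like IRG: return ptr with a random tag (not in `exclude`) in bits [59:56]."""
    rng = rng or random.Random()
    tag = rng.choice([t for t in range(16) if t not in exclude])
    return (ptr & ~TAG_FIELD) | (tag << TAG_SHIFT)

def stg(mem, ptr):
    """Like STG: set the lock of the block at ptr to the key held in ptr."""
    mem.locks[_block(ptr)] = (ptr >> TAG_SHIFT) & 0xF

def stzg(mem, ptr):
    """Like STZG: set the lock and zero the block's 16 data bytes in one step."""
    stg(mem, ptr)
    start = _block(ptr) * GRANULE
    mem.data[start:start + GRANULE] = bytes(GRANULE)

def ldg(mem, ptr):
    """Like LDG: return ptr with its key replaced by the lock stored for its block."""
    return (ptr & ~TAG_FIELD) | (mem.locks[_block(ptr)] << TAG_SHIFT)

mem = ToyMemory(64)
p = irg(0x10, exclude=frozenset({0}))   # random non-zero tag for the block at 0x10
stzg(mem, p)
print(hex(p), mem.locks, ldg(mem, 0x10) == p)
```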

Tags are four bits and are stored in two places:

  • Key - Stored in bits [59:56] of a pointer
    • This requires pointer tagging to be enabled. We will discuss this later in the guide.
  • Lock - A new address space, the tag address space, is added. The tag address space records the tag that is assigned to each memory region.
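Placing the key in a pointer is simple bit manipulation. This Python sketch shows it with hypothetical helper names and a hypothetical address. Because pointer tagging (top-byte ignore) is enabled, the tag bits do not take part in address translation, and the rest of the pointer is left unchanged.

```python
TAG_SHIFT = 56
TAG_FIELD = 0xF << TAG_SHIFT     # bits [59:56]

def with_key(ptr: int, key: int) -> int:
    """Return ptr with its key (bits [59:56]) set to `key`; all other bits unchanged."""
    if not 0 <= key <= 0xF:
        raise ValueError("tags are four bits")
    return (ptr & ~TAG_FIELD) | (key << TAG_SHIFT)

def key_of(ptr: int) -> int:
    """Read the key back out of bits [59:56]."""
    return (ptr >> TAG_SHIFT) & 0xF

addr = 0x0000_7FFF_1234_5670          # hypothetical user-space address
tagged = with_key(addr, 0xA)
print(hex(tagged))                     # 0xa007fff12345670
```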

On allocating a block of memory, software allocates a tag either randomly, using IRG, or using a custom algorithm. Each tag covers 16 bytes. This means that software needs to execute STZG or STG multiple times to cover all the 16-byte blocks within the allocated memory.
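The per-block arithmetic is easy to check with a short Python sketch. It assumes one STG or STZG per 16-byte block, as described above. The helper names are hypothetical.

```python
GRANULE = 16     # bytes covered by one tag
TAG_BITS = 4

def granules_needed(size: int) -> int:
    """16-byte blocks spanned by an allocation of `size` bytes; with one STG or STZG
    per block, this is also the number of tag stores the allocator executes."""
    return -(-size // GRANULE)       # ceiling division

def tagged_size(size: int) -> int:
    """Allocation size rounded up to whole 16-byte blocks."""
    return granules_needed(size) * GRANULE

def tag_storage_bytes(size: int) -> float:
    """Bytes of tag storage needed for `size` bytes of tagged memory (4 bits per block)."""
    return granules_needed(size) * TAG_BITS / 8

for n in (1, 16, 100, 4096):
    print(n, granules_needed(n), tagged_size(n), tag_storage_bytes(n))
```

For example, a 100-byte allocation spans 7 blocks (112 bytes). Four bits of tag per 16 bytes of memory is 1/32 of the tagged memory, about 3%, counting only the raw tag bits.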

Tagged and untagged addresses

Not all memory accesses require tag checking. We describe an access as Checked or Unchecked, depending on whether tag checking is carried out.

The following accesses are always Unchecked:

  • Instruction fetches
  • Translation table walks, including hardware updates of the Access Flag or Dirty state
  • Data cache maintenance operations
  • Accesses to the Allocation tags

For data accesses, a new memory attribute is added to indicate that accesses to this region should be Checked:

  • MemAttr[] == 0xF0: Inner+Outer Write-Back Cacheable, Read or Write-Allocate, Tagged
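Translation table entries select a memory attribute through an index into the eight 8-bit fields of MAIR_ELx. This Python sketch shows where the Tagged value 0xF0 would sit in such a register. The choice of index layout is hypothetical. 0x00 (Device-nGnRnE) and 0xFF (untagged Normal Write-Back) are included only for comparison.

```python
# Attribute encodings (8 bits each). 0xF0 is the Tagged value given above;
# the other two are standard encodings included for comparison.
ATTR_DEVICE_nGnRnE = 0x00      # Device-nGnRnE
ATTR_NORMAL_WB = 0xFF          # Normal, Inner+Outer Write-Back, Read/Write-Allocate, untagged
ATTR_NORMAL_WB_TAGGED = 0xF0   # Normal, Inner+Outer Write-Back, Tagged

def mair_value(attrs: list[int]) -> int:
    """Pack up to eight attribute encodings into a MAIR_ELx value: attrs[i] becomes
    Attr<i>, which occupies bits [8*i+7:8*i]."""
    if len(attrs) > 8:
        raise ValueError("MAIR_ELx holds eight attribute fields")
    value = 0
    for index, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * index)
    return value

# Hypothetical layout: index 0 device, 1 untagged normal, 2 tagged normal
mair = mair_value([ATTR_DEVICE_nGnRnE, ATTR_NORMAL_WB, ATTR_NORMAL_WB_TAGGED])
print(hex(mair))    # 0xf0ff00
```

With this layout, only pages whose translation table entries select index 2 would be mapped as Tagged.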

Data accesses to a region that is marked as Tagged are classed as Checked, unless one of the following applies:

  • TCR_ELx.TBI==0
  • The Logical tag (bits [59:56] of the virtual address) is b0000 or b1111
  • The load or store uses the SP as a base register with an immediate offset, or no offset
  • It is a PC-relative load
  • PSTATE.TCO==1

Data accesses to any region without the Tagged attribute are Unchecked.
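The rules above can be read as a single decision. This Python sketch encodes them exactly as listed in this section and is not the full architectural pseudocode. The `DataAccess` fields are hypothetical names for the conditions in the list.

```python
from dataclasses import dataclass

@dataclass
class DataAccess:
    """Hypothetical summary of one data access; the field names are illustrative."""
    region_tagged: bool   # region mapped with the Tagged attribute (MemAttr 0xF0)
    tbi: int              # TCR_ELx.TBI that applies to the address
    logical_tag: int      # bits [59:56] of the virtual address
    sp_base: bool         # SP base register with an immediate offset, or no offset
    pc_relative: bool     # PC-relative load
    tco: int              # PSTATE.TCO (Tag Check Override)

def is_checked(a: DataAccess) -> bool:
    """Apply the rules listed in this section, in order."""
    if not a.region_tagged:
        return False                      # untagged regions are always Unchecked
    if a.tbi == 0:
        return False
    if a.logical_tag in (0b0000, 0b1111):
        return False
    if a.sp_base or a.pc_relative:
        return False                      # these can be checked statically instead
    if a.tco == 1:
        return False                      # tag checks overridden
    return True

heap_load = DataAccess(True, 1, 0x3, False, False, 0)
print(is_checked(heap_load))                                  # True
print(is_checked(DataAccess(True, 1, 0x3, True, False, 0)))   # False: SP-based access
```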

Note: Loads or stores using the stack pointer with an immediate offset can be statically checked at build time. This means that there is less benefit to checking with MTE. The same principle applies to PC-relative loads.

What happens when a comparison fails?

Let’s discuss what happens when the tag comparison fails. The architecture makes the behavior of tag comparison failure configurable, controlled by SCTLR_ELx.TCF, or SCTLR_ELx.TCF0 for EL0:

  • TCF==00 - Tag comparison failures are ignored.
  • TCF==01 - Tag comparison failures are reported as a synchronous Data Abort. The address that caused the failure is reported in FAR_ELx.
  • TCF==10 - Tag comparison failures are reported asynchronously by updating bits in TFSR_ELx, or TFSRE0_EL1 for EL0. Optionally, checks can be synchronized on exception entry, to allow check failures to be attributed to a specific process.
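The three TCF settings differ mainly in how much information reaches software. This Python sketch models them. `TagCheckFault` stands in for the synchronous Data Abort, and `async_flag` stands in for the status bit in TFSR_ELx. Both names, like the class itself, are hypothetical.

```python
class TagCheckFault(Exception):
    """Stands in for the synchronous Data Abort; keeps the faulting address, as FAR_ELx does."""
    def __init__(self, address: int):
        super().__init__(f"tag check fault at {address:#x}")
        self.address = address

class ToyTagFaultReporting:
    """Hypothetical model of the three SCTLR_ELx.TCF settings described above."""
    def __init__(self, tcf: int):
        self.tcf = tcf
        self.async_flag = False   # stands in for the status bit in TFSR_ELx

    def on_mismatch(self, address: int) -> None:
        if self.tcf == 0b00:
            return                           # failure ignored
        if self.tcf == 0b01:
            raise TagCheckFault(address)     # synchronous: precise address reported
        if self.tcf == 0b10:
            self.async_flag = True           # asynchronous: only "a failure occurred"
            return
        raise ValueError("TCF value not modeled here")

    def on_exception_entry(self) -> bool:
        """Read and clear the asynchronous flag, so the failure can be attributed
        to whatever was running before the exception."""
        pending, self.async_flag = self.async_flag, False
        return pending

reporting = ToyTagFaultReporting(0b10)
reporting.on_mismatch(0x2000)
reporting.on_mismatch(0x3000)        # the address of each failure is not recorded
print(reporting.on_exception_entry())
```

In asynchronous mode, two failures collapse into one pending flag with no addresses. This is the precision trade-off discussed next.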

The architecture provides both synchronous and asynchronous mechanisms to report tag comparison failures. Synchronous checking makes debugging simpler, because it allows you to identify the precise instruction and address that caused the failure. However, synchronous checking typically has a significant performance impact. This impact might be acceptable in a development environment, but is often too high for deployed systems.

Asynchronous checking is less costly. This means that asynchronous checking is potentially acceptable even on production systems. Although asynchronous checking provides less precise information on where the tag comparison failure occurred, it can provide some mitigation and be used for profiling. Profiling allows problem areas to be identified, narrowing down the search area for bugs.

Combining memory tagging and pointer authentication

Memory tagging and pointer authentication both use the upper bits of an address to store additional information about the pointer: a tag for memory tagging, and a PAC for pointer authentication.

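Both technologies can be enabled at the same time. The size of the PAC is variable, depending on the size of the virtual address space. Memory tagging requires pointer tagging to be enabled, so the top byte of the pointer, which holds the tag, is not available to the PAC. When memory tagging is enabled at the same time, there are therefore fewer bits available for the PAC.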
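The effect on PAC size can be estimated with a short calculation. This Python sketch assumes the layout in which the PAC occupies the bits from the top of the virtual address up to bit 54 (bit 55 selects the upper or lower address range), plus the top byte [63:56] only when TBI is disabled. Treat it as an illustration and check the Arm Architecture Reference Manual for the exact rules.

```python
def pac_bits(va_bits: int, tbi: bool) -> int:
    """Width of the PAC field for a given virtual address size, under the layout
    assumed above: bits [54:va_bits], plus bits [63:56] when TBI is disabled."""
    width = 55 - va_bits
    if not tbi:
        width += 8
    return width

for va in (39, 48, 52):
    print(va, "TBI off:", pac_bits(va, False), "TBI on (needed for MTE):", pac_bits(va, True))
```

With a 48-bit virtual address space, for example, this gives 15 PAC bits without TBI but only 7 bits once the top byte is given over to tags.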
