You copied the Doc URL to your clipboard.

Arm Cortex‑A76 Core Technical Reference Manual : Data prefetching

Data prefetching

This section describes the data prefetching behavior for the Cortex®‑A76 core.

Preload instructions

The Cortex‑A76 core supports the AArch64 Prefetch Memory (PRFM) instructions and the AArch32 Prefetch Data (PLD) and Preload Data With Intent To Write (PLDW) instructions. These instructions signal to the memory system that memory accesses from a specified address are likely to occur soon. The memory system acts by taking actions that aim to reduce the latency of the memory access when they occur. PRFM instructions perform a lookup in the cache, and if they miss and are to a cacheable address, a linefill starts. However, the PRFM instruction retires when its linefill is started, rather than waiting for the linefill to complete. This enables other instructions to execute while the linefill continues in the background.

The Preload Instruction (PLI) memory system hint performs preloading in the L2 cache for cacheable accesses if they miss in both the L1 instruction cache and L2 cache. Instruction preloading is performed in the background.

For more information about prefetch memory and preloading caches, see the Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile.

Data prefetching and monitoring

The load-store unit includes a hardware prefetcher that is responsible for generating prefetches targeting both the L1 and the L2 cache. The load side prefetcher uses the virtual address to prefetch to both the L1 and L2 Cache. The store side prefetcher uses the physical address, and only prefetches to the L2 Cache.

The CPUECTLR register allows you to have some control over the prefetcher. See CPUECTLR_EL1, CPU Extended Control Register, EL1 for more information on the control of the prefetcher.

Use the prefetch memory system instructions for data prefetching where short sequences or irregular pattern fetches are required.

Data cache zero

The Armv8-A architecture introduces a Data Cache Zero by Virtual Address (DC ZVA) instruction.

In the Cortex‑A76 core, this enables a block of 64 bytes in memory, aligned to 64 bytes in size, to be set to zero.

For more information, see the Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile.

Was this page helpful? Yes No