16.4.2. Memory system effects on instruction timings

Because the processor is a statically scheduled design, any stall from the memory system results in a delay of at least eight cycles. This 8-cycle minimum matches the minimum number of cycles needed to receive data from the L2 cache after an L1 load miss. Table 16.13 gives the most common cases that can result in an instruction replay because of a memory system stall.

Table 16.13. Memory system effects on instruction timings
Replay event | Delay | Description
Load data miss | 8 cycles
  1. A load instruction misses in the L1 data cache.

  2. A request is then made to the L2 data cache.

  3. If a miss also occurs in the L2 data cache, then a second replay occurs. The number of stall cycles depends on the external system memory timing. The minimum time required to receive the critical word for an L2 cache miss is approximately 25 cycles, but can be much longer because of L3 memory latencies.

Data TLB miss | 24 cycles
  1. A table walk because of a miss in the L1 TLB causes a 24-cycle delay, assuming the translation table entries are found in the L2 cache.

  2. If the translation table entries are not present in the L2 cache, the number of stall cycles depends on the external system memory timing.

Store buffer full | 8 cycles plus latency to drain fill buffer

  1. A store instruction miss does not result in any stalls unless the store buffer is full.

  2. In the case of a full store buffer, the delay is at least eight cycles. The delay can be more if it takes longer to drain some entries from the store buffer.

Unaligned load or store request | 8 cycles
  1. If a load instruction address is unaligned and the full access is not contained within a 128-bit boundary, there is an 8-cycle penalty.

  2. If a store instruction address is unaligned and the full access is not contained within a 64-bit boundary, there is an 8-cycle penalty.
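
The minimum delays in the table above, and the unaligned-access rule in particular, can be sketched in C as a rough cost model. This is an illustrative sketch only, not part of the processor documentation: the enum and function names are hypothetical. It assumes that "unaligned" means the address is not a multiple of the access size, and that "contained within a boundary" means the access does not span two naturally aligned 16-byte (load) or 8-byte (store) blocks. The values returned are the minimum delays from the table; actual delays can be longer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Replay events from the table above (names are hypothetical). */
enum replay_event {
    REPLAY_LOAD_L1_MISS,      /* load misses in L1, data found in L2      */
    REPLAY_DATA_TLB_MISS,     /* table walk, entries found in the L2 cache */
    REPLAY_STORE_BUFFER_FULL, /* store issued while the store buffer is full */
    REPLAY_UNALIGNED_ACCESS   /* unaligned access that crosses the boundary */
};

/* Minimum delay in cycles for each event, as listed in the table. */
static unsigned min_replay_cycles(enum replay_event ev)
{
    switch (ev) {
    case REPLAY_LOAD_L1_MISS:      return 8;
    case REPLAY_DATA_TLB_MISS:     return 24;
    case REPLAY_STORE_BUFFER_FULL: return 8;
    case REPLAY_UNALIGNED_ACCESS:  return 8;
    }
    return 0; /* unknown event */
}

/* True if the bytes [addr, addr + size) are not contained in a single
 * naturally aligned block of boundary_bytes (must be a power of two). */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary_bytes)
{
    uintptr_t mask = ~((uintptr_t)boundary_bytes - 1);
    if (size == 0)
        return false;
    return (addr & mask) != ((addr + size - 1) & mask);
}

/* Assumption: an access is unaligned when addr is not a multiple of size.
 * Loads are checked against a 128-bit (16-byte) boundary, stores against
 * a 64-bit (8-byte) boundary. */
static bool incurs_unaligned_replay(uintptr_t addr, size_t size, bool is_store)
{
    size_t boundary_bytes = is_store ? 8 : 16;
    if (size == 0 || addr % size == 0)
        return false; /* aligned accesses are not penalized by this rule */
    return crosses_boundary(addr, size, boundary_bytes);
}
```

Under these assumptions, a 4-byte access at address 0x1006 would be replayed as a store, because it spans the 8-byte boundary at 0x1008, but not as a load, because it stays inside the 16-byte block starting at 0x1000.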