16.4.2. Memory system effects on instruction timings
Because the processor is a statically scheduled design, any
stall from the memory system results in a delay of at least 8 cycles.
This 8-cycle minimum is balanced with the minimum number
of cycles required to receive data from the L2 cache in the case
of an L1 load miss. Table 16.13 gives
the most common cases that can result in an instruction replay because
of a memory system stall.
Table 16.13. Memory system effects on instruction timings
|Load data miss||8 cycles|
A load instruction
misses in the L1 data cache.
A request is then made to the L2 data cache.
If a miss also occurs in the L2 data cache, then
a second replay occurs. The number of stall cycles depends on the
external system memory timing. The minimum time required to receive
the critical word for an L2 cache miss is approximately 25 cycles,
but can be much longer because of L3 memory latencies.
|Data TLB miss||24 cycles|
A table walk because
of a miss in the L1 TLB causes a 24-cycle delay, assuming the translation
table entries are found in the L2 cache.
If the translation table entries are not present
in the L2 cache, the number of stall cycles depends on the external
system memory timing.
|Store buffer full||8 cycles plus latency to drain fill buffer|
A store instruction
miss does not result in any stalls unless the store buffer is full.
In the case of a full store buffer, the delay is
at least eight cycles. The delay can be more if it takes longer
to drain some entries from the store buffer.
|Unaligned load or store||8 cycles|
If a load instruction
address is unaligned and the full access is not contained within
a 128-bit boundary, there is an 8-cycle penalty.
If a store instruction address is unaligned and
the full access is not contained within a 64-bit boundary, there
is an 8-cycle penalty.
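
The unaligned load and store rule can be illustrated with a short sketch. It is not taken from the processor documentation. It assumes that "unaligned" means the address is not a multiple of the access size, and that "not contained within a boundary" means the accessed bytes span two naturally aligned 16-byte blocks (loads) or 8-byte blocks (stores). The function and constant names are hypothetical.

```python
# Illustrative model only; names and the exact alignment test are assumptions.
LOAD_BOUNDARY_BYTES = 16       # 128-bit boundary applies to loads
STORE_BOUNDARY_BYTES = 8       # 64-bit boundary applies to stores
UNALIGNED_PENALTY_CYCLES = 8   # penalty stated for both cases


def crosses_boundary(address: int, size: int, boundary: int) -> bool:
    """Return True if bytes [address, address + size) span two boundary-sized blocks."""
    first_block = address // boundary
    last_block = (address + size - 1) // boundary
    return first_block != last_block


def unaligned_penalty(address: int, size: int, is_store: bool) -> int:
    """Return the modeled penalty in cycles for one load or store."""
    if address % size == 0:
        return 0  # assumed definition of an aligned access: no penalty
    boundary = STORE_BOUNDARY_BYTES if is_store else LOAD_BOUNDARY_BYTES
    if crosses_boundary(address, size, boundary):
        return UNALIGNED_PENALTY_CYCLES
    return 0  # unaligned, but fully contained within the boundary


if __name__ == "__main__":
    # 4-byte load at 0x0E touches bytes 14..17 and crosses the 16-byte boundary.
    print(unaligned_penalty(0x0E, 4, is_store=False))
    # 4-byte load at 0x06 touches bytes 6..9 and stays inside one 16-byte block.
    print(unaligned_penalty(0x06, 4, is_store=False))
    # The same access as a store crosses the 8-byte boundary.
    print(unaligned_penalty(0x06, 4, is_store=True))
```

Under these assumptions, the same unaligned address can be free for a load but penalized for a store, because the store boundary is half the size of the load boundary.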