16.4.2. Memory system effects on instruction timings
Because the processor is a statically scheduled design, any
stall from the memory system results in a delay of at least 8 cycles.
This 8-cycle minimum is balanced with the minimum number of cycles
needed to receive data from the L2 cache after an L1 load miss.
Table 16.16 lists the most common cases that can cause an instruction
replay because of a memory system stall.
Table 16.16. Memory system effects on instruction timings
| Event | Delay | Description |
|---|---|---|
| Load data miss | 8 cycles | A load instruction misses in the L1 data cache, and a request is then made to the L2 data cache. If a miss also occurs in the L2 data cache, a second replay occurs, and the number of stall cycles depends on the external system memory timing. The time required to receive the critical word for an L2 cache miss is 18 core cycles plus the number of cycles required by the external memory system. The external system requires at least 2 additional cycles, so the minimum total is 20 cycles. Even 20 cycles is likely optimistic, because it can only occur in a system with a 1:1 bus ratio and zero-wait-state memory. |
| Data TLB miss | 24 cycles | A miss in the L1 TLB causes a table walk, which results in a 24-cycle delay, assuming the translation table entries are found in the L2 cache. If the translation table entries are not present in the L2 cache, the number of stall cycles depends on the external system memory timing. |
| Store buffer full | 8 cycles plus latency to drain fill buffer | A store instruction miss does not cause any stalls unless the store buffer is full. When the store buffer is full, the delay is at least 8 cycles, and it can be longer if draining entries from the store buffer takes more time. |
| Unaligned load or store | 8 cycles | If a load instruction address is unaligned and the full access is not contained within a single 128-bit aligned block, there is an 8-cycle penalty. If a store instruction address is unaligned and the full access is not contained within a single 64-bit aligned block, there is an 8-cycle penalty. |