Load and store instructions are classed as:
single load and store instructions such as
load and store multiple instructions such as
For load multiple and store multiple instructions, the number of registers in the register list usually determines the number of cycles required to execute the instruction.
The Cortex-A9 processor has an optimized path from a load instruction to a subsequent data processing instruction, saving 1 cycle on the load-use penalty.
This path is used when the following conditions are met:
the data-processing instruction is an arithmetic, logical, or saturation operation
the data-processing instruction does not require any shift
the load instruction does not require sign extension
the load instruction is not conditional.
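The four conditions above can be read as a single predicate. The following is a minimal sketch, not a description of the hardware: the `Load` and `DataOp` types, their field names, and the `uses_fast_forward_path` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical descriptors for a load and the data-processing
# instruction that consumes its result. These are illustration-only
# types, not part of any ARM tool or API.


@dataclass(frozen=True)
class Load:
    sign_extends: bool  # True for sign-extending loads such as LDRSB
    conditional: bool   # True if the load is conditionally executed


@dataclass(frozen=True)
class DataOp:
    kind: str           # "arithmetic", "logical", "saturation", or other
    uses_shift: bool    # True if an operand requires a shift


# Operation classes that the text lists as eligible for the fast path.
FAST_FORWARD_KINDS = frozenset({"arithmetic", "logical", "saturation"})


def uses_fast_forward_path(load: Load, op: DataOp) -> bool:
    """Return True only when all four listed conditions hold."""
    return (
        op.kind in FAST_FORWARD_KINDS
        and not op.uses_shift
        and not load.sign_extends
        and not load.conditional
    )


if __name__ == "__main__":
    plain_load = Load(sign_extends=False, conditional=False)
    add_no_shift = DataOp(kind="arithmetic", uses_shift=False)
    add_shifted = DataOp(kind="arithmetic", uses_shift=True)
    print(uses_fast_forward_path(plain_load, add_no_shift))  # True
    print(uses_fast_forward_path(plain_load, add_shifted))   # False
```

When the predicate is false, the load-use pair falls into the "other cases" latency column rather than the fast forward column.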
Table B.2 shows the cycle timings for single load and store operations. The result latency is the latency of the first loaded register.
|Instruction cycles|AGU cycles|Result latency: fast forward cases|Result latency: other cases|
The Cortex-A9 processor can load or store two 32-bit registers in each cycle. However, to access 64 bits, the address must be 64-bit aligned.
This scheduling is done in the Address Generation Unit (AGU). The number of cycles required by the AGU to process the load multiple or store multiple operations depends on the length of the register list and the 64-bit alignment of the address. The resulting latency is the latency of the first loaded register. Table B.3 shows the cycle timings for load multiple operations.
|Instruction|AGU cycles to process the instruction: address aligned on a 64-bit boundary|Resulting latency: fast forward case|Resulting latency: other cases|
Table B.4 shows the cycle timings of store multiple operations.
|Aligned on a 64-bit boundary|
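The dependence of AGU cycles on register-list length and 64-bit alignment can be sketched with a simple model. This is an assumption-laden illustration built only from two statements in the text (two 32-bit registers per cycle, and 64-bit accesses need a 64-bit aligned address); it is not a substitute for the figures in Tables B.3 and B.4. The function name `agu_cycles` is hypothetical, and the model assumes a word-aligned base address and that a misaligned transfer moves its first word alone.

```python
def agu_cycles(num_registers: int, address: int) -> int:
    """Estimate AGU cycles for a load/store multiple (simplified model).

    Assumptions (not taken from the timing tables):
      * the base address is at least 32-bit (word) aligned;
      * each cycle transfers two registers when the current address
        is 64-bit aligned, otherwise one register.
    """
    if num_registers < 1:
        raise ValueError("register list must contain at least one register")
    if address % 4 != 0:
        raise ValueError("this model assumes a word-aligned address")

    if address % 8 == 0:
        # 64-bit aligned: registers are transferred in pairs.
        return (num_registers + 1) // 2

    # Not 64-bit aligned: the first register is transferred alone,
    # which brings the address to a 64-bit boundary; the remaining
    # registers are then transferred in pairs.
    return 1 + num_registers // 2


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, agu_cycles(n, 0x1000), agu_cycles(n, 0x1004))
```

Under this model a four-register list takes 2 cycles from address `0x1000` and 3 cycles from `0x1004`, which illustrates why keeping the base address 64-bit aligned matters for even-length register lists.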