
B.3. Load and store instructions

Load and store instructions are classed as:

  • single load and store instructions such as LDR instructions

  • load and store multiple instructions such as LDM instructions.

For load multiple and store multiple instructions, the number of registers in the register list usually determines the number of cycles required to execute the instruction.

The Cortex-A9 processor has an optimized path from a load instruction to a subsequent data processing instruction, saving 1 cycle on the load-use penalty.

This path is used when the following conditions are met:

  • the data-processing instruction is an arithmetic, logical, or saturation operation

  • the data-processing instruction does not require any shift

  • the load instruction does not require sign extension

  • the load instruction is not conditional.
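The four conditions above can be read as a single predicate. The following Python sketch is only an illustration: the `Load` and `DataOp` records, their field names, and the `kind` strings are hypothetical, chosen for this example, and do not correspond to any ARM tool or API.

```python
from dataclasses import dataclass

# Operation classes the text lists as eligible for the optimized path.
FAST_FORWARD_KINDS = {"arithmetic", "logical", "saturation"}


@dataclass
class Load:
    """Hypothetical description of a load instruction."""
    sign_extends: bool  # True for loads that sign-extend, such as LDRSB or LDRSH
    conditional: bool   # True if the load is conditionally executed


@dataclass
class DataOp:
    """Hypothetical description of the dependent data-processing instruction."""
    kind: str           # for example "arithmetic", "logical", "saturation", "other"
    uses_shift: bool    # True if the instruction requires a shift


def uses_fast_forward_path(load: Load, op: DataOp) -> bool:
    """Return True when all four conditions from the list above hold."""
    return (
        op.kind in FAST_FORWARD_KINDS
        and not op.uses_shift
        and not load.sign_extends
        and not load.conditional
    )
```

Under this reading, an unconditional load without sign extension followed by an unshifted arithmetic operation satisfies the predicate, while the same pair with a shifted operand does not.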

Table B.2 shows cycle timing for single load and store operations. The result latency is the latency of the first loaded register.

Table B.2. Single load and store operation cycle timings
Instruction | AGU cycles | Result latency (fast forward cases) | Result latency (other cases)

LDR ,[reg]

LDR ,[reg imm]

LDR ,[reg reg]

LDR ,[reg reg LSL #2]


LDR ,[reg reg LSL reg]

LDR ,[reg reg LSR reg]

LDR ,[reg reg ASR reg]

LDR ,[reg reg ROR reg]

LDR ,[reg reg, RRX]


LDRB ,[reg]

LDRB ,[reg imm]

LDRB ,[reg reg]

LDRB ,[reg reg LSL #2]

LDRH ,[reg]

LDRH ,[reg imm]

LDRH ,[reg reg]

LDRH ,[reg reg LSL #2]


LDRB ,[reg reg LSL reg]

LDRB ,[reg reg ASR reg]

LDRB ,[reg reg LSL reg]

LDRB ,[reg reg ASR reg]

LDRH ,[reg reg LSL reg]

LDRH ,[reg reg ASR reg]

LDRH ,[reg reg LSL reg]

LDRH ,[reg reg ASR reg]


The Cortex-A9 processor can load or store two 32-bit registers in each cycle. However, to access 64 bits, the address must be 64-bit aligned.

This scheduling is done in the Address Generation Unit (AGU). The number of cycles required by the AGU to process the load multiple or store multiple operations depends on the length of the register list and the 64-bit alignment of the address. The resulting latency is the latency of the first loaded register. Table B.3 shows the cycle timings for load multiple operations.

Table B.3. Load multiple operations cycle timings
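| Instruction | AGU cycles (address 64-bit aligned) | AGU cycles (address not 64-bit aligned) | Resulting latency (fast forward case) | Resulting latency (other cases) |
|---|---|---|---|---|
| LDM ,{1 register} | 1 | 1 | 2 | 3 |
| LDM ,{2 registers} | | | | |
| LDM ,{3 registers} | 2 | 2 | 2 | 3 |
| LDM ,{4 registers} | 2 | 3 | 2 | 3 |
| LDM ,{5 registers} | 3 | 3 | 2 | 3 |
| LDM ,{6 registers} | 3 | 4 | 2 | 3 |
| LDM ,{7 registers} | 4 | 4 | 2 | 3 |
| LDM ,{8 registers} | 4 | 5 | 2 | 3 |
| LDM ,{9 registers} | 5 | 5 | 2 | 3 |
| LDM ,{10 registers} | 5 | 6 | 2 | 3 |
| LDM ,{11 registers} | 6 | 6 | 2 | 3 |
| LDM ,{12 registers} | 6 | 7 | 2 | 3 |
| LDM ,{13 registers} | 7 | 7 | 2 | 3 |
| LDM ,{14 registers} | 7 | 8 | 2 | 3 |
| LDM ,{15 registers} | 8 | 8 | 2 | 3 |
| LDM ,{16 registers} | 8 | 9 | 2 | 3 |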
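The AGU cycle counts in Table B.3 follow a regular pattern, which the Python sketch below expresses as a closed-form rule. The rule is inferred from the table rows and is not an ARM-published formula: it assumes that an aligned access transfers two registers per cycle, and that an unaligned access transfers one register first and then continues in pairs. The function name and parameters are hypothetical, and the base address is assumed to be at least word aligned. The table gives no values for the 2-register case, so the result for that count is an extrapolation.

```python
def ldm_agu_cycles(num_registers: int, base_address: int) -> int:
    """Estimate AGU cycles for a load multiple, following the pattern in Table B.3."""
    if not 1 <= num_registers <= 16:
        raise ValueError("register list must contain 1 to 16 registers")

    # A 64-bit aligned address is a multiple of 8 bytes.
    aligned = base_address % 8 == 0

    if aligned:
        # Two registers per cycle: ceil(n / 2).
        return (num_registers + 1) // 2

    # One register alone, then the remaining n - 1 in pairs:
    # 1 + ceil((n - 1) / 2), which equals 1 + n // 2 for n >= 1.
    return 1 + num_registers // 2
```

For instance, `ldm_agu_cycles(4, 0x1000)` gives 2 and `ldm_agu_cycles(4, 0x1004)` gives 3, matching the 4-register row of the table.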

Table B.4 shows the cycle timings of store multiple operations.

Table B.4. Store multiple operations cycle timings
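| Instruction | AGU cycles (address 64-bit aligned) | AGU cycles (address not 64-bit aligned) |
|---|---|---|
| STM ,{1 register} | 1 | 1 |
| STM ,{2 registers} | | |
| STM ,{3 registers} | 2 | 2 |
| STM ,{4 registers} | 2 | 3 |
| STM ,{5 registers} | 3 | 3 |
| STM ,{6 registers} | 3 | 4 |
| STM ,{7 registers} | 4 | 4 |
| STM ,{8 registers} | 4 | 5 |
| STM ,{9 registers} | 5 | 5 |
| STM ,{10 registers} | 5 | 6 |
| STM ,{11 registers} | 6 | 6 |
| STM ,{12 registers} | 6 | 7 |
| STM ,{13 registers} | 7 | 7 |
| STM ,{14 registers} | 7 | 8 |
| STM ,{15 registers} | 8 | 8 |
| STM ,{16 registers} | 8 | 9 |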
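When estimating the cost of a store multiple, the values in Table B.4 can be kept as a lookup table. The Python sketch below transcribes only the rows that carry values; the 2-register row is left out because the table gives no numbers for it. The names `STM_AGU_CYCLES` and `stm_agu_cycles` are hypothetical, and reading the first value as the 64-bit aligned case and the second as the unaligned case is an assumption based on the column header.

```python
from typing import Optional

# Register count -> (aligned, not aligned) AGU cycles, transcribed from Table B.4.
# The 2-register entry is absent because the table lists no values for it.
STM_AGU_CYCLES = {
    1: (1, 1),
    3: (2, 2),
    4: (2, 3),
    5: (3, 3),
    6: (3, 4),
    7: (4, 4),
    8: (4, 5),
    9: (5, 5),
    10: (5, 6),
    11: (6, 6),
    12: (6, 7),
    13: (7, 7),
    14: (7, 8),
    15: (8, 8),
    16: (8, 9),
}


def stm_agu_cycles(num_registers: int, aligned_64bit: bool) -> Optional[int]:
    """Look up AGU cycles for a store multiple; None if the table has no entry."""
    entry = STM_AGU_CYCLES.get(num_registers)
    if entry is None:
        return None
    return entry[0] if aligned_64bit else entry[1]
```

A lookup keeps the sketch faithful to the listed rows and makes the missing entry explicit instead of guessing a value for it.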
