What is the true interrupt latency of Cortex-M3 and Cortex-M4 for interrupt entry and exit?

Article ID: 103489845

Published date: 04 Apr 2019

Last updated: -

Applies to: Cortex-M3, Cortex-M4

Problem/Question

What is the true interrupt latency of Cortex-M3 and Cortex-M4 for interrupt entry and exit?

Scenario

The interrupt latency for interrupt entry is the number of processor clock cycles between an interrupt signal arriving at the processor, and the processor executing the first instruction of the interrupt handler. Conversely, the interrupt exit latency is the number of processor clock cycles between execution of the interrupt return instruction, and execution of the next instruction in the interrupted execution context.

The Cortex-M4 Technical Reference Manual (TRM) states that the interrupt latency on entry is 12 cycles, plus a possible 17 cycles more for Cortex-M4 with Floating-point Unit (FPU) implemented. The TRM also states that the latency on exit is ten cycles, plus a possible 17 cycles more for Cortex-M4 with FPU.

The Cortex-M3 TRM (for example, revision 'I') states that the interrupt latency on entry is 12 cycles and that the latency on exit is also 12 cycles. The exit figure is a typographical error in the Cortex-M3 TRM. The Cortex-M3 has a latency on exit of ten cycles, just like the Cortex-M4.

Answer

The basic interrupt entry latency of 12 cycles depends on conditions relating to both the chip design and the software programming of the processor.

The processor has three main physical interfaces to its memory system. I-Code and D-Code each access addresses below 0x20000000, and System accesses addresses 0x20000000 and higher. The 12-cycle latency requires that a nine-cycle stack push can take place on one interface (typically the System interface). This operation happens in parallel with a six-cycle vector table read and interrupt handler fetch on other interfaces (typically I-Code). If these operations cannot be performed in parallel, they must be performed one after the other, increasing the latency. The ability to perform these memory accesses in parallel depends on the hardware design providing the relevant memory blocks at suitable addresses, and on software programming of the location of the vector table, interrupt handler code, and stack in those memories.
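
The address split described above can be sketched as a small C helper. This is a minimal illustration, not Arm code: the type and function names are hypothetical, I-Code and D-Code are folded into one "code" region because they share the same address range, and the overlap test is a simplification that ignores the bus matrix and memory blocks of a real device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    BUS_CODE,   /* I-Code / D-Code: addresses below 0x20000000 */
    BUS_SYSTEM  /* System: addresses 0x20000000 and higher */
} bus_region_t;

#define SYSTEM_BUS_BASE 0x20000000u

/* Classify an address by the interface that would serve it. */
bus_region_t bus_region_for(uint32_t addr)
{
    return (addr < SYSTEM_BUS_BASE) ? BUS_CODE : BUS_SYSTEM;
}

/*
 * Simplified check: the stack push can overlap the vector table read and
 * the handler fetch only if the stack is in a different region from both.
 */
bool entry_accesses_can_overlap(uint32_t stack_addr,
                                uint32_t vector_table_addr,
                                uint32_t handler_addr)
{
    bus_region_t stack = bus_region_for(stack_addr);
    return stack != bus_region_for(vector_table_addr) &&
           stack != bus_region_for(handler_addr);
}
```

For example, a stack at 0x20001000 with the vector table at 0x00000000 and a handler at 0x00000400 passes the check, whereas relocating the vector table to 0x20000000 does not.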

The 12-cycle latency requires that there are no wait-states, either explicit or implicit, in the memory system. The procedure that is described above consists of many memory accesses. Wait-states cause memory accesses to take additional cycles, increasing the latency. Explicit wait-states are related to the design of the memory system, and correspond to the memory system indicating "not ready" when the processor tries to access memory. Implicit wait-states are caused by optional features of the processor design and software programming. The processor has an optional "bit-band" feature, in which a programmed store operation is converted into a read-modify-write operation, to change a single bit in memory. Use of this feature adds an implicit wait-state, which is added to the interrupt latency if the interrupt signal coincides with a bit-band write. The processor is also able to perform unaligned memory accesses, in which the address being accessed is not a multiple of the size of the access. Because the bus protocol that is used by the processor does not support this type of access, the access is converted into two or three smaller aligned accesses on the bus. This can add one or two implicit wait-states to the access. In addition, the chip designer may have implemented a feature called CONST_AHB_CTRL (or, in some documents, AHB_CONST_CTRL). This feature can add one further implicit wait-state to unaligned accesses.
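
To see where "two or three smaller aligned accesses" comes from, the sketch below splits an access into naturally aligned pieces. It is an illustrative model only: the function name is hypothetical, and the greedy splitting order is an assumption rather than documented bus behavior. It reproduces the counts given above, but not necessarily the exact transfer sequence that the processor generates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the naturally aligned byte, halfword, or word transfers needed to
 * cover an access of 'size' bytes (1, 2, or 4) starting at 'addr'.
 * Hypothetical helper; greedy largest-aligned-chunk splitting is assumed.
 */
unsigned aligned_access_count(uint32_t addr, unsigned size)
{
    assert(size == 1u || size == 2u || size == 4u);

    unsigned count = 0u;
    unsigned remaining = size;

    while (remaining > 0u) {
        /* Largest power-of-two chunk that fits and is aligned at addr. */
        unsigned chunk = 4u;
        while (chunk > remaining || (addr % chunk) != 0u) {
            chunk /= 2u;
        }
        addr += chunk;
        remaining -= chunk;
        count++;
    }
    return count;
}
```

Under this model an aligned word needs one transfer, a word at a halfword-aligned address needs two, and a word at an odd address needs three.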

Whether the Cortex-M4 with FPU will stack 17 floating-point registers during interrupt entry is software programmable. It is generally preferable to use the "lazy stacking" option. This option defers this additional stacking until it becomes necessary and, in many cases, avoids the need to stack these registers at all.
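
As a minimal sketch of how software selects between the two behaviors, the helpers below set or clear the lazy-stacking control bit. The register name (FPCCR), its address, and the ASPEN and LSPEN bit positions are taken from the Armv7-M architecture rather than from this article, and the function names are hypothetical. The register is passed as a pointer so that the logic can also be exercised on a host machine.

```c
#include <stdint.h>

/* Floating-Point Context Control Register, per the Armv7-M architecture. */
#define FPCCR_ADDR   0xE000EF34u
#define FPCCR_ASPEN  (1u << 31)  /* automatic FP state preservation */
#define FPCCR_LSPEN  (1u << 30)  /* lazy FP state preservation */

/* Reserve FP stack space on entry, but defer the actual register pushes. */
void fp_enable_lazy_stacking(volatile uint32_t *fpccr)
{
    *fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Keep automatic preservation, but push the FP registers immediately. */
void fp_disable_lazy_stacking(volatile uint32_t *fpccr)
{
    uint32_t value = *fpccr;
    value |= FPCCR_ASPEN;
    value &= ~FPCCR_LSPEN;
    *fpccr = value;
}
```

On a target device the argument would be (volatile uint32_t *)FPCCR_ADDR, and the call would be made in privileged code. Projects that use CMSIS would more likely access the same register through the CMSIS device header.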

For both processors, the notional 12-cycle interrupt latency can be reduced in the case of a late-arriving interrupt. This occurs when a higher priority interrupt arrives during the interrupt entry sequence for a lower priority interrupt. Because the processor has already started to push the interrupted context onto the stack, the latency for the new interrupt can be reduced. However, the interrupt latency must still include at least the two cycles required to recognize the new interrupt and the six-cycle vector table read and interrupt handler fetch. This means that the interrupt latency cannot be less than eight cycles.

So, in a device with no wait-states in the external memory system, and with the stack, vector table, and handler code memory located in ideal memory locations, the interrupt latency may actually be between eight and 15 cycles if the features which have implicit wait-states are used, and between eight and 32 cycles if the floating-point context in a Cortex-M4 with FPU is also stacked immediately.
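
The arithmetic behind these ranges can be written out as a short C sketch. The type and function names are hypothetical. The three-cycle allowance for implicit wait-states is inferred here as the difference between the 15-cycle and 12-cycle figures above; the article does not state it as a single number.

```c
#include <stdbool.h>

enum {
    ENTRY_BASE_CYCLES      = 12,      /* notional entry latency */
    ENTRY_LATE_ARRIVAL_MIN = 2 + 6,   /* recognize + vector read/handler fetch */
    IMPLICIT_WAIT_MAX      = 15 - 12, /* inferred from the quoted range */
    FP_STACKING_CYCLES     = 17       /* immediate FP context stacking */
};

typedef struct {
    unsigned min_cycles;
    unsigned max_cycles;
} latency_range_t;

/* Entry latency range for a zero-wait-state, ideally placed memory system. */
latency_range_t entry_latency_range(bool implicit_wait_features_used,
                                    bool fp_stacked_immediately)
{
    latency_range_t range = { ENTRY_LATE_ARRIVAL_MIN, ENTRY_BASE_CYCLES };

    if (implicit_wait_features_used) {
        range.max_cycles += IMPLICIT_WAIT_MAX;
    }
    if (fp_stacked_immediately) {
        range.max_cycles += FP_STACKING_CYCLES;
    }
    return range;
}
```

With both arguments true, the function returns the eight to 32 cycle range quoted above. Explicit wait-states and external synchronizers are outside this model.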

These calculations do not include any synchronizer external to the processor, which would add further cycles to the effective interrupt latency. Arm recommends that chip designers include an external synchronizer on any interrupt signal arriving from an asynchronous clock domain.

The interrupt exit latency is similarly affected by which bus interfaces are required for the stack pop and the fetch of the interrupted code stream. This may not be the same memory as the interrupt handler code fetch. Interrupt exit latency is also similarly affected by explicit wait-states in the memory system.

Interrupt exit latency is not affected by implicit wait-states. This is because the processor, by definition, will be executing a "return from exception" instruction, rather than a bit-band or unaligned memory access, at the point at which it recognizes the return from the interrupt handler.

In a Cortex-M4 with FPU, if the floating-point registers were stacked on exception entry, or if they were stacked later via lazy stacking, the exception return will include the additional 17 cycles. This means that the additional 17 cycles will occur more often on interrupt exit than on interrupt entry, if lazy stacking is used.

Therefore, in a device with no wait-states in the external memory system, and with the stack, vector table, and the code for the interrupted context located in ideal memory locations, the interrupt exit latency can be ten cycles. For Cortex-M4 with FPU and with an interrupted floating-point context in the stack frame, the interrupt exit latency can be 27 cycles.
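
The exit figures follow the same pattern and can be sketched in C. The function name is hypothetical, and the model assumes no wait-states and ideal memory placement, as in the paragraph above.

```c
#include <stdbool.h>

enum {
    EXIT_BASE_CYCLES     = 10, /* Cortex-M3 and Cortex-M4 exit latency */
    FP_UNSTACKING_CYCLES = 17  /* extra cycles when an FP context is restored */
};

/*
 * Exit latency in cycles. The FP context counts as present if it was
 * stacked on entry or stacked later through lazy stacking.
 */
unsigned exit_latency_cycles(bool fp_stacked_on_entry, bool fp_stacked_lazily)
{
    bool fp_context_in_frame = fp_stacked_on_entry || fp_stacked_lazily;
    return EXIT_BASE_CYCLES + (fp_context_in_frame ? FP_UNSTACKING_CYCLES : 0);
}
```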

In a memory system with explicit wait-states (meaning that the interconnect or slaves can add stall cycles to the data phase of a transfer), an interrupt may be recognized during the wait-states in the data phase of a memory access. This data phase must complete before the interrupt entry sequence can begin, extending the interrupt entry latency. Additionally, according to the AMBA 3 AHB-Lite protocol, if a further address phase has been indicated on the bus for the next memory access in program order, this additional memory access must also complete before the interrupt can be taken. The chip designer can configure the processor to respect this protocol rule, using the CONST_AHB_CTRL feature, or to ignore this protocol rule by deselecting CONST_AHB_CTRL.

So in a memory system with wait-states, calculation of the worst-case additional interrupt latency depends on:

  • wait-states in the data phase of a current memory access

  • a possible additional set of wait-states for a further memory access if CONST_AHB_CTRL is used

  • wait-states on the multiple memory accesses caused by bit-band or unaligned data accesses

  • wait-states on the stack push of eight or possibly twenty-five registers

  • wait-states on the vector table read and handler code fetch.

There is also a capability for software to disable interruption of multi-cycle instructions (ACTLR.DISMCYCINT), which could further extend the number of memory accesses that must be completed before the interrupt entry procedure starts.
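
A minimal sketch of checking and clearing this control follows. The ACTLR address and the DISMCYCINT bit position are taken from the Cortex-M3 and Cortex-M4 register descriptions rather than from this article, and the function names are hypothetical. The register is passed as a pointer so that the logic can be exercised on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Auxiliary Control Register and its DISMCYCINT bit. */
#define ACTLR_ADDR        0xE000E008u
#define ACTLR_DISMCYCINT  (1u << 0)

/* True if interruption of multi-cycle instructions is disabled. */
bool multicycle_interruption_disabled(const volatile uint32_t *actlr)
{
    return (*actlr & ACTLR_DISMCYCINT) != 0u;
}

/* Clear DISMCYCINT so multi-cycle instructions can be interrupted again. */
void allow_multicycle_interruption(volatile uint32_t *actlr)
{
    *actlr &= ~ACTLR_DISMCYCINT;
}
```

On a target device the argument would be (volatile uint32_t *)ACTLR_ADDR, accessed from privileged code.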

Workaround

No workaround.

Example

No example.

Related Information

None.
