The Normal memory type is used for anything that behaves like a memory, including RAM, Flash, or ROM. Code should only be placed in locations marked as Normal.
Normal is usually the most common memory type in a system, as shown in this diagram:
Traditionally, computer processors execute instructions in the order that they were specified in the program. Each operation happens the number of times specified in the program, and operations happen one at a time. This is called the Simple Sequential Execution (SSE) model. Most modern processors may appear to follow this model, but in reality a number of optimizations are both applied and made available to you, to help speed up performance. We will introduce some of these optimizations here.
A location that is marked as Normal has no direct side-effects when it is accessed. This means that reading the location simply returns the data, and does not cause the data to change or directly trigger another process. Because of this, for locations marked as Normal a processor may:
- Merge accesses. Code can access a location multiple times, or access multiple consecutive locations. For efficiency, the processor is permitted to detect and merge these accesses into a single access. For example, if software writes to a variable multiple times, the processor might only present the last write to the memory system.
- Perform accesses speculatively. The processor is permitted to read a location marked as Normal without it being specifically requested by software. For example, the processor might use pattern recognition to prefetch data before software has requested it, based on the patterns of previous accesses. This technique is used to expedite accesses by predicting behavior.
- Re-order accesses. The order in which accesses are seen by the memory system might not match the order in which software issued them. For example, a processor might re-order two reads to allow it to generate a more efficient bus access. Accesses to the same location cannot be re-ordered but might be merged.
Think of these optimizations as freedoms that allow the processor to employ techniques that speed up performance and improve power efficiency. This means that the Normal memory type usually gives the best performance.
Note: The processor is permitted to optimize in these ways, but that does not mean it always will. How much use a given processor will make of these freedoms depends on its micro-architecture. From a software perspective, you should assume that the processor might do any or all of them.
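To make the merging freedom concrete, here is a minimal sketch in C of a toy model, not a description of how any real processor implements it. The `store_t` type and the `merge_stores` function are hypothetical names introduced only for this illustration. The model assumes that back-to-back stores to the same address may be collapsed so that only the last one reaches the memory system, as in the example of a variable written multiple times.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one store issued by software. */
typedef struct {
    uintptr_t addr;   /* target location */
    uint32_t  value;  /* data written */
} store_t;

/*
 * Toy model of write merging: collapse each run of back-to-back stores to
 * the same address into the last store of that run. Works in place and
 * returns the number of stores that would be presented to the memory system.
 */
size_t merge_stores(store_t *stores, size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; i++) {
        if (out > 0 && stores[out - 1].addr == stores[i].addr) {
            stores[out - 1] = stores[i];   /* later store replaces the earlier one */
        } else {
            stores[out++] = stores[i];     /* different address: keep it in order */
        }
    }
    return out;
}
```

In this model, three consecutive stores to one address followed by a store to a different address leave two stores: the final value for the first address, then the store to the second address. Software that relies on every individual write being observed, such as writes to a peripheral register, cannot tolerate this behavior, which is why it is restricted to locations marked as Normal.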
Limits on re-ordering
To recap, accesses to locations marked as Normal can be re-ordered. Let's consider this example code with a sequence of three memory accesses, two stores and then a load:
If the processor were to re-order these accesses, this might result in the wrong value in memory, which is not allowed.
For accesses to the same bytes, ordering must be maintained. The processor needs to detect the hazard and ensure that the accesses are ordered correctly for the intended outcome.
This does not mean that there is no possibility of optimization with this example. The processor could merge the two stores together, presenting a single combined store to the memory system. It could also detect that the load operation is from the bytes written by the store instructions so that it could return the new value without re-reading it from memory.
Note: The sequence given in the example is deliberately contrived to make the point. In practice, these kinds of hazard tend to be more subtle.
There are other cases in which ordering is enforced, for example Address dependencies. An Address dependency is when a load or store uses the result of a previous load as an address. In this code example, the second instruction is dependent on the outcome of the first instruction:
LDR X0, [X1]
STR X2, [X0]      // The result of the previous load is the address in this store.
This example also shows an Address dependency in which the second instruction is dependent on the outcome of the first instruction:
LDR X0, [X1]
STR X2, [X5, X0]  // The result of the previous load is used to calculate the address.
Where there is an Address dependency between two memory accesses, the processor must maintain the order.
This rule does not apply to control dependencies. A control dependency is when the value from a previous load is used to make a decision. This code example shows a load followed by a Compare and Branch on Zero operation that relies on the value from the load:
LDR X0, [X1]
CBZ X0, <somewhere_else>
STR X2, [X5]      // There is a control dependency on X0, this does not guarantee ordering.
There are cases in which ordering needs to be enforced between accesses to Normal memory, or accesses to Normal and Device memory. This can be achieved using barrier instructions.
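As an illustration of enforcing order from software, here is a minimal sketch in portable C11 rather than A64 assembly. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical and exist only for this example. It assumes one thread publishes a value through a flag and another thread reads it. Compilers targeting AArch64 typically implement these fences with barrier instructions, but the exact instructions chosen are up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data held in Normal memory */
static atomic_bool ready;    /* flag that publishes the payload */

/* Writer: the release fence keeps the payload store ahead of the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out only once the flag has been observed. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    /* The branch above is only a control dependency, so the fence is
       what keeps the payload load after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences, the accesses to `payload` and `ready` are to different locations in Normal memory, so the re-ordering freedoms described earlier would apply and the reader could observe the flag before the data.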