Data transfers are managed by the LSU on the core side, and by the pipeline itself on the coprocessor side. A transfer can be a single value or a vector. In the latter case, the coprocessor effectively converts the multiple transfer into a series of single transfers by iterating the instruction in the Issue stage, creating an instance of the load or store instruction for each item to be transferred.
The instruction stays in the coprocessor Issue stage while it iterates, creating copies of itself that move down the pipeline. Figure 11.9 illustrates this process for a load instruction.
The first of the iterated instructions, shown in uppercase, is the head, and the others, shown in lowercase, are the tails. In the example shown, the vector length is four, so there is one head and three tails. At the first iteration of the instruction, the tail flag is set so that subsequent iterations send tail instructions down the pipeline. In the example shown in Figure 11.9, instruction B has stalled in the Ex1 stage (which might be caused by the cancel queue being empty), so instruction C does not iterate during its first cycle in the Issue stage and only starts to iterate after the stall has been removed.
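The head and tail scheme described above can be sketched in a few lines of Python. This is only an illustrative software model, not a description of the hardware: the names `Slot`, `iterate_in_issue`, and `tail_flag`, and the tag value 3, are assumptions made for the example, and stalls are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One copy of an iterated instruction (hypothetical model type)."""
    name: str      # instruction letter, for example "A"
    tag: int       # tag shared by the head and all of its tails
    is_tail: bool  # False for the head, True for every later copy

    def label(self) -> str:
        # Follow the figure's convention: head uppercase, tails lowercase.
        return self.name.lower() if self.is_tail else self.name.upper()


def iterate_in_issue(name: str, tag: int, vector_length: int):
    """Yield one copy of the instruction per item to be transferred."""
    tail_flag = False
    for _ in range(vector_length):
        yield Slot(name, tag, tail_flag)
        # Set at the first iteration so that later copies go out as tails.
        tail_flag = True


# Vector length of four, as in the example in the text.
copies = list(iterate_in_issue("A", tag=3, vector_length=4))
labels = [c.label() for c in copies]
```

With a vector length of four, `labels` holds one uppercase head followed by three lowercase tails, all carrying the same tag.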
Figure 11.8 shows the extra paths required for passing data to and from the coprocessor.
Two data paths are required:
One passes store data from the coprocessor to the core, and this requires a queue, which is maintained by the core.
The other passes load data from the core to the coprocessor and requires no queue, only two pipeline registers.
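The difference between the two paths can be illustrated with a small Python model. It is a sketch under stated assumptions: the class names `StoreDataQueue` and `LoadDataPath` are invented for the example, the store queue is modelled as unbounded, and the two load pipeline registers are assumed to sit in series so that a value emerges two clock steps after it enters.

```python
from collections import deque


class StoreDataQueue:
    """Coprocessor-to-core path: data waits in a queue until the core takes it."""

    def __init__(self):
        self._items = deque()  # assumption: no depth limit is modelled

    def push(self, value):
        # Coprocessor side: hand over one item of store data.
        self._items.append(value)

    def pop(self):
        # Core side: take the oldest item, or None if nothing is waiting.
        return self._items.popleft() if self._items else None


class LoadDataPath:
    """Core-to-coprocessor path: two pipeline registers and no buffering."""

    def __init__(self):
        self._regs = [None, None]  # assumption: registers are in series

    def clock(self, value_in=None):
        """Advance one cycle and return the value leaving the second register."""
        value_out = self._regs[1]
        self._regs = [value_in, self._regs[0]]
        return value_out


stores = StoreDataQueue()
stores.push(0x11)
stores.push(0x22)
first_store = stores.pop()

loads = LoadDataPath()
seen = [loads.clock(v) for v in (10, 20, None, None)]
```

In this model the store path keeps items in order for as long as the consumer needs, whereas the load path simply delays each value by a fixed two steps, so `seen` is `[None, None, 10, 20]`.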
Figure 11.9 shows instruction iteration for loads.
Only the head instruction is involved in token exchange with the core pipeline, which does not iterate instructions in this way. The tail instructions pass down the pipeline silently.
When an iterated load or store instruction is cancelled or flushed, all the tail instructions (bearing the same tag) must be removed from the pipeline. Only the head instruction becomes a phantom when cancelled. Any tail instruction can be left intact in the pipeline because it has no further effect.
Because the cancel token is received in the coprocessor Ex1 stage, a cancelled iterated instruction always consists of a head instruction in Ex1 and a single tail instruction in the Issue stage.