The ARMv6 architecture introduced a small set of SIMD instructions that operate on multiple 16-bit or 8-bit values packed into standard 32-bit general-purpose registers. Because a single instruction processes two or four values at once, certain operations can run up to two or four times as fast, without requiring additional computation units. The mnemonics for these instructions can be recognized by the 8 or 16 appended to the base form, which indicates the size of the data values operated on.
Figure 1.1 shows the operation of the UADD8 R0, R1, R2 instruction. This operation performs a parallel addition of four lanes of 8-bit elements packed into vectors stored in general-purpose registers R1 and R2, and places the result into a vector in register R0.