Data processing methodologies

When processing large sets of data, a major factor that limits performance is the amount of CPU time spent executing data processing instructions. This CPU time depends on the number of instructions needed to process the entire data set, and that number depends on how many items of data each instruction can process.
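As a rough illustration of that relationship, assume that every group of items needs exactly one data processing instruction, and ignore loads, stores, and loop overhead. The instruction count for n items is then n divided by the number of items each instruction handles, rounded up. The helper name `instructions_needed` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of data processing instructions needed to
 * process n items when each instruction handles `lanes` items (lanes > 0).
 * Assumes one instruction per group of lanes and ignores loads, stores,
 * and loop control. Rounds up so a partial final group still costs one
 * instruction. */
size_t instructions_needed(size_t n, size_t lanes)
{
    return (n + lanes - 1) / lanes;
}
```

For example, `instructions_needed(1000, 1)` is 1000 (one item per instruction), while `instructions_needed(1000, 4)` is 250 (four 32-bit items per 128-bit register).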

Most Arm instructions are Single Instruction Single Data (SISD). Each instruction performs one operation on a single set of operands and produces a single result. Processing multiple items therefore requires multiple instructions.

For example, performing four separate additions with traditional SISD instructions requires four instructions, each adding the values from one pair of registers:

ADD x0, x0, x5
ADD x1, x1, x6
ADD x2, x2, x7
ADD x3, x3, x8
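The same four independent additions can be sketched in C as scalar code. Each statement adds one pair of 64-bit values, mirroring the four ADD instructions above. The function name `add_four_scalar` is hypothetical, and whether a compiler really emits four separate ADD instructions depends on its optimization settings, because an auto-vectorizer may combine them.

```c
#include <stdint.h>

/* Hypothetical example: four independent 64-bit additions written as
 * scalar (SISD-style) code. Each statement corresponds to one ADD in
 * the listing above: x0 += x5, x1 += x6, x2 += x7, x3 += x8. */
void add_four_scalar(uint64_t x[4], const uint64_t y[4])
{
    x[0] = x[0] + y[0];
    x[1] = x[1] + y[1];
    x[2] = x[2] + y[2];
    x[3] = x[3] + y[3];
}
```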

Single Instruction Multiple Data (SIMD) instructions perform the same operation simultaneously on multiple items. These items are packed as separate elements, called lanes, in a larger register.

The following diagram shows how vector registers V8 and V9 each contain four data elements. The addition operation performs the calculation on all four lanes simultaneously, then places the results in register V10:

The following example instruction adds four pairs of 32-bit integer values together. Unlike the SISD example, the values are packed as separate lanes in one pair of 128-bit registers. Each lane in the first source register is added to the corresponding lane in the second source register, and the result is stored in the same lane of the destination register:

ADD V10.4S, V8.4S, V9.4S
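To make the lane-wise behavior concrete, the following portable C sketch models what the instruction computes, not how fast it runs. A hypothetical `vreg128` type holds four 32-bit lanes, and a hypothetical `add_4s` function adds corresponding lanes. Each lane wraps modulo 2^32 independently, so a carry out of one lane never propagates into its neighbor. The loop performs one scalar addition per lane; the hardware performs all four in a single instruction.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register viewed as four
 * 32-bit lanes (the .4S arrangement). */
typedef struct {
    uint32_t s[4];
} vreg128;

/* Models ADD Vd.4S, Vn.4S, Vm.4S: lane i of the result is lane i of n
 * plus lane i of m. uint32_t arithmetic wraps modulo 2^32, so each lane
 * overflows on its own without carrying into the next lane. */
vreg128 add_4s(vreg128 n, vreg128 m)
{
    vreg128 d;
    for (int i = 0; i < 4; i++) {
        d.s[i] = n.s[i] + m.s[i];
    }
    return d;
}
```

For example, adding lanes {1, 2, 3, 0xFFFFFFFF} and {10, 20, 30, 1} gives {11, 22, 33, 0}: the last lane wraps to zero and the other lanes are unaffected.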

Performing the four additions with a single SIMD instruction is more efficient than using four separate SISD instructions, because the same data is processed with a quarter of the instructions.
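In C, one way to express this kind of lane-wise add without writing assembly is the GCC and Clang vector extension `__attribute__((vector_size(16)))`. This is a compiler extension, not standard C, and the type name `u32x4` and function name `add_vectors` are hypothetical. When targeting AArch64 with optimization enabled, compilers typically lower the `+` below to a single vector ADD on the .4S arrangement, but the exact code generated is up to the compiler. The Arm-specific alternative is the Neon intrinsics in `arm_neon.h`, for example `vaddq_u32`.

```c
#include <stdint.h>

/* GCC/Clang vector extension (not ISO C): a 16-byte (128-bit) vector
 * of four uint32_t lanes. */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* Lane-wise addition: result[i] = a[i] + b[i] for i = 0..3, with each
 * lane wrapping modulo 2^32. On AArch64 this is typically compiled to
 * one ADD Vd.4S, Vn.4S, Vm.4S instruction. */
u32x4 add_vectors(u32x4 a, u32x4 b)
{
    return a + b;
}
```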
