Helium instructions

This section provides a brief overview of some of the instructions in Helium.

The Helium registers section discussed the different lane configurations that Helium registers can use. Helium instructions cannot execute correctly unless they know how the data is arranged within the vector register. This information is encoded within the instruction itself. Helium provides 20 scalar instructions and 130 vector instructions.

Helium instruction naming rules

The names of Helium instructions follow a common pattern. This pattern lets you determine information about the instruction operation from its name.

The following syntax shows the common pattern:

Note: Some of the letters are overloaded, but their position makes the meaning unambiguous.

V[<mod>]<op>[<shape>][<extra>][<cond>][<.dt>] [<dest>, <src>...]

V
Short for vector. Present on all the assembly instructions.
<mod>
Short for modifier, for example, None, Q (saturating), H (halving), and D (doubling).
<op>
Short for operation, for example, ADD, MUL, MIN.
<shape>
Indicates an extension or narrowing, for example, None, L (long), or N (narrow).
<extra>
Indicates an instruction-specific modifier, for example, None, T (top), B (bottom), V (across).
<cond>
Short for conditional, which is used for predication in Helium.
<.dt>
Short for data type, for example, Float, Integer, Signed, Unsigned.
<dest>
Indicates the destination vector register, which is where the result is written.
<src>
Indicates the source vector register or registers, which is where the input comes from.

The following variants of the MOV instruction show the common pattern in use:

  • VMOVN<T><v>.<dt> Qd, Qm
    • In this example, the shape is N, which performs a narrowing to half-width of each data element before it is written to the destination register.
  • VQMOVN<T><v>.<dt> Qd, Qm
    • In this example, the mod is Q, which performs an element-wise saturation to half-width.
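The difference between the plain narrowing shape (N) and the saturating modifier (Q) in the two examples above can be illustrated with a small scalar model in C. This is only a sketch of the per-element arithmetic, assuming signed 32-bit source elements narrowed to 16 bits. The function names are hypothetical, and the model ignores which half of the destination register (top or bottom) receives the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain narrowing (N shape): keep only the low 16 bits of the element.
 * The cast of an out-of-range value to int16_t is implementation-defined
 * in C; mainstream compilers wrap modulo 2^16, which is assumed here. */
static int16_t narrow_truncate_s32(int32_t x)
{
    return (int16_t)(uint16_t)((uint32_t)x & 0xFFFFu);
}

/* Saturating narrowing (Q modifier with N shape): clamp to the int16_t range. */
static int16_t narrow_saturate_s32(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Apply one of the two narrowing styles to every element of an array. */
static void narrow_elements(const int32_t *src, int16_t *dst, size_t n, int saturate)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = saturate ? narrow_saturate_s32(src[i])
                          : narrow_truncate_s32(src[i]);
    }
}
```

In this model, a value that fits in 16 bits is unchanged by either function. An out-of-range value such as 70000 wraps when truncated but clamps to 32767 when saturated.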
Helium instruction set

Helium includes a wide variety of instructions that perform different operations, for example math operations such as add, multiply, and subtract. Some of these instructions are particularly suited to DSP or ML workloads. An example is the instruction behind the vmladavaq intrinsic, which takes two vectors and a scalar: the corresponding lanes of the two vectors are multiplied together, the products are summed, and the sum is added to the scalar value. Multiply-accumulate operations like this are the building blocks of matrix operations, which are widely used in DSP to process digitized sounds or images that are stored or transmitted electronically.
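The arithmetic described for vmladavaq can be written as a scalar reference model in C. The following is a minimal sketch, assuming signed 16-bit lanes (eight lanes in a 128-bit vector) and a 32-bit accumulator. The function name mladava_s16_model is hypothetical and is not the intrinsic itself. The model also assumes that the running sum does not overflow 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES_S16 8 /* a 128-bit vector holds eight 16-bit lanes */

/* Scalar model of a multiply-accumulate across lanes:
 * result = acc + sum over i of (a[i] * b[i]).
 * Assumes the 32-bit sum does not overflow. */
static int32_t mladava_s16_model(int32_t acc,
                                 const int16_t a[LANES_S16],
                                 const int16_t b[LANES_S16])
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < LANES_S16; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

Calling the model repeatedly on successive eight-element chunks of two longer arrays, passing the previous result in as acc, yields a dot product. This is the inner step of a matrix multiplication.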

Different types of load and store

Helium provides three different types of load and store instructions:

Contiguous load
This is the most straightforward way to load data into vectors. With a contiguous load, consecutive memory locations are accessed in sequence, one per lane, starting from a base address that is specified in a scalar register. For example, VLDRB (Vector Load Register Byte) loads consecutive bytes from memory into a destination vector register.
Widening or narrowing
If data that is stored in memory is a different size to the vector lanes, then widening or narrowing is required. This is usually required for packing or unpacking input or output data. An example is the instruction VLDRB.U32 (Vector Load Register Byte). This instruction loads consecutive bytes from memory, zero-extends each byte to 32 bits, and places it into the corresponding lane in the destination vector.
Scatter or gather
A gather load reads data from non-contiguous memory locations into a vector register, and a scatter store writes the lanes of a vector register to non-contiguous memory locations. For example, when manipulating one color channel of RGB data, you gather data items from every third memory location and load them into a vector register.
Generators are the VIDUP, VDDUP, VIWDUP, and VDWDUP instructions. These instructions generate incrementing or decrementing index sequences that are intended to be used as offsets for gather loads and scatter stores. For example:

// 16-bit vector gather load with decrementing offsets with a step of 2
mov            r0, #20                // decrementing sequence start
vddup.u16      q0, r0, #2             // generator, decrement step of 2
                                      // q0 = [ 20 18 16 14 12 10 8 6 ]
vldrh.s16      q1, [r1, q0, uxtw #1]  // gather load, base = r1, offset = q0
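
To make the memory access patterns above easier to follow, here is a scalar model in C of a widening load and of the generator plus gather load sequence. It is a sketch under stated assumptions, not Helium code. All function names are hypothetical, the register write-back performed by the generator is ignored, and the uxtw #1 scaling is modeled as ordinary indexing into an array of 16-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening load in the style of VLDRB.U32: four consecutive bytes,
 * each zero-extended into its own 32-bit lane. */
static void widening_load_u8_to_u32(const uint8_t *base, uint32_t lanes[4])
{
    for (int i = 0; i < 4; i++) {
        lanes[i] = (uint32_t)base[i];
    }
}

/* Generator in the style of VDDUP.U16: lane i holds start - i * step. */
static void ddup_u16(uint16_t start, uint16_t step, uint16_t offsets[8])
{
    for (int i = 0; i < 8; i++) {
        offsets[i] = (uint16_t)(start - i * step);
    }
}

/* Gather load of 16-bit elements: each lane reads base[offsets[i]].
 * Indexing an int16_t array scales the offset by 2 bytes, which is the
 * role of uxtw #1 in the assembly example. */
static void gather_load_s16(const int16_t *base, size_t base_len,
                            const uint16_t offsets[8], int16_t lanes[8])
{
    for (int i = 0; i < 8; i++) {
        assert(offsets[i] < base_len); /* stay inside the source buffer */
        lanes[i] = base[offsets[i]];
    }
}
```

With a start value of 20 and a step of 2, ddup_u16 produces the same offsets as the q0 comment in the assembly example: 20, 18, 16, 14, 12, 10, 8, 6.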

Different types of math operations

Math operation | Description
Basic arithmetic | Integer arithmetic such as addition and subtraction. This is used in many of the predication examples that are explained in Predication.
Complex arithmetic | Instructions such as complex add (VCADD) or complex multiply accumulate (VCMLA) operate on real and imaginary components that are held in the same vector. This might be used in, for example, Fast Fourier transforms (FFTs).
MAC operations | Operands are multiplied and accumulated. We explained one of these instructions, VMLA, in Vector instruction example. This might be used in, for example, matrix multiplication.
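
As an illustration of complex arithmetic on real and imaginary components held in the same vector, the following C sketch performs a complex multiply accumulate on interleaved data (re0, im0, re1, im1, ...). The function name is hypothetical. The split of the work into two partial products is an assumption about how such an operation can be decomposed, not a description of the exact behavior of any one Helium instruction.

```c
#include <assert.h>
#include <stddef.h>

/* acc += a * b for n_complex complex numbers stored interleaved as
 * re0, im0, re1, im1, ... */
static void complex_mac_interleaved(float *acc, const float *a,
                                    const float *b, size_t n_complex)
{
    assert(acc != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n_complex; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];

        /* First partial product: contribution of the real part of a. */
        acc[2 * i]     += ar * br;
        acc[2 * i + 1] += ar * bi;

        /* Second partial product: contribution of the imaginary part of a. */
        acc[2 * i]     -= ai * bi;
        acc[2 * i + 1] += ai * br;
    }
}
```

For example, starting from a zero accumulator, multiplying (1 + 2i) by (5 + 6i) leaves -7 in the real slot and 16 in the imaginary slot. Products of this kind are the core of an FFT butterfly.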