UDOT (indexed)

Unsigned dot product by indexed quadtuplet.

The indexed unsigned integer partial dot product instruction divides the source vectors into quadtuplets of four 8-bit or 16-bit unsigned integer elements. Within each 128-bit vector segment, the elements of each quadtuplet in the first source vector are multiplied by the corresponding elements of the specified quadtuplet in the same segment of the second source vector. The widened products are summed and added to the 32-bit or 64-bit element of the accumulator and destination vector that aligns with the quadtuplet in the first source vector.
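
The arithmetic for one quadtuplet can be sketched in Python as follows. This is an illustrative model, not Arm's definition: the function name `quad_dot` is hypothetical, and the final wrap modulo 2^esize is an assumption chosen to mirror the fixed-width `bits(esize)` accumulator in the Operation pseudocode.

```python
from typing import Sequence


def quad_dot(acc: int, a: Sequence[int], b: Sequence[int], esize: int = 32) -> int:
    """Accumulate the dot product of one quadtuplet pair into an esize-bit element.

    a, b: four unsigned elements each, esize // 4 bits wide
          (8-bit for the 32-bit variant, 16-bit for the 64-bit variant).
    acc:  the unsigned esize-bit accumulator element.
    """
    elt_max = 1 << (esize // 4)
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a quadtuplet has exactly four elements")
    if not all(0 <= x < elt_max for x in list(a) + list(b)):
        raise ValueError("element out of unsigned range")
    total = acc
    for x, y in zip(a, b):
        total += x * y  # widened product: up to 2 * (esize // 4) bits
    return total % (1 << esize)  # the destination element is esize bits wide
```

For example, `quad_dot(10, [255, 255, 255, 255], [255, 255, 255, 255])` returns 260110: each product is 65025, which already needs 16 bits, so the products must be widened before they are summed into the 32-bit accumulator.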

The quadtuplet within the second source vector is specified by an immediate index, which selects the same quadtuplet position within each 128-bit vector segment. The index ranges from 0 to one less than the number of quadtuplets per 128-bit segment, and is encoded in 2 bits for quadtuplets of 8-bit elements or in 1 bit for quadtuplets of 16-bit elements.
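
A small Python sketch of the index rule, assuming the element numbering used by the Operation pseudocode (element 0 in the least significant bits, counted in units of the source element size). The helper `zm_elements_read` is hypothetical; it only reports which second-source elements feed each destination element, reusing the `segmentbase` and `s` arithmetic from the pseudocode.

```python
def zm_elements_read(vl_bits: int, esize: int, index: int) -> list:
    """Return, for each destination element e, the numbers of the four
    (esize // 4)-bit elements of the second source that contribute to it."""
    if vl_bits % 128 != 0:
        raise ValueError("VL must be a multiple of 128 bits")
    quads_per_segment = 128 // esize  # 4 when esize == 32, 2 when esize == 64
    if not 0 <= index < quads_per_segment:
        raise ValueError(f"index must be in 0..{quads_per_segment - 1}")
    reads = []
    for e in range(vl_bits // esize):
        segmentbase = e - e % quads_per_segment  # first element of e's segment
        s = segmentbase + index                  # same position in every segment
        reads.append([4 * s + i for i in range(4)])
    return reads
```

With a 256-bit vector length, 8-bit source elements and index 1, destination elements 0 to 3 all read bytes 4 to 7 of the second source, while elements 4 to 7 read bytes 20 to 23: the same quadtuplet position, repeated in the second 128-bit segment.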

It has encodings from two classes: 32-bit and 64-bit.

32-bit

Bits:   31-21        20-19  18-16  15-10   9-5  4-0
Value:  01000100101  i2     Zm     000001  Zn   Zda
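
The 32-bit class layout can be turned into a field-packing sketch. The function `encode_udot_s` is a hypothetical helper, not part of any assembler API; it simply ORs each operand into the bit positions shown in the diagram and rejects values that do not fit their fields.

```python
def encode_udot_s(zda: int, zn: int, zm: int, imm: int) -> int:
    """Assemble UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>] from the 32-bit class diagram."""
    if not (0 <= zda < 32 and 0 <= zn < 32):
        raise ValueError("Zda and Zn must be Z0-Z31")
    if not 0 <= zm < 8:
        raise ValueError("Zm must be Z0-Z7 (3-bit field)")
    if not 0 <= imm < 4:
        raise ValueError("imm must be 0-3 (2-bit i2 field)")
    word = 0b01000100101 << 21  # fixed bits 31-21
    word |= imm << 19           # i2, bits 20-19
    word |= zm << 16            # Zm, bits 18-16
    word |= 0b000001 << 10      # fixed bits 15-10
    word |= zn << 5             # Zn, bits 9-5
    word |= zda                 # Zda, bits 4-0
    return word
```

Under this layout, `encode_udot_s(0, 0, 0, 0)` yields `0x44A00400`, the word for `UDOT Z0.S, Z0.B, Z0.B[0]`.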

UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

if !HaveSVE() then UNDEFINED;
integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

64-bit

Bits:   31-21        20  19-16  15-10   9-5  4-0
Value:  01000100111  i1  Zm     000001  Zn   Zda
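
A matching sketch for the 64-bit class, again with a hypothetical helper name. Compared with the 32-bit class, fixed bit 22 is 1, and bits 20-16 are split as a 1-bit `i1` plus a 4-bit `Zm` instead of a 2-bit `i2` plus a 3-bit `Zm`. That split is why this variant accepts Z0-Z15 but only two index values.

```python
def encode_udot_d(zda: int, zn: int, zm: int, imm: int) -> int:
    """Assemble UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>] from the 64-bit class diagram."""
    if not (0 <= zda < 32 and 0 <= zn < 32):
        raise ValueError("Zda and Zn must be Z0-Z31")
    if not 0 <= zm < 16:
        raise ValueError("Zm must be Z0-Z15 (4-bit field)")
    if not 0 <= imm < 2:
        raise ValueError("imm must be 0 or 1 (1-bit i1 field)")
    word = 0b01000100111 << 21  # fixed bits 31-21
    word |= imm << 20           # i1, bit 20
    word |= zm << 16            # Zm, bits 19-16
    word |= 0b000001 << 10      # fixed bits 15-10
    word |= zn << 5             # Zn, bits 9-5
    word |= zda                 # Zda, bits 4-0
    return word
```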

UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;
integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
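
The two decode blocks can be combined into one Python sketch that recognises either class from its fixed bits and extracts `esize`, `index`, `n`, `m` and `da` as the pseudocode does. The helper name and the dictionary return type are assumptions for illustration; the sketch does not model the `HaveSVE()` check and only tests the fixed bits shown in the two diagrams.

```python
from typing import Optional


def decode_udot_indexed(word: int) -> Optional[dict]:
    """Decode a 32-bit word using the two UDOT (indexed) diagrams.

    Returns the fields named in the decode pseudocode (esize, index, n, m, da),
    or None if the fixed bits match neither class.
    """
    if ((word >> 23) & 0x1FF) != 0b010001001:  # bits 31-23, common to both classes
        return None
    if ((word >> 21) & 1) != 1 or ((word >> 10) & 0x3F) != 0b000001:  # bit 21, bits 15-10
        return None
    if ((word >> 22) & 1) == 0:
        # 32-bit class: i2 in bits 20-19, Zm in bits 18-16
        esize, index, m = 32, (word >> 19) & 0b11, (word >> 16) & 0b111
    else:
        # 64-bit class: i1 in bit 20, Zm in bits 19-16
        esize, index, m = 64, (word >> 20) & 0b1, (word >> 16) & 0b1111
    return {"esize": esize, "index": index, "n": (word >> 5) & 0x1F, "m": m, "da": word & 0x1F}
```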

Assembler Symbols

<Zda>

Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<Zn>

Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Zm>

For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.

For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the "Zm" field.

<imm>

For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the range 0 to 3, encoded in the "i2" field.

For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit vector segment, in the range 0 to 1, encoded in the "i1" field.
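
The operand rules above can be captured in a table-driven Python sketch that validates the operands and produces the assembler syntax. The `VARIANTS` table and the `format_udot` name are hypothetical; the `<Zm>` and `<imm>` ranges come from the symbol descriptions, and `<Zda>` and `<Zn>` are assumed to span Z0-Z31 to match their 5-bit fields.

```python
# Per-variant operand rules, keyed by encoding class (32 or 64).
VARIANTS = {
    32: {"dst": "S", "src": "B", "zm_max": 7, "imm_max": 3},
    64: {"dst": "D", "src": "H", "zm_max": 15, "imm_max": 1},
}


def format_udot(variant: int, zda: int, zn: int, zm: int, imm: int) -> str:
    """Validate operands for the given variant and return the assembler text."""
    if variant not in VARIANTS:
        raise ValueError("variant must be 32 or 64")
    rules = VARIANTS[variant]
    if not (0 <= zda <= 31 and 0 <= zn <= 31):
        raise ValueError("Zda and Zn must be Z0-Z31")
    if not 0 <= zm <= rules["zm_max"]:
        raise ValueError(f"Zm must be Z0-Z{rules['zm_max']} for the {variant}-bit variant")
    if not 0 <= imm <= rules["imm_max"]:
        raise ValueError(f"imm must be 0-{rules['imm_max']} for the {variant}-bit variant")
    return (f"UDOT Z{zda}.{rules['dst']}, Z{zn}.{rules['src']}, "
            f"Z{zm}.{rules['src']}[{imm}]")
```

Keeping the limits in one table makes the trade-off between the two variants explicit: the wider `Zm` range of the 64-bit variant comes at the cost of a narrower `<imm>` range.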

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - e MOD eltspersegment;
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;
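
A direct Python transliteration of the Operation pseudocode can help when checking expected results by hand. It assumes vectors are held as Python integers with element 0 in the least significant bits, as `Elem[]` numbers them; `elem`, `pack` and `udot_indexed` are hypothetical helper names, and `CheckSVEEnabled()` is not modelled.

```python
def elem(vec: int, e: int, size: int) -> int:
    """Elem[vec, e, size]: element e of a vector held as an int, element 0 in the low bits."""
    return (vec >> (e * size)) & ((1 << size) - 1)


def pack(values, size: int) -> int:
    """Build a vector int from an iterable of size-bit elements (element 0 first)."""
    vec = 0
    for k, v in enumerate(values):
        vec |= (v & ((1 << size) - 1)) << (k * size)
    return vec


def udot_indexed(operand1: int, operand2: int, operand3: int,
                 vl: int, esize: int, index: int) -> int:
    """Model of the Operation pseudocode; returns the new value of Z[da]."""
    if vl % 128 != 0 or esize not in (32, 64):
        raise ValueError("VL must be a multiple of 128 and esize 32 or 64")
    elements = vl // esize
    eltspersegment = 128 // esize
    if not 0 <= index < eltspersegment:
        raise ValueError("index out of range for this esize")
    mask = (1 << esize) - 1
    result = 0
    for e in range(elements):
        segmentbase = e - e % eltspersegment
        s = segmentbase + index
        res = elem(operand3, e, esize)
        for i in range(4):
            element1 = elem(operand1, 4 * e + i, esize // 4)
            element2 = elem(operand2, 4 * s + i, esize // 4)
            res = (res + element1 * element2) & mask  # bits(esize) addition wraps
        result |= res << (e * esize)
    return result
```

With VL = 256, a first source of all-ones bytes, a second source holding bytes 0 to 31, a zero accumulator and index 0, the four 32-bit results in the first segment are 0+1+2+3 = 6 and the four in the second segment are 16+17+18+19 = 70.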