SDOT (indexed)
Signed dot product by indexed quadtuplet.
The indexed signed integer partial dot product instruction partitions the source vectors into quadtuplets of four 8-bit or 16-bit signed integer elements. Within each 128-bit vector segment, the elements of each quadtuplet in the first source vector are multiplied by the corresponding elements of the specified quadtuplet in the same segment of the second source vector. The resulting widened products are summed and added to the 32-bit or 64-bit element of the accumulator and destination vector that aligns with the quadtuplet in the first source vector.
The quadtuplet within the second source vector is specified using an immediate index, which selects the same quadtuplet position within each 128-bit vector segment. The index ranges from 0 to one less than the number of quadtuplets per 128-bit segment: 0 to 3 for quadtuplets of 8-bit elements (encoded in 2 bits) and 0 to 1 for quadtuplets of 16-bit elements (encoded in 1 bit).
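The index range and the per-segment selection described above can be sketched in a few lines of Python. This is an illustrative model only: the function names `index_field` and `selected_quadtuplet` are hypothetical, and `esize` stands for the accumulator element size (32 or 64 bits).

```python
def index_field(esize):
    """Return (max_index, index_bits) for an accumulator element size of 32 or 64."""
    if esize not in (32, 64):
        raise ValueError("esize must be 32 or 64")
    # A quadtuplet occupies exactly one accumulator-sized slot, so a
    # 128-bit segment holds 128 / esize quadtuplets.
    quads_per_segment = 128 // esize
    max_index = quads_per_segment - 1
    index_bits = max_index.bit_length()  # 2 bits for 0..3, 1 bit for 0..1
    return max_index, index_bits


def selected_quadtuplet(e, index, esize):
    """Quadtuplet number in the second source used for accumulator element e."""
    quads_per_segment = 128 // esize
    segment_base = e - e % quads_per_segment  # first quadtuplet of e's segment
    return segment_base + index
```

For example, with 32-bit accumulators and index 2, accumulator element 5 lies in the second 128-bit segment (quadtuplets 4 to 7), so it is paired with quadtuplet 6 of the second source vector.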
It has encodings from 2 classes: 32-bit and 64-bit.
32-bit
31-24 | 23 | 22 | 21 | 20-19 | 18-16 | 15-10 | 9-5 | 4-0 |
0 1 0 0 0 1 0 0 | 1 | 0 | 1 | i2 | Zm | 0 0 0 0 0 0 | Zn | Zda |
if !HaveSVE() then UNDEFINED;
integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
64-bit
31-24 | 23 | 22 | 21 | 20 | 19-16 | 15-10 | 9-5 | 4-0 |
0 1 0 0 0 1 0 0 | 1 | 1 | 1 | i1 | Zm | 0 0 0 0 0 0 | Zn | Zda |
if !HaveSVE() then UNDEFINED;
integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
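As a rough illustration of the two encoding classes, the following Python sketch packs and unpacks the fields shown in the diagrams above. The function names `encode_sdot_indexed` and `decode_sdot_indexed` are hypothetical helpers, not part of any toolchain. The field widths are inferred from the diagrams: in the 32-bit class, i2 occupies bits 20-19 and Zm bits 18-16; in the 64-bit class, i1 occupies bit 20 and Zm bits 19-16.

```python
def encode_sdot_indexed(esize, zda, zn, zm, index):
    """Assemble the 32-bit instruction word for SDOT (indexed)."""
    if esize == 32:
        size, zm_bits, idx_bits = 0b10, 3, 2
    elif esize == 64:
        size, zm_bits, idx_bits = 0b11, 4, 1
    else:
        raise ValueError("esize must be 32 or 64")
    if not (0 <= zda < 32 and 0 <= zn < 32):
        raise ValueError("Zda and Zn must be in 0..31")
    if not 0 <= zm < (1 << zm_bits):
        raise ValueError("Zm out of range for this encoding class")
    if not 0 <= index < (1 << idx_bits):
        raise ValueError("index out of range for this encoding class")
    word = 0b01000100 << 24          # fixed bits 31-24
    word |= size << 22               # bits 23-22 select the class
    word |= 1 << 21                  # fixed bit 21
    word |= index << (16 + zm_bits)  # i2 or i1, directly above Zm
    word |= zm << 16
    word |= zn << 5                  # bits 15-10 stay zero
    word |= zda
    return word


def decode_sdot_indexed(word):
    """Return the decoded fields as a dict, or None if the word does not match."""
    if (word >> 24) != 0b01000100 or ((word >> 21) & 1) != 1:
        return None
    if ((word >> 10) & 0x3F) != 0:
        return None
    size = (word >> 22) & 0b11
    if size == 0b10:
        esize, zm_bits = 32, 3
    elif size == 0b11:
        esize, zm_bits = 64, 4
    else:
        return None
    idx_bits = 5 - zm_bits  # index and Zm share bits 20-16
    return {
        "esize": esize,
        "index": (word >> (16 + zm_bits)) & ((1 << idx_bits) - 1),
        "m": (word >> 16) & ((1 << zm_bits) - 1),
        "n": (word >> 5) & 0x1F,
        "da": word & 0x1F,
    }
```

The decoder returns the same values that the decode pseudocode assigns to `esize`, `index`, `n`, `m` and `da`, so encoding and then decoding a set of operands should give them back unchanged.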
Assembler Symbols
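<Zda> | Is the name of the third source and destination scalable vector register, encoded in the "Zda" field. |
<Zn> | Is the name of the first source scalable vector register, encoded in the "Zn" field. |
<Zm> | Is the name of the second source scalable vector register, encoded in the "Zm" field: Z0-Z7 for the 32-bit variant, Z0-Z15 for the 64-bit variant. |
<imm> | Is the immediate index of a quadtuplet within each 128-bit vector segment: in the range 0 to 3, encoded in the "i2" field, for the 32-bit variant; in the range 0 to 1, encoded in the "i1" field, for the 64-bit variant. |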
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - e MOD eltspersegment;
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;
Z[da] = result;
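The operation pseudocode can be modelled in Python as a minimal sketch for experimentation. Several things here are assumptions of the model rather than part of the specification: the function name `sdot_indexed`, the representation of each vector as a list of signed Python integers, and the explicit `vl` parameter (vector length in bits, a multiple of 128). The accumulator is truncated to `esize` bits after summing, which matches assigning the sum to a `bits(esize)` value.

```python
def sdot_indexed(zda, zn, zm, index, esize, vl=128):
    """Model of SDOT (indexed).

    zda: vl // esize signed accumulator elements.
    zn, zm: 4 * (vl // esize) signed narrow elements (esize // 4 bits each).
    Returns the new accumulator elements as signed integers.
    """
    elements = vl // esize
    elts_per_segment = 128 // esize
    mask = (1 << esize) - 1
    sign_bit = 1 << (esize - 1)
    result = []
    for e in range(elements):
        # The same quadtuplet position is selected inside each 128-bit segment.
        segment_base = e - e % elts_per_segment
        s = segment_base + index
        res = zda[e]
        for i in range(4):
            res += zn[4 * e + i] * zm[4 * s + i]
        # Keep only esize bits, then reinterpret as a signed value.
        res &= mask
        if res & sign_bit:
            res -= 1 << esize
        result.append(res)
    return result


# Usage: 128-bit vector, 32-bit accumulators, quadtuplet 3 of zm selected.
acc = sdot_indexed(
    zda=[10, 20, 30, 40],
    zn=list(range(1, 17)),
    zm=[1] * 4 + [2] * 4 + [3] * 4 + [-1] * 4,
    index=3,
    esize=32,
)
# acc == [0, -6, -12, -18]
```

With a 256-bit vector the same index selects one quadtuplet in each of the two segments, so accumulator elements 0 to 3 and 4 to 7 are paired with different quadtuplets of the second source vector.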