LD1W (scalar plus immediate)
Contiguous load unsigned words to vector (immediate index).
Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | imm4 | 1 | 0 | 1 | Pg | Rn | Zt | |||||||||||||
dtype<3:1> | dtype<0> |
if !HaveSVE() then UNDEFINED; integer t = UInt(Zt); integer n = UInt(Rn); integer g = UInt(Pg); integer esize = 32; integer msize = 32; boolean unsigned = TRUE; integer offset = SInt(imm4);
64-bit element
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | imm4 | 1 | 0 | 1 | Pg | Rn | Zt | |||||||||||||
dtype<3:1> | dtype<0> |
if !HaveSVE() then UNDEFINED; integer t = UInt(Zt); integer n = UInt(Rn); integer g = UInt(Pg); integer esize = 64; integer msize = 32; boolean unsigned = TRUE; integer offset = SInt(imm4);
Assembler Symbols
<Zt> |
Is the name of the scalable vector register to be transferred, encoded in the "Zt" field. |
<Pg> |
Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field. |
<Xn|SP> |
Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field. |
<imm> |
Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the "imm4" field. |
Operation
CheckSVEEnabled(); integer elements = VL DIV esize; bits(64) base; bits(64) addr; bits(PL) mask = P[g]; bits(VL) result; bits(msize) data; constant integer mbytes = msize DIV 8; if n == 31 then if LastActiveElement(mask, esize) >= 0 || ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then CheckSPAlignment(); if HaveMTEExt() then SetTagCheckedInstruction(FALSE); base = SP[]; else if HaveMTEExt() then SetTagCheckedInstruction(TRUE); base = X[n]; addr = base + offset * elements * mbytes; for e = 0 to elements-1 if ElemP[mask, e, esize] == '1' then data = Mem[addr, mbytes, AccType_NORMAL]; Elem[result, e, esize] = Extend(data, esize, unsigned); else Elem[result, e, esize] = Zeros(); addr = addr + mbytes; Z[t] = result;