VFMAB, VFMAT (BFloat16, by scalar)
The BFloat16 floating-point widening multiply-add long instruction widens the even-numbered (bottom) or odd-numbered (top) 16-bit elements in the first source vector, and an indexed element in the second source vector from Bfloat16 to single-precision format. The instruction then multiplies and adds these values to the overlapping single-precision elements of the destination vector.
Unlike other BFloat16 multiplication instructions, this performs a fused multiply-add, without intermediate rounding that uses the Round to Nearest rounding mode and can generate a floating-point exception that causes cumulative exception bits in the FPSCR to be set.
It has encodings from the following instruction sets:
A32 (
A1
)
and
T32 (
T1
)
.
A1
(Armv8.6)
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | D | 1 | 1 | Vn | Vd | 1 | 0 | 0 | 0 | N | Q | M | 1 | Vm |
if !HaveAArch32BF16Ext() then UNDEFINED;
if Vd<0> == '1' || Vn<0> == '1' then UNDEFINED;
integer d = UInt(D:Vd);
integer n = UInt(N:Vn);
integer m = UInt(Vm<2:0>);
integer i = UInt(M:Vm<3>);
integer elements = 128 DIV 32;
integer sel = UInt(Q);
T1
(Armv8.6)
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | D | 1 | 1 | Vn | Vd | 1 | 0 | 0 | 0 | N | Q | M | 1 | Vm |
if InITBlock() then UNPREDICTABLE;
if !HaveAArch32BF16Ext() then UNDEFINED;
if Vd<0> == '1' || Vn<0> == '1' then UNDEFINED;
integer d = UInt(D:Vd);
integer n = UInt(N:Vn);
integer m = UInt(Vm<2:0>);
integer i = UInt(M:Vm<3>);
integer elements = 128 DIV 32;
integer sel = UInt(Q);
Assembler Symbols
<bt> |
Is the bottom or top element specifier,
encoded in
T :
|
<Qd> |
Is the 128-bit name of the SIMD&FP destination register, encoded in the "D:Vd" field as <Qd>*2.
|
<Qn> |
Is the 128-bit name of the first SIMD&FP source register, encoded in the "N:Vn" field as <Qn>*2.
|
<Dm> |
Is the 64-bit name of the second SIMD&FP source register, encoded in the "Vm<2:0>" field.
|
<index> |
Is the element index in the range 0 to 3, encoded in the "M:Vm<3>" field.
|
Operation
CheckAdvSIMDEnabled();
bits(128) operand1 = Q[n>>1];
bits(64) operand2 = D[m];
bits(128) operand3 = Q[d>>1];
bits(128) result;
bits(32) element2 = Elem[operand2, i, 16] : Zeros(16);
for e = 0 to elements-1
bits(32) element1 = Elem[operand1, 2 * e + sel, 16] : Zeros(16);
bits(32) addend = Elem[operand3, e, 32];
Elem[result, e, 32] = FPMulAdd(addend, element1, element2,
StandardFPSCRValue());
Q[d>>1] = result;