SIMD ISA: Neon
Return Type: float32x4_t
Name: vbfmlaltq_f32
Arguments: (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
Instruction Group: Vector arithmetic / Multiply / Multiply-accumulate
Description
BFloat16 floating-point widening multiply-add long (vector). This instruction widens the even-numbered (bottom) or odd-numbered (top) 16-bit elements in the first and second source vectors from BFloat16 to single-precision format. It then multiplies the widened values and adds each product to the overlapping single-precision element of the destination vector. This intrinsic maps to BFMLALT, so it uses the odd-numbered (top) elements.
Results
Vd.4S -> result
This intrinsic compiles to the following instruction:

BFMLALT Vd.4S,Vn.8H,Vm.8H

Argument Preparation
r register: Vd.4S
a register: Vn.8H
b register: Vm.8H
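As an illustration of how the three arguments might be prepared from memory, here is a minimal sketch. The helper name accumulate_top and the HAVE_BFMLALT macro are hypothetical, and the inputs are assumed to be raw BFloat16 bit patterns stored as uint16_t. The intrinsic is only used when the compiler advertises BF16 vector arithmetic through the ACLE macro __ARM_FEATURE_BF16_VECTOR_ARITHMETIC; otherwise the helper does nothing and reports that, so a caller can fall back to another path.

```c
#include <stdint.h>

#if defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define HAVE_BFMLALT 1   /* hypothetical helper macro, not part of ACLE */
#else
#define HAVE_BFMLALT 0
#endif

/* Hypothetical helper: out[i] = acc[i] + a[2*i + 1] * b[2*i + 1].
 * a_bits and b_bits hold raw BFloat16 bit patterns.
 * Returns 1 if the intrinsic was used, 0 if it is unavailable
 * (in which case out is left untouched). */
static int accumulate_top(float out[4], const float acc[4],
                          const uint16_t a_bits[8], const uint16_t b_bits[8])
{
#if HAVE_BFMLALT
    float32x4_t  r = vld1q_f32(acc);                              /* Vd.4S */
    bfloat16x8_t a = vreinterpretq_bf16_u16(vld1q_u16(a_bits));   /* Vn.8H */
    bfloat16x8_t b = vreinterpretq_bf16_u16(vld1q_u16(b_bits));   /* Vm.8H */
    vst1q_f32(out, vbfmlaltq_f32(r, a, b));
    return 1;
#else
    (void)out; (void)acc; (void)a_bits; (void)b_bits;
    return 0;
#endif
}
```

Loading the BFloat16 data as uint16x8_t and reinterpreting it is one possible choice here; it avoids any value conversion, so the bit patterns reach the instruction unchanged.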
Architectures
A32, A64

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) operand3 = V[d];
bits(128) result;

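// For BFMLALT, elements = 4 (128 DIV 32) and sel = 1 (top element of each pair).
// Both values are set by the instruction decode, which is not shown here.
for e = 0 to elements-1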
    bits(32) element1 = Elem[operand1, 2 * e + sel, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, 2 * e + sel, 16] : Zeros(16);
    bits(32) addend   = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);

V[d] = result;
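
The pseudocode above can be approximated by a portable scalar model, which may help when checking results on a machine without BF16 support. This is a sketch under stated assumptions, not Arm's definition: the function names bf16_bits_to_f32 and bfmlalt_ref are hypothetical, BFloat16 values are passed as raw uint16_t bit patterns, and the rounding and denormal handling that BFMulAdd takes from FPCR is not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Widen a raw BFloat16 bit pattern to float. The 16 bits become the upper
 * half of the 32-bit word and the lower half is zero, which mirrors
 * "Elem[...] : Zeros(16)" in the pseudocode. */
static float bf16_bits_to_f32(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Hypothetical scalar model of BFMLALT with sel = 1:
 * r[e] = r[e] + a[2*e + 1] * b[2*e + 1] for e = 0..3.
 * The product of two BFloat16 values has at most 16 significant bits, so
 * in the normal exponent range it is exact in a float and the only
 * rounding in this model happens in the addition. FPCR-controlled
 * behaviour is an assumption left out of this sketch. */
static void bfmlalt_ref(float r[4], const uint16_t a[8], const uint16_t b[8])
{
    for (int e = 0; e < 4; ++e) {
        float x = bf16_bits_to_f32(a[2 * e + 1]);
        float y = bf16_bits_to_f32(b[2 * e + 1]);
        r[e] = r[e] + x * y;
    }
}
```

For example, with every accumulator lane set to 1.0, the odd lanes of a holding 1.0, 2.0, 3.0, 0.5 (0x3F80, 0x4000, 0x4040, 0x3F00) and the odd lanes of b all holding 2.0 (0x4000), the model leaves 3.0, 5.0, 7.0, 2.0 in r. The even lanes of a and b are ignored.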