Intrinsics – Arm Developer

SIMD ISA	Return Type	Name	Arguments	Instruction Group
Neon	`float32x4_t`	`vbfmlaltq_f32`	`(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)`	Vector arithmetic / Multiply / Multiply-accumulate
Description
BFloat16 floating-point widening multiply-add long (vector). This instruction widens the even-numbered (bottom) or odd-numbered (top) 16-bit elements in the first and second source vectors from BFloat16 to single-precision format. The instruction then multiplies and adds these values to the overlapping single-precision elements of the destination vector. This intrinsic compiles to BFMLALT, so it uses the odd-numbered (top) elements.

Results
Vd.4S: result

This intrinsic compiles to the following instructions:
BFMLALT `Vd.4S,Vn.8H,Vm.8H`

Argument Preparation
r: register Vd.4S
a: register Vn.8H
b: register Vm.8H

Architectures
A32, A64

Operation
The values of `elements` and `sel` come from the instruction decode, which is not shown here; for BFMLALT on 128-bit vectors, `elements` is 4 and `sel` is 1.

    CheckFPAdvSIMDEnabled64();
    bits(128) operand1 = V[n];
    bits(128) operand2 = V[m];
    bits(128) operand3 = V[d];
    bits(128) result;

    for e = 0 to elements-1
        bits(32) element1 = Elem[operand1, 2 * e + sel, 16] : Zeros(16);
        bits(32) element2 = Elem[operand2, 2 * e + sel, 16] : Zeros(16);
        bits(32) addend   = Elem[operand3, e, 32];
        Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);

    V[d] = result;
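The operation above can be illustrated with a portable scalar model in C. This is a minimal sketch, not Arm's implementation: the names `bf16_to_f32`, `f32_to_bf16_trunc`, `bfmlalt_ref`, and `bfmlalt_ref_demo` are hypothetical helpers introduced here, BFloat16 values are held as raw `uint16_t` bit patterns, and the FPCR-controlled behaviour of `BFMulAdd` is replaced by an ordinary `float` multiply and add, so rounding and denormal handling may differ from the hardware in edge cases.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a raw BFloat16 bit pattern to float. This mirrors the pseudocode's
 * "Elem[...,16] : Zeros(16)": the 16 bits become the top half of a 32-bit word. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper for building inputs: keep the top 16 bits of a float.
 * Truncates instead of rounding, so it is exact only for values that are
 * already representable in BFloat16. */
static uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Scalar model of BFMLALT on 128-bit vectors: for each of the 4 output lanes,
 * r[e] += a[2*e + 1] * b[2*e + 1]. Assumption: plain float arithmetic stands
 * in for BFMulAdd(addend, element1, element2, FPCR[]). */
static void bfmlalt_ref(float r[4], const uint16_t a[8], const uint16_t b[8]) {
    const int sel = 1; /* 1 selects the odd-numbered (top) elements */
    for (int e = 0; e < 4; e++) {
        float element1 = bf16_to_f32(a[2 * e + sel]);
        float element2 = bf16_to_f32(b[2 * e + sel]);
        r[e] = r[e] + element1 * element2;
    }
}

/* Small worked example: the even-numbered (bottom) lanes hold 100.0 and
 * must not influence the result. */
void bfmlalt_ref_demo(void) {
    const float a_top[4] = {1.0f, 2.0f, 0.5f, -3.0f};
    const float b_top[4] = {2.0f, 4.0f, 8.0f, 1.5f};
    uint16_t a[8], b[8];
    for (int e = 0; e < 4; e++) {
        a[2 * e]     = f32_to_bf16_trunc(100.0f); /* bottom lane, ignored */
        b[2 * e]     = f32_to_bf16_trunc(100.0f); /* bottom lane, ignored */
        a[2 * e + 1] = f32_to_bf16_trunc(a_top[e]);
        b[2 * e + 1] = f32_to_bf16_trunc(b_top[e]);
    }
    float r[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    bfmlalt_ref(r, a, b);
    assert(r[0] == 12.0f);  /* 10 + 1.0 * 2.0  */
    assert(r[1] == 28.0f);  /* 20 + 2.0 * 4.0  */
    assert(r[2] == 34.0f);  /* 30 + 0.5 * 8.0  */
    assert(r[3] == 35.5f);  /* 40 + -3.0 * 1.5 */
}
```

On a target with the BFloat16 extension enabled, the equivalent call would be `vbfmlaltq_f32(r, a, b)` from `arm_neon.h`, with `r` as a `float32x4_t` and `a`, `b` as `bfloat16x8_t`. The scalar model is useful mainly for checking lane selection on machines without that extension.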
