vbfmmlaq_f32
SIMD ISA | Return Type | Name | Arguments | Instruction Group |
---|---|---|---|---|
Neon | float32x4_t | vbfmmlaq_f32 | (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) | Vector arithmetic / Matrix multiply |
Description: BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix. This instruction multiplies the 2x4 matrix of BF16 values held in the first 128-bit source vector by the 4x2 BF16 matrix in the second 128-bit source vector. The resulting 2x2 single-precision matrix product is then added destructively to the 2x2 single-precision matrix in the 128-bit destination vector. This is equivalent to performing a 4-way dot product per destination element. The instruction ignores the FPCR and does not update the FPSR exception status.

Results: Vd.4S → result

This intrinsic compiles to the following instructions: BFMMLA

Argument Preparation:
r → register: Vd.4S
a → register: Vn.8H
b → register: Vm.8H

Architectures: A32, A64
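The 4-way dot product per destination element described above can be sketched as a portable scalar model. This is a minimal illustration, not the architectural pseudocode: `bf16_to_f32` and `bfmmla_ref` are hypothetical helper names, BF16 values are passed as raw 16-bit patterns, and the lane layout is an assumption (the 2x4 matrix is taken row by row from `a`, and each column of the 4x2 matrix is taken from four consecutive lanes of `b`). It uses ordinary C `float` arithmetic, so rounding and denormal handling may differ from what the BFMMLA instruction produces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw BF16 bit pattern to float.
   A BF16 value is the upper 16 bits of an IEEE-754 binary32 value. */
static float bf16_to_f32(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    assert(sizeof out == sizeof wide);
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Hypothetical scalar model of r += A * B (assumed lane layout):
   r holds the 2x2 accumulator row by row (4 floats),
   a holds the 2x4 matrix row by row (8 BF16 lanes),
   b holds the 4x2 matrix column by column (8 BF16 lanes). */
static void bfmmla_ref(float r[4], const uint16_t a[8], const uint16_t b[8])
{
    for (int i = 0; i < 2; i++) {           /* row of A and of the result */
        for (int j = 0; j < 2; j++) {       /* column of B and of the result */
            float sum = r[2 * i + j];       /* start from the accumulator */
            for (int k = 0; k < 4; k++) {   /* 4-way dot product */
                sum += bf16_to_f32(a[4 * i + k]) * bf16_to_f32(b[4 * j + k]);
            }
            r[2 * i + j] = sum;             /* destructive update */
        }
    }
}
```

For example, with rows {1, 2, 3, 4} and {5, 6, 7, 8} in `a`, columns {1, 1, 1, 1} and {1, 0, 0, 0} in `b`, and an accumulator of {10, 20, 30, 40}, this model yields {20, 21, 56, 45}.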
Copyright © 1995-2025 Arm Limited (or its affiliates). All rights reserved.