(old) | htmldiff from- | (new) |
BFloat16 floating-point dot product.
The BFloat16 floating-point (BF16) dot product instruction computes the dot product of a pair of BF16 values held in each 32-bit element of the first source vector multiplied by a pair of BF16 values in the corresponding 32-bit element of the second source vector, and then destructively adds the single-precision dot product to the corresponding single-precision element of the destination vector.
This instruction is unpredicated.
All floating-point calculations performed by this instruction are performed with the following behaviors, irrespective of the value in FPCR:
* Uses the non-IEEE 754 Round-to-Odd mode, which forces bit 0 of an inexact result to 1, and rounds an overflow to an appropriately signed Infinity.
* The cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC and IOC) are not modified.
* Trapped floating-point exceptions are disabled, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE and IOE) are all zero.
* Denormalized inputs and results are flushed to zero, as if FPCR.FZ == 1.
* Only the Default NaN is generated, as if FPCR.DN == 1.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | Zm | 1 | 0 | 0 | 0 | 0 | 0 | Zn | Zda |
if !HaveSVE() || !HaveBF16Ext() then UNDEFINED; integer n = UInt(Zn); integer m = UInt(Zm); integer da = UInt(Zda);
<Zda> | Is the name of the third source and destination scalable vector register, encoded in the "Zda" field. |
<Zn> | Is the name of the first source scalable vector register, encoded in the "Zn" field. |
<Zm> | Is the name of the second source scalable vector register, encoded in the "Zm" field. |
CheckSVEEnabled(); integer elements = VL DIV 32; bits(VL) operand1 = Z[n]; bits(VL) operand2 = Z[m]; bits(VL) operand3 = Z[da]; bits(VL) result; for e = 0 to elements-1 bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16]; bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16]; bits(16) elt2_a = Elem[operand2, 2 * e + 0, 16]; bits(16) elt2_b = Elem[operand2, 2 * e + 1, 16]; bits(32) sum = BFAdd(BFMul(elt1_a, elt2_a), BFMul(elt1_b, elt2_b)); Elem[result, e, 32] = BFAdd(Elem[operand3, e, 32], sum); Z[da] = result;
This instruction might be immediately preceded in program order by a MOVPRFX instruction.instruction Thethat conforms to all of the following requirements, otherwise the behavior of either or both instructions is MOVPRFXunpredictable instruction must conform to all of the following requirements, otherwise the behavior of the:
Internal version only: isa v31.05bv31.04, AdvSIMD v29.02, pseudocode v2019-12_rc3_1v2019-09_rc2_1, sve v2019-12_rc3v2019-09_rc3
; Build timestamp: 2019-12-13T112019-09-27T17:0228
Copyright © 2010-2019 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
(old) | htmldiff from- | (new) |