You copied the Doc URL to your clipboard.

VMLA (floating-point)

Vector Multiply Accumulate multiplies corresponding elements in two vectors, and accumulates the results into the elements of the destination vector.

Depending on settings in the CPACR, NSACR, HCPTR, and FPEXC registers, and the Security state and PE mode in which the instruction is executed, an attempt to execute the instruction might be undefined, or trapped to Hyp mode. For more information see Enabling Advanced SIMD and floating-point support.

It has encodings from the following instruction sets: A32 ( A1 and A2 ) and T32 ( T1 and T2 ) .

A1

313029282726252423222120191817161514131211109876543210
111100100D0szVnVd1101NQM1Vm
op

64-bit SIMD vector (Q == 0)

VMLA{<c>}{<q>}.<dt> <Dd>, <Dn>, <Dm>

128-bit SIMD vector (Q == 1)

VMLA{<c>}{<q>}.<dt> <Qd>, <Qn>, <Qm>

if Q == '1' && (Vd<0> == '1' || Vn<0> == '1' || Vm<0> == '1') then UNDEFINED;
if sz == '1' && !HaveFP16Ext() then UNDEFINED;
advsimd = TRUE;  add = (op == '0');
case sz of
    when '0' esize = 32; elements = 2;
    when '1' esize = 16; elements = 4;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == '0' then 1 else 2;

A2

313029282726252423222120191817161514131211109876543210
!= 111111100D00VnVd10sizeN0M0Vm
condop

Half-precision scalar (size == 01)
(Armv8.2)

VMLA{<c>}{<q>}.F16 <Sd>, <Sn>, <Sm>

Single-precision scalar (size == 10)

VMLA{<c>}{<q>}.F32 <Sd>, <Sn>, <Sm>

Double-precision scalar (size == 11)

VMLA{<c>}{<q>}.F64 <Dd>, <Dn>, <Dm>

if size == '00' || (size == '01' && !HaveFP16Ext()) then UNDEFINED;
if size == '01' && cond != '1110' then UNPREDICTABLE;
if FPSCR.Len != '000' || FPSCR.Stride != '00' then UNDEFINED;
advsimd = FALSE; add = (op == '0');
case size of
    when '01' esize = 16; d = UInt(Vd:D); n = UInt(Vn:N); m = UInt(Vm:M);
    when '10' esize = 32; d = UInt(Vd:D); n = UInt(Vn:N); m = UInt(Vm:M);
    when '11' esize = 64; d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

CONSTRAINED UNPREDICTABLE behavior

If size == '01' && cond != '1110', then one of the following behaviors must occur:

  • The instruction is undefined.
  • The instruction executes as if it passes the Condition code check.
  • The instruction executes as NOP. This means it behaves as if it fails the Condition code check.

T1

15141312111098765432101514131211109876543210
111011110D0szVnVd1101NQM1Vm
op

64-bit SIMD vector (Q == 0)

VMLA{<c>}{<q>}.<dt> <Dd>, <Dn>, <Dm>

128-bit SIMD vector (Q == 1)

VMLA{<c>}{<q>}.<dt> <Qd>, <Qn>, <Qm>

if Q == '1' && (Vd<0> == '1' || Vn<0> == '1' || Vm<0> == '1') then UNDEFINED;
if sz == '1' && !HaveFP16Ext() then UNDEFINED;
if sz == '1' && InITBlock() then UNPREDICTABLE;
advsimd = TRUE;  add = (op == '0');
case sz of
    when '0' esize = 32; elements = 2;
    when '1' esize = 16; elements = 4;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == '0' then 1 else 2;

CONSTRAINED UNPREDICTABLE behavior

If sz == '1' && InITBlock(), then one of the following behaviors must occur:

  • The instruction is undefined.
  • The instruction executes as if it passes the Condition code check.
  • The instruction executes as NOP. This means it behaves as if it fails the Condition code check.

T2

15141312111098765432101514131211109876543210
111011100D00VnVd10sizeN0M0Vm
op

Half-precision scalar (size == 01)
(Armv8.2)

VMLA{<c>}{<q>}.F16 <Sd>, <Sn>, <Sm>

Single-precision scalar (size == 10)

VMLA{<c>}{<q>}.F32 <Sd>, <Sn>, <Sm>

Double-precision scalar (size == 11)

VMLA{<c>}{<q>}.F64 <Dd>, <Dn>, <Dm>

if size == '00' || (size == '01' && !HaveFP16Ext()) then UNDEFINED;
if size == '01' && InITBlock()  then UNPREDICTABLE;
if FPSCR.Len != '000' || FPSCR.Stride != '00' then UNDEFINED;
advsimd = FALSE; add = (op == '0');
case size of
    when '01' esize = 16; d = UInt(Vd:D); n = UInt(Vn:N); m = UInt(Vm:M);
    when '10' esize = 32; d = UInt(Vd:D); n = UInt(Vn:N); m = UInt(Vm:M);
    when '11' esize = 64; d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

CONSTRAINED UNPREDICTABLE behavior

If size == '01' && InITBlock(), then one of the following behaviors must occur:

  • The instruction is undefined.
  • The instruction executes as if it passes the Condition code check.
  • The instruction executes as NOP. This means it behaves as if it fails the Condition code check.

Assembler Symbols

<c>

For encoding A1: see Standard assembler syntax fields. This encoding must be unconditional.

For encoding A2, T1 and T2: see Standard assembler syntax fields.

<q>

See Standard assembler syntax fields.

<dt> Is the data type for the elements of the vectors, encoded in sz:
sz <dt>
0 F32
1 F16
<Qd>

Is the 128-bit name of the SIMD&FP destination register, encoded in the "D:Vd" field as <Qd>*2.

<Qn>

Is the 128-bit name of the first SIMD&FP source register, encoded in the "N:Vn" field as <Qn>*2.

<Qm>

Is the 128-bit name of the second SIMD&FP source register, encoded in the "M:Vm" field as <Qm>*2.

<Dd>

Is the 64-bit name of the SIMD&FP destination register, encoded in the "D:Vd" field.

<Dn>

Is the 64-bit name of the first SIMD&FP source register, encoded in the "N:Vn" field.

<Dm>

Is the 64-bit name of the second SIMD&FP source register, encoded in the "M:Vm" field.

<Sd>

Is the 32-bit name of the SIMD&FP destination register, encoded in the "Vd:D" field.

<Sn>

Is the 32-bit name of the first SIMD&FP source register, encoded in the "Vn:N" field.

<Sm>

Is the 32-bit name of the second SIMD&FP source register, encoded in the "Vm:M" field.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then  // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                product = FPMul(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize], StandardFPSCRValue());
                addend = if add then product else FPNeg(product);
                Elem[D[d+r],e,esize] = FPAdd(Elem[D[d+r],e,esize], addend, StandardFPSCRValue());
    else             // VFP instruction
        case esize of
            when 16
                addend16 = if add then FPMul(S[n]<15:0>, S[m]<15:0>, FPSCR) else FPNeg(FPMul(S[n]<15:0>, S[m]<15:0>, FPSCR));
                S[d] = Zeros(16) : FPAdd(S[d]<15:0>, addend16, FPSCR);
            when 32
                addend32 = if add then FPMul(S[n], S[m], FPSCR) else FPNeg(FPMul(S[n], S[m], FPSCR));
                S[d] = FPAdd(S[d], addend32, FPSCR);
            when 64
                addend64 = if add then FPMul(D[n], D[m], FPSCR) else FPNeg(FPMul(D[n], D[m], FPSCR));
                D[d] = FPAdd(D[d], addend64, FPSCR);