Neon Intrinsics

Neon intrinsics are function calls that the compiler replaces with an appropriate Neon instruction or sequence of Neon instructions. Intrinsics provide almost as much control as writing assembly language, but leave register allocation to the compiler, so that developers can focus on the algorithms. The compiler can also schedule instructions to avoid pipeline stalls on the specified target processor. The result is source code that is easier to maintain than assembly language. Neon intrinsics are supported by Arm Compiler, GCC, and LLVM. The Neon Programmer's Guide for Armv8-A provides more information about intrinsics and Neon programming in general.
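As a minimal sketch of this division of labour, the function below adds two float arrays four lanes at a time. The intrinsics `vld1q_f32`, `vaddq_f32` and `vst1q_f32` come from ACLE. The function name `add_f32` is hypothetical, and the scalar path exists only so the example also compiles where `__ARM_NEON` is not defined. No registers are named in the source; choosing them and scheduling the loads and adds is left to the compiler.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Four 32-bit float lanes per iteration; register allocation is
     * done by the compiler, not by the programmer. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail on Neon targets, or the whole array elsewhere. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With five elements, the Neon path handles the first four lanes and the scalar loop handles the last one.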

Here are two introductory guides on using Neon intrinsics with Android:


For more information about the concepts and usage related to the Neon intrinsics, see the Arm C Language Extensions documentation.


Load one single-element structure and replicate to all lanes (of one register). This instruction loads a single-element structure from memory and replicates the structure to all the lanes of the SIMD&FP register.
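In C, this load is exposed through the `vld1q_dup` family. For example, ACLE documents `float32x4_t vld1q_dup_f32(float32_t const *ptr)` as mapping to `LD1R {Vt.4S},[Xn]`. The sketch below uses it to broadcast a scale factor read from memory into all four lanes. The function name `scale_f32` is hypothetical, and the scalar fallback is only there so the example also builds on hosts without `__ARM_NEON`.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: multiplies each of the n floats in data by *factor.
 * Assumes factor does not point into data. */
void scale_f32(float *data, size_t n, const float *factor)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* One 32-bit element is loaded from *factor and replicated to all
     * four lanes (the LD1R {Vt.4S},[Xn] behaviour described above). */
    float32x4_t vf = vld1q_dup_f32(factor);
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(data + i);
        vst1q_f32(data + i, vmulq_f32(v, vf));
    }
#endif
    /* Remaining elements, or the whole array without Neon. */
    for (; i < n; i++) {
        data[i] *= *factor;
    }
}
```

Loading the factor once with a replicating load avoids a separate scalar load followed by a lane-duplicate operation inside the loop.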

A64 Instruction

            LD1R {Vt.4S},[Xn]    

Argument Preparation

ptr → Xn 


Results

Vt.4S → result


Operation

// For LD1R {Vt.4S},[Xn]: replicate = TRUE, selem = 1, esize = 32,
// datasize = 128, wback = FALSE.
if HaveMTEExt() then
    // memory-tagging (MTE) checks; omitted here

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    address = SP[];
else
    address = X[n];

offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
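The replicate branch of the pseudocode can be modelled in portable C to make its lane and register-numbering behaviour concrete. This is a hypothetical sketch, not Arm's implementation: `vreg_t` and `ldnr_model` are illustrative names, the 32 SIMD&FP registers are modelled as 16-byte arrays, memory is assumed to be little-endian, and write-back is not modelled. A 64-bit result zeroes the upper half of the register, matching the zero-extension performed when `V[t]` is written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit SIMD&FP register. */
typedef struct {
    uint8_t b[16];
} vreg_t;

/* Hypothetical C model of the "replicate" branch (LD1R..LD4R, no
 * write-back). Assumes little-endian memory, so each element's bytes
 * are copied in memory order into every lane. */
void ldnr_model(vreg_t V[32], unsigned t, const uint8_t *address,
                unsigned selem, unsigned esize, unsigned datasize)
{
    assert(selem >= 1 && selem <= 4);
    assert(esize == 8 || esize == 16 || esize == 32 || esize == 64);
    assert(datasize == 64 || datasize == 128);

    const unsigned ebytes = esize / 8;   /* ebytes = esize DIV 8 */
    size_t offs = 0;                     /* offs = Zeros() */

    for (unsigned s = 0; s < selem; s++) {
        vreg_t r;
        memset(&r, 0, sizeof r);         /* bits above datasize stay zero */
        /* element = Mem[address+offs, ebytes, AccType_VEC], then
         * Replicate(element, datasize DIV esize) */
        for (unsigned k = 0; k < datasize / esize; k++) {
            memcpy(&r.b[k * ebytes], address + offs, ebytes);
        }
        V[t] = r;
        offs += ebytes;                  /* offs = offs + ebytes */
        t = (t + 1) % 32;                /* t = (t + 1) MOD 32 */
    }
}
```

With `selem = 1`, `esize = 32` and `datasize = 128`, the model reproduces `LD1R {Vt.4S},[Xn]`: all four 32-bit lanes of Vt receive the same element. Larger `selem` values show how the register number wraps from V31 to V0.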

Supported architectures