
1.3. Intrinsics

Intrinsic functions and data types, or intrinsics for short, provide access to low-level NEON functionality from C or C++ source code. They use syntax that is similar to function calls. Software can pass NEON vectors as function arguments or return values, and declare them as normal variables.

Intrinsics provide almost as much control as writing assembly language, but leave the allocation of registers to the compiler, so that you can focus on the algorithms. Also, the compiler can optimize the intrinsics like normal C or C++ code, replacing them with more efficient sequences if possible. It can also perform instruction scheduling to remove pipeline stalls for the specified target processor. This leads to more maintainable source code than using assembly language.

Example 1.1 shows a short function that takes a four-lane vector of 32-bit unsigned integers as its input parameter, and returns a vector in which the value in every lane has been doubled.

Example 1.1. Using NEON intrinsics in C code

#include <arm_neon.h>

uint32x4_t double_elements(uint32x4_t input)
{
    return(vaddq_u32(input, input));
}

Example 1.2 shows the disassembled version of the code generated from Example 1.1, compiled for hardware linkage. The double_elements function translates to a single NEON instruction and a return sequence.

Example 1.2. Disassembly of instructions generated by intrinsics example

double_elements PROC
    VADD.I32 q0,q0,q0
    BX       lr

Example 1.3 shows the disassembly of the same example compiled for software linkage. In this situation, the code must copy the parameters from general-purpose registers to a NEON vector register before use. After the calculation, it must copy the return value back from NEON registers to general-purpose registers.

Example 1.3. Disassembly of instructions generated by intrinsics example

double_elements PROC
    VMOV     d0,r0,r1
    VMOV     d1,r2,r3
    VADD.I32 q0,q0,q0
    VMOV     r0,r1,d0
    VMOV     r2,r3,d1
    BX       lr

GCC and armcc support the same intrinsics, so code written with NEON intrinsics is portable between the two toolchains. The compiler needs no command-line options specific to intrinsics. However, you must include the arm_neon.h header file in any source file that uses intrinsics, and you must specify the command-line options described in Default behavior of tools.

It can be useful to have a source module that is optimized using intrinsics but can also be compiled for processors that do not implement NEON technology. GCC defines the macro __ARM_NEON__ when compiling for a target that implements NEON technology. RVCT 4.0 build 591 or later also defines this macro. Software can test this macro to provide both optimized and plain C or C++ versions of the functions in a file, with the version selected by the command-line options you pass to the compiler.
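The following is a minimal sketch of this technique, not code from the toolchain documentation. The function name double_array and its parameters are hypothetical and chosen only for illustration. It assumes that testing __ARM_NEON__ with #ifdef is enough to select the NEON path, and it uses the vld1q_u32 and vst1q_u32 intrinsics to load and store four lanes at a time.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __ARM_NEON__
#include <arm_neon.h>   /* Only available when the target implements NEON */
#endif

/* Hypothetical helper: doubles each of the first count elements in place. */
void double_array(uint32_t *data, size_t count)
{
    size_t i = 0;

#ifdef __ARM_NEON__
    /* NEON path: process four 32-bit lanes per iteration. */
    for (; i + 4 <= count; i += 4) {
        uint32x4_t v = vld1q_u32(data + i);
        vst1q_u32(data + i, vaddq_u32(v, v));
    }
#endif

    /* Plain C path: the whole array without NEON, or the leftover
       elements (fewer than four) after the NEON loop. */
    for (; i < count; i++) {
        data[i] += data[i];
    }
}
```

Both paths produce the same result, so callers do not need to know which version was compiled. Keeping the scalar loop outside the conditional block lets it double as the tail handler for arrays whose length is not a multiple of four.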

For information about the intrinsic functions and vector data types, see the:

  • RealView Compilation Tools Compiler Reference Guide, available from

  • GCC documentation, available from
