Intrinsics are functions whose precise implementation is known to a compiler. The Neon intrinsics are a set of C and C++ functions defined in
arm_neon.h which are supported by the Arm compilers and GCC. These functions let you use Neon without having to write assembly code directly, since the functions themselves contain short assembly kernels which are inlined into the calling code. Additionally, register allocation and pipeline optimization are handled by the compiler so many difficulties faced by the assembly programmer are avoided.
Using the Neon intrinsics has a number of benefits:
- Powerful: Intrinsics give the programmer direct access to the Neon instruction set without the need for hand-written assembly code.
- Portable: Hand-written Neon assembly instructions might need to be re-written for different target processors. C and C++ code containing Neon intrinsics can be compiled for a new target or a new execution state (for example, migrating from AArch32 to AArch64) with minimal or no code changes.
- Flexible: The programmer can exploit Neon when needed, or use C/C++ when it isn’t, while avoiding many low-level engineering concerns.
However, intrinsics might not be the right choice in all situations:
- There is a steeper learning curve to use Neon intrinsics than importing a library or relying on a compiler.
- Hand-optimized assembly code might offer the greatest scope for performance improvement even if it is more difficult to write.
We shall now go through a couple of examples where we re-implement some C functions using Neon intrinsics. The examples chosen do not reflect the full complexity of their application, but they should illustrate the use of intrinsics and act as a starting point for more complex code.