NEON code can be written in several ways: with intrinsics, through automatic vectorization of C code, with libraries, or directly in assembly language. Each approach is summarized briefly here; see the ARM NEON Programmers Guide for details.
Intrinsics are C or C++ pseudo-function calls that the compiler replaces with the appropriate NEON instructions. This allows you to use the data types and operations available in the NEON implementation, while allowing the compiler to handle instruction scheduling and register allocation. These intrinsics are defined in the ARM C Language Extensions document.
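As a minimal sketch of what intrinsics code looks like, the following C function adds two float arrays four lanes at a time. The function name add_f32 is hypothetical and chosen only for illustration. The __ARM_NEON guard and the scalar fallback loop are additions so that the same source also builds on targets without NEON; they are not something the guide prescribes.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON vector types and intrinsics (ACLE) */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration using 128-bit vectors. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 lanes from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 lanes from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* lane-wise add, then store */
    }
#endif
    /* Scalar loop: handles the leftover elements, or every element
       when NEON is not available. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Note that the code names vector values (va, vb) but never a specific register; choosing registers and ordering the instructions is left to the compiler.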
Auto-vectorization is controlled with the -fvectorize option in ARM Compiler 6, but is enabled automatically at higher optimization levels (-O2 and above). Auto-vectorization is disabled at -O1 unless you specify -fvectorize. Therefore, you would use the following to enable auto-vectorization at -O1:
armclang --target=armv8a-arm-none-eabi -fvectorize -O1 -c file.c
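To illustrate the kind of source that auto-vectorization typically targets, here is a hedged sketch of a simple loop in plain C. The function name saxpy and its parameters are hypothetical. The restrict qualifiers are an assumption about how the function will be called: they promise the compiler that x and y do not overlap, which generally makes it easier for a vectorizer to prove the loop is safe to transform. Whether the loop is actually vectorized depends on the compiler and the options used.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = alpha * x[i] + y[i] for n floats.
   No intrinsics are used; any NEON instructions would be generated
   by the compiler's auto-vectorizer. */
void saxpy(float *restrict y, const float *restrict x, float alpha, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

The loop has a simple counted form, unit-stride accesses, and no dependence between iterations, which are the properties a vectorizer usually looks for.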
Various libraries make use of NEON code. The status of these libraries changes over time, so current support is not covered in this guide.
Although it is technically possible to optimize NEON assembly by hand, this can be very difficult because the pipeline and memory access timings have complex inter-dependencies. Instead of hand assembly, ARM strongly recommends the use of intrinsics:
It is easier to write code using intrinsics than using assembly mnemonics.
Intrinsics provide good portability for cross-platform development.
There is no need to worry about pipeline and memory access timings.
For most cases, the result is good performance.
If you are not an experienced assembly language programmer, intrinsics can often achieve better performance than assembly. Intrinsics provide almost as much control as writing assembly language, but leave the allocation of registers to the compiler, so that you can focus on the algorithms. This leads to more maintainable source code than using assembly language.