Introduces the NEON™ unit and explains how to take advantage of automatic vectorizing features.
ARM NEON technology is the implementation of the Advanced SIMD architecture extension. It is a hybrid 64-bit and 128-bit SIMD technology targeted at advanced media and signal processing applications and embedded processors.
The NEON unit
The NEON unit has a register bank of thirty-two 64-bit vector registers that can be operated on in parallel.
NEON C extensions
The NEON C extensions are a set of new data types and intrinsic functions defined by ARM to enable access to the NEON unit from C.
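As a minimal sketch of what the data types and intrinsics look like in use, the following function adds two float arrays four lanes at a time. The name add_f32 is hypothetical and chosen only for illustration. The float32x4_t type and the vld1q_f32, vaddq_f32 and vst1q_f32 intrinsics come from arm_neon.h. The preprocessor guard is an assumption added here so that the same source also builds, with a scalar loop only, on targets without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n elements. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four 32-bit floats per iteration in one 128-bit vector. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* add lanes and store  */
    }
#endif

    /* Scalar loop: handles the tail, or everything when NEON is absent. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The scalar tail loop is needed whenever n is not a multiple of four, because the vector loop only consumes whole groups of four elements.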
Automatic vectorization
Automatic vectorization involves the high-level analysis of loops in your code. This is the most efficient way to map the majority of typical code onto the functionality of the NEON unit.
Data references within a vectorizable loop
To vectorize, the compiler has to identify variables with a vector access pattern. It also has to ensure that there are no data dependencies between different iterations of the loop.
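The difference between independent and dependent iterations can be sketched with two small loops. The array names, the VEC_LEN constant and the function names are hypothetical, chosen for illustration. Whether a particular compiler vectorizes the first loop depends on the compiler and its options.

```c
#define VEC_LEN 8

static int a[VEC_LEN], b[VEC_LEN], c[VEC_LEN];

/* Each iteration reads b[i] and c[i] and writes a[i]. No iteration
 * uses a value produced by another one, so several iterations could
 * be executed together in vector lanes. */
void add_independent(void)
{
    for (int i = 0; i < VEC_LEN; ++i) {
        a[i] = b[i] + c[i];
    }
}

/* Each iteration reads a[i - 1], which the previous iteration wrote.
 * This loop-carried dependency means the iterations cannot simply be
 * run side by side in vector lanes. */
void running_sum(void)
{
    for (int i = 1; i < VEC_LEN; ++i) {
        a[i] = a[i - 1] + b[i];
    }
}
```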
NEON vectorization performance goals
Most applications require tuning to gain the best performance from vectorization. There is always some overhead, so the theoretical maximum performance cannot be reached.
Reduction of a vector to a scalar
A special category of scalar use within loops is reduction operations. This category involves the reduction of a vector of values down to a scalar result.
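A sum over an array is a typical reduction. The sketch below shows the plain scalar form next to a hand-written version that mimics what a vectorized reduction might do: keep one partial sum per lane, then combine the lanes at the end. The function names and the choice of four lanes are assumptions for illustration. Unsigned integers are used so that reordering the additions cannot change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reduction: one accumulator updated on every iteration. */
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

/* Lane-wise reduction: four independent partial sums, as a four-lane
 * vector accumulator would hold, combined into a scalar at the end. */
uint32_t sum_u32_lanes(const uint32_t *v, size_t n)
{
    uint32_t lane[4] = {0, 0, 0, 0};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        for (size_t k = 0; k < 4; ++k) {
            lane[k] += v[i + k];
        }
    }

    /* Final step: reduce the four partial sums to one scalar. */
    uint32_t total = lane[0] + lane[1] + lane[2] + lane[3];

    /* Add any elements left over when n is not a multiple of four. */
    for (; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

Both functions return the same value for unsigned integers. With floating-point data, changing the order of the additions can change the rounded result slightly.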
Vectorization on loops containing pointers
When accessing arrays, the compiler can often prove that memory accesses do not overlap. When using pointers, such a proof is less likely to be possible, so vectorization requires either a runtime overlap test or your use of the restrict keyword.
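Both options can be sketched in C99. In add_restrict, the restrict qualifiers are a promise from the programmer that the three buffers do not overlap. In add_checked, an explicit address-range comparison decides at run time whether that promise holds. All function names here are hypothetical, and the overlap check is a hand-written illustration of the idea rather than the test a compiler actually generates.

```c
#include <stddef.h>
#include <stdint.h>

/* restrict tells the compiler that dst, a and b never alias, so it
 * does not have to assume a store to dst changes later loads. */
void add_restrict(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

/* Returns nonzero when the n-element ranges at p and q share memory. */
static int ranges_overlap(const float *p, const float *q, size_t n)
{
    uintptr_t ps = (uintptr_t)p;
    uintptr_t qs = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return ps < qs + bytes && qs < ps + bytes;
}

/* Runtime test: take the no-alias path only when it is safe. */
void add_checked(float *dst, const float *a, const float *b, size_t n)
{
    if (!ranges_overlap(dst, a, n) && !ranges_overlap(dst, b, n)) {
        add_restrict(dst, a, b, n);
    } else {
        /* Possible overlap: keep strict element-by-element order. */
        for (size_t i = 0; i < n; ++i) {
            dst[i] = a[i] + b[i];
        }
    }
}
```

Calling add_restrict directly with overlapping buffers is undefined behavior, which is why the checked wrapper falls back to the plain loop in that case.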