Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD) architecture extension for the Arm Cortex-A and Cortex-R series processors.

Neon technology is a packed SIMD architecture. Neon registers are treated as vectors of elements of the same data type, and Neon instructions operate on multiple elements simultaneously. The technology supports multiple data types, including integer and floating-point types.
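
To make the packed model concrete, here is a minimal scalar sketch in portable C. It is not Neon code: the type `u8x16_model` and the function `packed_add_u8` are hypothetical names chosen for illustration, and the sketch assumes a 128-bit register viewed as sixteen 8-bit lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector viewed as 16 lanes of uint8_t. */
typedef struct {
    uint8_t lane[16];
} u8x16_model;

/* Adds corresponding lanes. Each lane wraps modulo 256 on its own;
 * no carry crosses into a neighbouring lane. A packed SIMD add does
 * the work of this whole loop in a single instruction. */
u8x16_model packed_add_u8(u8x16_model a, u8x16_model b)
{
    u8x16_model r;
    for (size_t i = 0; i < 16; ++i) {
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    }
    return r;
}
```

The same 128 bits could instead be viewed as eight 16-bit lanes, four 32-bit lanes, or two 64-bit lanes. The element type chosen by the instruction decides how many values are processed at once.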

Neon technology is intended to improve the multimedia user experience by accelerating audio and video encoding and decoding, user interface, 2D/3D graphics, and gaming. Neon can also accelerate signal processing algorithms and functions to speed up applications such as audio and video processing, voice and facial recognition, computer vision, and deep learning.

As a programmer, you can use Neon technology in several ways:

  • Neon intrinsics
  • Neon-enabled libraries
  • Auto-vectorization by your compiler
  • Hand-coded Neon assembler

Neon Programmer Guides for Armv8-A

  • Introducing Neon for Armv8-A
  • Compiling for Neon with auto-vectorization
  • Optimizing C code with Neon intrinsics
  • Neon intrinsics Chromium case study
  • Coding for Neon

Neon intrinsics

Neon intrinsics reference search engine

Neon intrinsics are function calls that the compiler replaces with an appropriate Neon instruction or sequence of Neon instructions. Intrinsics provide almost as much control as writing assembly language, but leave the allocation of registers to the compiler, so that developers can focus on the algorithms.
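
As an illustration, the sketch below adds two float arrays using the `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` intrinsics from `arm_neon.h`. The function name `add_f32` is a hypothetical helper, and the `__ARM_NEON` guard with a scalar fallback is an assumption made so that the same file also builds on targets without Neon.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Handle four 32-bit floats per iteration in one 128-bit vector.
     * The compiler picks the registers; the code only names values. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when Neon is not available. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_f32(a, b, out, 6)` on a Neon target would process the first four elements in the vector loop and the remaining two in the scalar tail.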

Using Neon Intrinsics on Android

  • Getting started with Neon Intrinsics on Android
  • How to Truncate Thresholding and Convolution of a 1D Signal

Neon-enabled libraries

Arm Compute Library

The Arm Compute Library is a collection of low-level functions optimized for Arm CPU and GPU architectures targeted at image processing, computer vision, and machine learning.

Ne10

Ne10 is an open-source C library, hosted on GitHub by Arm, that provides common processing-intensive functions heavily optimized for Arm. It has a modular structure consisting of several smaller libraries.

Libyuv

Open-source project that includes YUV scaling and conversion functionality.

Skia

Open-source 2D graphics library used as the graphics engine for web browsers and operating systems.

Auto-vectorization

Auto-vectorization is the process by which a compiler automatically analyzes your code and identifies opportunities to optimize performance with Neon. Compilers that can perform auto-vectorization include Arm Compiler, LLVM/Clang, and GCC. Learn more with the following resources:

  • Compiling for Neon with auto-vectorization
  • Arm Compiler documentation
  • Auto-vectorization in LLVM
  • GCC documentation
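
As a rough sketch of the kind of code a compiler may be able to auto-vectorize, consider the plain C loop below. The function name `scale_and_add` is hypothetical. Whether the loop is actually vectorized depends on the compiler, its version, and the optimization level (for example `-O3` with GCC or Clang), so treat this as a candidate rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] += scale * src[i] for i in [0, n).
 * The restrict qualifiers promise that dst and src do not overlap,
 * which removes one common obstacle to vectorization. */
void scale_and_add(float *restrict dst, const float *restrict src,
                   float scale, size_t n)
{
    /* Simple counted loop: iterations are independent of each other,
     * so several can be executed together on vector lanes. */
    for (size_t i = 0; i < n; ++i) {
        dst[i] += scale * src[i];
    }
}
```

To check what the compiler did, GCC can report vectorized loops with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`.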

Neon assembly code

Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile

Contains reference documentation for all Advanced SIMD instructions.

Software Optimization Guides

Arm publishes Software Optimization Guides for some processors. These guides provide high-level information about the pipeline, instruction performance characteristics, and special performance considerations. This information can be particularly useful to programmers using Neon.

Arm tools for Neon

Arm Development Studio

Designed specifically for Arm architecture, Development Studio is the most comprehensive embedded C/C++ dedicated software development solution on the market. It accelerates software engineering while helping you build robust and more efficient products.

Arm Mobile Studio

Arm Mobile Studio is free to use and provides graphics application tracing to determine exactly where rendering defects occur in your game or application.

Arm Compiler

Arm Compiler provides the earliest, most complete, and most accurate support for the latest architectural features and extensions of the Arm architecture. Arm Compiler supports all the latest Arm Cortex, Neoverse and SecurCore processors, including cores that are in development.

Resources

Armv8-A Neon optimization

This presentation walks through an example optimization of the Fast Fourier Transform (FFT) operation. It looks at:

  • The performance gains that can be obtained using Neon
  • A Neon optimization workflow for Ne10
  • Examples of Ne10 FFT and Android libraries
  • A performance comparison between assembly and intrinsics

Taming Armv8 Neon: from theory to benchmark results

Making Neon shine simply requires following a few tips. This presentation introduces Neon with a comparison between Armv8 and Armv7 Neon, then explores the key factors in using Neon successfully. It also describes work done on the Skia graphics library (used in Chromium, Firefox, and Android) and shares several tips.
