Arm Neon technology is a 64-bit or 128-bit hybrid Single Instruction Multiple Data (SIMD) architecture that is designed to accelerate multimedia and signal processing applications. These applications include the following:

  • Video encoding and decoding
  • Audio encoding and decoding
  • 3D graphics processing
  • Speech processing
  • Image processing

This guide explains how to write SIMD code for Neon in assembly language. It is written for anyone who wants to learn more about the Armv8-A instruction set architecture. The following readers should find it particularly useful:

  • Tools developers
  • Low-level SoC programmers, such as firmware, device driver, or Android kernel developers
  • Programmers who want to optimize libraries or applications for an Arm-based target device
  • Very keen Raspberry Pi enthusiasts

This guide covers getting started with Neon and using it efficiently, and offers hints and tips for more experienced coders. Specifically, it deals with the following subject areas:

  • Memory operations, and how to use the flexible load and store instructions.
  • Using the permutation instructions to deal with load and store leftovers, that is, the elements that remain when the data length is not a multiple of the vector size.
  • Using Neon to perform an example data processing task, matrix multiplication.
  • Shifting operations, using the example of converting image data formats.
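
To make the idea of a 128-bit SIMD operation concrete before looking at these subject areas, the following is a minimal sketch of adding four 32-bit integers in one vector operation. It is not taken from the guide: the guide works in assembly language, whereas this sketch assumes C with the Neon intrinsics from arm_neon.h so that it stays short and compiles on other targets. The function name add4_i32 and the HAVE_NEON macro are hypothetical, and the scalar loop is only a fallback for compilers that do not define __ARM_NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro: Neon intrinsics are available */
#else
#define HAVE_NEON 0   /* non-Arm or non-Neon build: use the scalar fallback */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
#if HAVE_NEON
    /* Each load fills one 128-bit register with four 32-bit lanes. */
    int32x4_t va = vld1q_s32(a);
    int32x4_t vb = vld1q_s32(b);
    /* A single vector add processes all four lanes at once. */
    int32x4_t sum = vaddq_s32(va, vb);
    /* Store the four results back to memory. */
    vst1q_s32(out, sum);
#else
    /* Scalar equivalent: four separate additions. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Calling add4_i32 with the inputs {1, 2, 3, 4} and {10, 20, 30, 40} writes {11, 22, 33, 44} to the output array on either code path. The Neon path performs the additions as one vector operation across four lanes, which is the kind of data-level parallelism that the rest of this guide exploits.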