Arm Neon technology is a 64-bit or 128-bit hybrid Single Instruction Multiple Data (SIMD) architecture that is designed to accelerate the performance of multimedia and signal processing applications. These applications include the following:

  • Video encoding and decoding
  • Audio encoding and decoding
  • 3D graphics processing
  • Speech processing
  • Image processing

This guide explains how to write SIMD code for Neon using assembly language. It is written for anyone wanting to learn more about the Armv8-A instruction set architecture. The following readers should find the information particularly useful:

  • Tools developers
  • Low-level SoC programmers, such as firmware, device driver, or Android kernel developers
  • Programmers who want to optimize libraries or applications for an Arm-based target device
  • Very keen Raspberry Pi enthusiasts

This guide will grow and evolve over time. When complete, the guide will cover getting started with Neon, using it efficiently, and hints and tips for more experienced coders.

The first installment of the guide looked at memory operations, and at how to use the flexible load and store with permute instructions.

The second installment added information about dealing with load and store leftovers, and introduced the permutation instructions.

This third installment shows how you can use Neon to perform an example data processing task: matrix multiplication.

More installments will follow.