Overview

This guide describes how to set up Android Studio for native C++ development and how to use Neon intrinsics for Arm-powered mobile devices.

This article was originally published on CodeProject as a sponsored article by Arm. These articles are intended to provide you with information on products and services that we consider useful and of value to developers. Please see the related information section for the original article on CodeProject.

Don't repeat yourself (DRY) is one of the major principles of software development. Following this principle typically means reusing your code by wrapping it in functions. However, invoking a function adds overhead. To reduce this overhead, compilers provide built-in functions called intrinsics. The compiler replaces each intrinsic used in a high-level programming language, for example C/C++, with assembly instructions, mostly on a one-to-one basis.

To further improve performance, you would normally need to write assembly code. However, with Arm Neon intrinsics you can avoid the complication of writing assembly functions. Instead, you only need to program in C/C++ and call the intrinsics, or instruction functions, that are declared in the arm_neon.h header file.
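As a minimal sketch of what calling an intrinsic from arm_neon.h looks like, the following C function multiplies four floats by a constant. The function name scale4 and the USE_NEON macro are hypothetical names chosen for this illustration and do not come from the sample project. The scalar branch is only there so that the snippet also compiles on machines without Neon.

```c
#include <assert.h>
#include <stddef.h>

/* Include the Neon header only when the compiler targets a Neon-capable CPU. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Multiplies four consecutive floats from `in` by `factor` and writes them to `out`.
   Hypothetical helper for illustration only. */
void scale4(const float *in, float factor, float *out) {
    assert(in != NULL && out != NULL);
#if USE_NEON
    float32x4_t v = vld1q_f32(in);      /* load four floats into one 128-bit register */
    v = vmulq_n_f32(v, factor);         /* multiply all four lanes by the scalar */
    vst1q_f32(out, v);                  /* store the four results back to memory */
#else
    /* Scalar fallback: same result, one element at a time. */
    for (int i = 0; i < 4; ++i) {
        out[i] = in[i] * factor;
    }
#endif
}
```

Each intrinsic call in the Neon branch reads like an ordinary C function call, while register allocation is left to the compiler.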

As an Android developer, you probably do not have time to write assembly language. Instead, your focus is on app usability, portability, design, data access, and tuning your app to various devices. If so, Neon intrinsics can help with performance.

Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD) architecture extension for Arm processors. SIMD performs the same operation on a sequence, or vector, of data with a single instruction.

For instance, if you are summing numbers from two one-dimensional arrays, you must add them element by element. In a non-SIMD CPU, each array element is loaded from memory into CPU registers, the register values are added, and the result is stored in memory. This procedure is repeated for all elements. To speed up such operations, SIMD-enabled CPUs load several elements at once, perform the operation on all of them, then store the results to memory. The improvement depends on N, the number of elements processed per instruction. Theoretically, the computation time is reduced by a factor of N.
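The array addition described above can be sketched in C as follows. This is an illustration under assumptions, not code from the sample project: the function name add_arrays and the USE_NEON macro are hypothetical. With 128-bit Neon registers and 32-bit floats, N is 4, so the vector loop handles four elements per iteration and a scalar loop handles any leftover elements.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Computes out[i] = a[i] + b[i] for i in [0, n).
   Hypothetical helper for illustration only. */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if USE_NEON
    /* SIMD path: load, add, and store four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar path: handles the tail, or the whole array when Neon is unavailable. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

In practice the measured speedup is usually below the theoretical factor of four, because memory access and loop overhead also contribute to the run time.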

By using the SIMD architecture, Neon intrinsics can accelerate multimedia and signal processing applications, including video and audio encoding and decoding, 3D graphics, and speech and image processing. Neon intrinsics provide almost as much control as writing assembly code. However, Neon intrinsics leave the allocation of registers to the compiler, which allows developers to focus on the algorithms. Neon intrinsics therefore strike a balance between the performance gains of assembly language and the effort of writing it.

This guide shows you how to:

  • Set up an Android development environment to use Neon intrinsics.
  • Implement an Android application that uses the Android Native Development Kit (NDK) to calculate the dot product of two vectors in C/C++.
  • Improve the performance of such a function with Neon intrinsics.
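To make the goal in the list above concrete, here is a minimal sketch of a dot product in C with an optional Neon path. It is not the implementation from the sample repository: the function name dot_product, the USE_NEON macro, and the choice of 32-bit integer elements are assumptions made for this illustration, and the sketch assumes that the products and their sum fit in a 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Returns the sum of a[i] * b[i] for i in [0, n).
   Hypothetical helper; assumes the result does not overflow int32_t. */
int32_t dot_product(const int32_t *a, const int32_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    size_t i = 0;
    int32_t sum = 0;
#if USE_NEON
    /* Four partial sums, one per lane, all starting at zero. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        /* acc = acc + a[i..i+3] * b[i..i+3], computed lane by lane. */
        acc = vmlaq_s32(acc, vld1q_s32(a + i), vld1q_s32(b + i));
    }
    /* Combine the four partial sums into a single scalar. */
    int32_t lanes[4];
    vst1q_s32(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar loop: handles the tail, or everything when Neon is unavailable. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

For example, calling dot_product on the vectors {1, 2, 3, 4, 5} and {6, 7, 8, 9, 10} returns 130. In an Android app, a function like this would typically be compiled with the NDK and called from Java or Kotlin through JNI.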

At the end of this guide, the Check your knowledge section lets you test what you have learned about using Neon intrinsics for Arm-powered mobile devices.

1.1 Before you begin

This project was created with Android Studio. The sample code is available from the GitHub repository NeonIntrinsics-Android. The code was tested on a Samsung SM-J710F phone.

You should also be aware of the Neon intrinsics search engine that can be found here.
