Options for writing Helium-enabled code
Programming in any high-level language is a tradeoff between the ease of writing code and the amount of control that you have over the low-level instructions that the compiler outputs. This is also true when targeting Helium-enabled hardware. The goal is to ensure that Helium instructions are used wherever your code contains vectorization opportunities, that is, places where operations could be performed in parallel.
At one end of the spectrum, you could write all your code in standard C/C++ and leave the implementation decisions to the compiler. If you are using an auto-vectorizing compiler, and your code is straightforward, this can produce excellent results. The compiler generates Helium instructions for all the vectorizable portions of your code.
The benefit of this approach is that it requires very little effort from the programmer, except for writing standard C/C++ code.
The drawback of this approach is that, if the compiler does not do what you want, for whatever reason, you might not have enough control to change that situation. For example, if your code is complex, the compiler might miss a vectorization opportunity and fail to use Helium. Modifying your code to follow best practices might be enough to help the compiler identify the vectorization opportunity, but you cannot be sure.
At the other end of the spectrum, you could write all your Helium code by hand in assembly. This gives you full control over the instructions used, but at the cost of vastly increased programmer effort.
The different options available for writing Helium-enabled code are:
- Helium-enabled libraries
- Auto-vectorization
- Helium intrinsics
- Assembly code
Libraries that support Helium provide one of the easiest ways to take advantage of Helium.
Libraries provide a suite of functions that you can use in your own code. When you compile for a Helium-enabled target, a library variant using Helium instructions is selected. When you compile for a target that does not support Helium, a library variant using standard Arm instructions is selected. This means that the same source code can easily be compiled for both Helium-enabled targets and non-Helium-enabled targets.
Examples of Helium-enabled libraries include:
- CMSIS-DSP – A suite of common signal processing functions for use on Cortex-M processor-based devices.
- CMSIS-NN – A collection of efficient neural network kernels that are developed to maximize the performance, and minimize the memory footprint, of neural networks on Cortex-M processor cores.
Libraries are easy to incorporate into your code, and the implementations of the functions have already been optimized. For example, CMSIS-DSP has been designed to provide many of the functions that you would need to write signal-processing code like audio filters or Fast Fourier Transform (FFT).
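The portability point above can be sketched as follows. This is a minimal illustration, not a definitive integration: `arm_dot_prod_f32` is a CMSIS-DSP function, but the `USE_CMSIS_DSP` build switch and the `signal_energy` helper are hypothetical names introduced here. When the switch is not defined, a plain C stand-in with the same signature is used so that the sketch builds without the library installed.

```c
#include <stdint.h>

#ifdef USE_CMSIS_DSP   /* hypothetical build switch, not a CMSIS macro */
#include "arm_math.h"  /* CMSIS-DSP: declares float32_t and arm_dot_prod_f32 */
#else
/* Stand-in with the same signature, so the sketch builds without CMSIS-DSP. */
typedef float float32_t;

static void arm_dot_prod_f32(const float32_t *pSrcA, const float32_t *pSrcB,
                             uint32_t blockSize, float32_t *result)
{
    float32_t sum = 0.0f;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i];
    }
    *result = sum;
}
#endif

/* Application code (hypothetical helper): the call site is identical whether
 * the library variant underneath uses Helium instructions or not. */
float32_t signal_energy(const float32_t *samples, uint32_t count)
{
    float32_t energy = 0.0f;
    arm_dot_prod_f32(samples, samples, count, &energy);
    return energy;
}
```

The application function never mentions Helium. Which implementation runs is decided by the library variant that the build selects for the target.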
The disadvantage of libraries is that you only have access to the functions that the library designer has provided.
Auto-vectorization features in your compiler can automatically optimize your code to take advantage of Helium.
Auto-vectorization means allowing the compiler to automatically identify the areas of your code that would benefit from Single Instruction Multiple Data (SIMD) optimizations.
The benefit of using auto-vectorization is that the programmer leaves everything to the compiler.
The disadvantage of auto-vectorization is that, if the compiler does not do what you want, you might not have enough control to change that situation. For example, the compiler might fail to identify that a particular part of your code is vectorizable. You can use coding best practices to help the compiler identify that code is vectorizable, but they might not be enough to guide the compiler in the right direction. In these situations, you might need to use other options, for example intrinsics or inline assembly, to ensure that Helium instructions are used.
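As an illustration of the kind of code that auto-vectorizers tend to handle well, here is a minimal sketch in standard C. The function name `add_scaled_i32` is hypothetical. The loop has a simple counted form with no early exits, and the `restrict` qualifiers tell the compiler that the arrays do not overlap, so each iteration is independent. Whether Helium instructions are actually generated depends on your compiler, its version, and the options used (typically a high optimization level and a Helium-capable target such as Cortex-M55), so treat this as an assumption to confirm by inspecting the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes dst[i] = a[i] + scale * b[i] for n elements.
 * restrict (C99) promises that dst, a, and b do not overlap, which removes
 * a common obstacle to vectorization. */
void add_scaled_i32(int32_t *restrict dst, const int32_t *restrict a,
                    const int32_t *restrict b, int32_t scale, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + scale * b[i];
    }
}
```

Nothing in this source is Helium-specific, which is both the benefit and the limitation: the code stays portable, but the decision to vectorize remains with the compiler.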
Helium intrinsics are function calls that the compiler replaces with appropriate Helium instructions. Using Helium intrinsics gives you direct, low-level access to the exact Helium instructions that you want, all from C/C++ code.
The benefit of using intrinsics is that they provide almost as much control as writing assembly language, but leave details like register allocation to the compiler, so that developers can focus on the algorithms.
The disadvantage of using Helium intrinsics is that programming with intrinsics can be more complex than writing standard C/C++ code, and requires the programmer to learn about the available Helium intrinsics.
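A minimal sketch of what intrinsics code can look like is shown below. It adds two arrays of 32-bit integers, four lanes at a time, using the `vld1q_s32`, `vaddq_s32`, and `vst1q_s32` intrinsics from `arm_mve.h`. The function name `add_i32` and the `HAVE_HELIUM_INT` macro are hypothetical names introduced for this example. The `__ARM_FEATURE_MVE` check selects the intrinsics path only when the compiler reports integer Helium support, and a scalar loop handles any leftover elements as well as builds for non-Helium targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>
#define HAVE_HELIUM_INT 1  /* hypothetical local macro: integer Helium available */
#else
#define HAVE_HELIUM_INT 0
#endif

/* Computes dst[i] = a[i] + b[i] for n elements. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;

#if HAVE_HELIUM_INT
    /* Process four 32-bit lanes per iteration in 128-bit vectors. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(&a[i]);
        int32x4_t vb = vld1q_s32(&b[i]);
        vst1q_s32(&dst[i], vaddq_s32(va, vb));
    }
#endif

    /* Scalar loop: handles the tail, or everything on non-Helium targets. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The choice of instructions is explicit here, but register allocation and instruction scheduling are still left to the compiler. Helium also offers predicated intrinsics that can process the leftover elements without a scalar loop; this sketch uses the scalar tail only to keep the example short.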
For very high performance, hand-coded Helium assembly code is an alternative approach for experienced programmers.
You can use pure assembly code modules in your code, or you can use inline assembly code to embed assembler instructions in your C and C++ code.
The benefit of using assembly code is that it provides absolute control over the Helium instructions that are used.
The disadvantage of using assembly code is that writing assembly code can be a very complex process that most people would rather not have to do. Optimizing hand-written assembly code often requires detailed knowledge of the target hardware pipeline, especially for in-order Cortex-M processors. You might need to write and maintain different code variants for different targets to achieve optimal performance.
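For comparison, a minimal sketch of the inline assembly route follows. It assumes a compiler that supports GCC-style extended inline assembly (for example GCC or Clang-based toolchains), and the function name `add4_i32` is hypothetical. The Helium path adds exactly four 32-bit integers using explicitly chosen vector registers. A plain C fallback is included only so that the sketch also builds for targets without Helium.

```c
#include <stdint.h>

/* Computes dst[i] = a[i] + b[i] for exactly four 32-bit elements. */
void add4_i32(int32_t *dst, const int32_t *a, const int32_t *b)
{
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    __asm volatile(
        "vldrw.u32 q0, [%[a]]   \n\t"  /* load four words from a into q0 */
        "vldrw.u32 q1, [%[b]]   \n\t"  /* load four words from b into q1 */
        "vadd.i32  q0, q0, q1   \n\t"  /* lane-wise 32-bit add           */
        "vstrw.32  q0, [%[dst]] \n\t"  /* store four words to dst        */
        :
        : [a] "r"(a), [b] "r"(b), [dst] "r"(dst)
        : "q0", "q1", "memory");       /* registers and memory we modify */
#else
    /* Fallback for targets without Helium. */
    for (int i = 0; i < 4; i++) {
        dst[i] = a[i] + b[i];
    }
#endif
}
```

Even in this small case, the programmer takes on work that the compiler handled in the other approaches: choosing registers, declaring what the assembly modifies, and keeping a separate path for other targets.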