Coding best practices for auto-vectorization
As an implementation becomes more complicated, the compiler is less likely to be able to auto-vectorize the code. For example, loops with the following characteristics are particularly difficult (or impossible) to vectorize:
- Loops with interdependencies between different loop iterations.
- Loops with break clauses.
- Loops with complex conditions.
Arm recommends modifying your source code implementation to eliminate these situations.
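As a minimal sketch of removing an interdependency between iterations, the two functions below fill an array with the same values. The function names and the ramp computation are hypothetical and chosen only for illustration. In the first version each iteration reads the element written by the previous iteration; in the second, each element is computed from the loop index alone. Whether either loop is actually vectorized depends on the compiler version and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop-carried dependency: iteration i reads out[i - 1],
 * which was written by iteration i - 1. */
void ramp_dependent(int32_t *out, size_t n, int32_t start, int32_t step)
{
    if (n == 0)
        return;
    out[0] = start;
    for (size_t i = 1; i < n; ++i)
        out[i] = out[i - 1] + step;
}

/* Same values, computed from the loop index only, so no iteration
 * depends on a result produced by another iteration.
 * Assumes the values stay within the range of int32_t. */
void ramp_independent(int32_t *out, size_t n, int32_t start, int32_t step)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = start + (int32_t)i * step;
}
```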
For example, a necessary condition for auto-vectorization is that the number of loop iterations must be known at the start of the loop. A break condition means the number of iterations may not be knowable at the start of the loop, which prevents auto-vectorization. If it is not possible to avoid a break condition completely, it may be worthwhile splitting the loop into separate vectorizable and non-vectorizable parts.
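One way to split such a loop can be sketched as follows. The function names and the stopping rule (stop at the first negative input) are hypothetical, chosen only for illustration. The first function mixes the break with the arithmetic. The second function first runs a scalar search for the stopping point, then does the arithmetic in a loop whose iteration count is fixed before it starts, which is the form a vectorizer can work with.

```c
#include <stddef.h>

/* Original form: the data-dependent break means the iteration count
 * is not known when the loop starts. */
size_t scale_until_negative(float *dst, const float *src, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (src[i] < 0.0f)
            break;
        dst[i] = src[i] * k;
    }
    return i;
}

/* Split form: the search keeps the break condition, the arithmetic
 * loop has a known iteration count (limit). */
size_t scale_until_negative_split(float *dst, const float *src, size_t n, float k)
{
    /* Non-vectorizable part: find how many elements to process. */
    size_t limit = 0;
    while (limit < n && !(src[limit] < 0.0f))
        ++limit;

    /* Vectorizable part: no break, fixed number of iterations. */
    for (size_t i = 0; i < limit; ++i)
        dst[i] = src[i] * k;

    return limit;
}
```

The split form reads part of the input twice, so it is only worthwhile when the arithmetic in the second loop is expensive enough for vectorization to pay for the extra pass.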
A full discussion of the compiler directives used to control vectorization of loops can be found in the LLVM-Clang documentation, but the two most important are:
#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
These pragmas are hints to the compiler to perform loop vectorization and loop interleaving respectively. They are [COMMUNITY] features of Arm Compiler.
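As an illustration, the pragmas are placed immediately before the loop they apply to. The function below is a hypothetical example, not taken from the Arm documentation. The `restrict` qualifiers are an added assumption that the two arrays do not overlap. The pragmas remain hints, so the compiler may still decline to vectorize the loop.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for each element.
 * restrict: assumes x and y never overlap. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With a Clang-based compiler, building with the `-Rpass=loop-vectorize` option prints a remark for each loop that was vectorized, which is one way to check whether the hint took effect.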
More detailed guides covering auto-vectorization are available for Arm C/C++ Compiler, the Linux user-space compiler, although many of the points apply across LLVM-Clang variants: