Compiling for Neon with Arm Compiler 6

To enable automatic vectorization, you must specify compiler options that target a processor or architecture with Neon capability and select an optimization level that includes auto-vectorization.

In addition, specifying the -Rpass=loop compiler option displays useful diagnostic information from the compiler about how it optimized particular loops. This information includes vectorization width and interleave count.

Note that -Rpass=loop is a [COMMUNITY] feature of Arm Compiler.
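As a rough illustration of the kind of loop these diagnostics describe, the sketch below shows a simple element-wise addition. The file name vec_add.c and the function vec_add are hypothetical, and the build line in the comment is only one possible combination of the options covered in this guide. The exact wording of the remarks depends on the compiler version, so none is shown here.

```c
#include <stddef.h>

/*
 * Hypothetical example file: vec_add.c
 *
 * One possible build line (an assumption, combining options from this guide):
 *
 *   armclang --target=aarch64-arm-none-eabi -O2 -Rpass=loop -c vec_add.c
 */
void vec_add(float *restrict dst, const float *restrict a,
             const float *restrict b, size_t n)
{
    /* Each iteration is independent and touches contiguous memory,
       which makes this loop a typical auto-vectorization candidate.
       The restrict qualifiers tell the compiler the arrays do not overlap. */
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

If the compiler vectorizes the loop, the -Rpass=loop remarks are where the vectorization width and interleave count would be reported.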

Specifying a Neon-capable target

Neon is required in all standard Armv8-A implementations, so targeting any Armv8-A architecture or processor will allow the generation of Neon code.

If you only want to run code on one particular processor, you can target that specific processor. Performance is optimized for the micro-architectural specifics of that processor. However, the code is only guaranteed to run on that processor.

If you want your code to run on a wide range of processors, you can target an architecture. Generated code runs on any processor implementation of that target architecture, but performance might not be optimal for any particular processor.

To target Armv8‑A AArch64 state:

armclang --target=aarch64-arm-none-eabi

To target the Cortex‑A53 in AArch32 state:

armclang --target=arm-arm-none-eabi -mcpu=cortex-a53

For the older Armv7 architecture, where Neon was optional, you can use the -mcpu, -march and -mfpu options to specify that Neon is available.
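One way to confirm that the chosen target options actually make Neon available is to test the ACLE feature macro __ARM_NEON at compile time. The sketch below is a minimal illustration, and the helper name built_with_neon is hypothetical. For an Armv7 build, an assumed example command line would be armclang --target=arm-arm-none-eabi -march=armv7-a -mfpu=neon, but check the accepted -mfpu values for your compiler version.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether this translation unit was
 * compiled with Neon (Advanced SIMD) available.
 *
 * __ARM_NEON is the ACLE feature-test macro that the compiler defines
 * when the selected target supports Neon.
 */
bool built_with_neon(void)
{
#if defined(__ARM_NEON)
    return true;   /* Target options enabled Neon. */
#else
    return false;  /* Neon not available for this target or host. */
#endif
}
```

Because the check happens in the preprocessor, the same macro can also guard Neon-specific code paths so that the source still builds for targets without Neon.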

Specifying an auto-vectorizing optimization level

Arm Compiler 6 provides a wide range of optimization levels, selected with the -O option:

| Option | Meaning | Auto-vectorization |
|--------|---------|--------------------|
| -O0 | Minimum optimization | Never |
| -O1 | Restricted optimization | Disabled by default. |
| -O2 | High optimization | Enabled by default. |
| -O3 | Very high optimization | Enabled by default. |
| -Os | Reduce code size, balancing code size against code speed. | Enabled by default. |
| -Oz | Smallest possible code size | Enabled by default. |
| -Ofast | Optimize for high performance beyond -O3 | Enabled by default. |
| -Omax | Optimize for high performance beyond -Ofast | Enabled by default. |

See Selecting optimization options in the Arm Compiler User Guide and -O in the Arm Compiler armclang Reference Guide for more details about these options.

Auto-vectorization is enabled by default at optimization level -O2 and higher. The -fno-vectorize option lets you disable auto-vectorization.

At optimization level -O1, auto-vectorization is disabled by default. The -fvectorize option lets you enable auto-vectorization.

At optimization level -O0, auto-vectorization is always disabled. If you specify the -fvectorize option, the compiler ignores it.
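The rules above can be summarized against a single loop. In the sketch below, the file name scale.c and the function scale_i32 are hypothetical. The comment lists option combinations taken from this section, each of which would be appended to an armclang command line that already specifies a Neon-capable target.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example file: scale.c
 *
 * Option combinations described in this section and their effect:
 *
 *   -O2                  auto-vectorization enabled by default
 *   -O2 -fno-vectorize   auto-vectorization disabled
 *   -O1                  auto-vectorization disabled by default
 *   -O1 -fvectorize      auto-vectorization enabled
 *   -O0 -fvectorize      -fvectorize is ignored, never vectorized
 */
void scale_i32(int32_t *restrict dst, const int32_t *restrict src,
               int32_t factor, size_t n)
{
    /* A simple loop with independent iterations; the result is the
       same whether or not the compiler vectorizes it. */
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}
```

Whether a given loop is actually vectorized still depends on the compiler's own analysis, so these options control only whether vectorization is attempted.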
