Targeting processors, floating-point units, and NEON with Arm DS
Arm Development Studio (Arm DS) tutorial on selecting a specific processor with Arm Compiler 6 to maximize performance, selecting a floating-point unit (FPU), and enabling NEON.
Introduction
This tutorial assumes you have installed and licensed Arm Development Studio. For more information, see Arm® Development Studio Getting Started Guide. The content in this tutorial applies to non-Makefile projects.
Selecting the target processor
Arm Compiler 6 lets you target either an architecture or a specific processor when generating code:
- Specifying an architecture provides the greatest code compatibility. The generated code can run on any processor supporting that architecture.
- Specifying a particular processor provides optimum performance. The compiler can use processor-specific features such as instruction scheduling to generate optimized code for that specific processor.
We consider the following armclang command-line options in this section:
- The --target command-line option is mandatory and allows you to specify the target triple. The target triple has the form architecture-vendor-OS-abi, for example aarch64-arm-none-eabi. The available triples are limited by the compiler version (see supported triples).
- Use the -march option to generate code for a specific architecture (for example armv8-a). The supported architectures vary according to the selected target. To see a list of all the supported architectures for the selected target, use -march=list.
- Use the -mcpu option to generate code for a specific processor (for example cortex-a53). The supported processors vary according to the selected target. To see a list of all the supported processors for the selected target, use -mcpu=list.
Avoid specifying both the architecture (-march) and the processor (-mcpu), because the two settings can conflict. The compiler infers the correct architecture from the processor. For example, -mcpu=cortex-a53 implies -march=armv8-a. We recommend you understand the mandatory armclang options to set these options correctly.
To configure the --target, -march, and -mcpu options in Arm DS:
- Select your project in the Project Explorer view.
- To display the Properties dialog box, select Project > Properties from the main menu. You can also right-click on your project in the Project Explorer view to select Properties.
- Expand C/C++ Build, then Settings in the Properties dialog box.
- On the Tool Settings tab, select Arm C Compiler 6 > Target to display the code generation settings.
- Select Enable tool specific settings.
- Enter a value for Target (--target). In this tutorial, we specify --target=aarch64-arm-none-eabi to generate A64 instructions for AArch64 state.
- Enter a value for Architecture (-march) or CPU (-mcpu). Remember to only include a value for one of these two fields. In this example, we set CPU (-mcpu) to cortex-a53 to build for a Cortex-A53 processor.
- Click Apply and Close to save the settings.
You can see a list of all supported architectures by specifying list for the Architecture (-march) setting, then building your project. The console (Window > Show View > Console) shows the list of architecture names.
You can also see a list of all supported processors by specifying list for the CPU (-mcpu) setting, then building your project.
If the compiled program is to run on a specific Arm architecture-based processor, select the target processor. For example, to compile code to run on a Cortex-A53 processor, use the CPU (-mcpu) setting cortex-a53.
Alternatively, if the compiled program is to run on different Arm processors, choose the lowest common denominator architecture appropriate for the application. Use the Architecture (-march) setting to set the architecture. For example, if you want the same compiled program to run on both Cortex-A53 and Cortex-A57 processors, use the -march=armv8-a option.
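One way to check which execution state and architecture a given combination of --target, -march, and -mcpu selected is to inspect the compiler's predefined macros. The following is a minimal sketch that is not part of the original tutorial. It assumes the compiler defines __aarch64__, __arm__, and the Arm C Language Extensions (ACLE) macro __ARM_ARCH. The function names describe_target and target_arch_version are hypothetical, and the fallback branches exist only so that the file also compiles with a non-Arm host compiler.

```c
/* Hypothetical helper: reports the execution state the compiler targeted.
 * __aarch64__ and __arm__ are predefined for AArch64 and AArch32 targets. */
const char *describe_target(void)
{
#if defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "AArch32";
#else
    return "non-Arm host"; /* fallback so the sketch builds anywhere */
#endif
}

/* Hypothetical helper: returns the ACLE architecture version macro
 * __ARM_ARCH, or 0 when the compiler does not define it. */
int target_arch_version(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

With --target=aarch64-arm-none-eabi and -mcpu=cortex-a53, describe_target would be expected to return "AArch64". The exact value of __ARM_ARCH depends on the compiler version, so check it against your toolchain documentation.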
Selecting the target FPU
Each target architecture has a default floating-point unit (FPU) option. However, you can use the -mfpu option to specify a target FPU architecture and override the default option.
Note: The -mfpu option is ignored for AArch64 targets. To override the default FPU for aarch64-arm-none-eabi targets, use the -mcpu option instead. For example, to prevent the use of floating-point instructions or floating-point registers for the aarch64-arm-none-eabi target, use the -mcpu=name+nofp option.
There are no software floating-point libraries for targets in AArch64 state. When linking for targets in AArch64 state, armlink uses AArch64 libraries that contain Advanced SIMD and floating-point instructions and registers. The use of the AArch64 libraries applies even if you compile the source with -mcpu=name+nofp+nosimd to prevent the compiler from using Advanced SIMD and floating-point instructions and registers. Therefore, there is no guarantee that the linked image for targets in AArch64 state is entirely free of Advanced SIMD and floating-point instructions and registers.
You can prevent the use of Advanced SIMD and floating-point instructions and registers in images that are linked for targets in AArch64 state. Either re-implement the library functions or create your own library that does not use Advanced SIMD and floating-point instructions and registers.
To configure the -mfpu option in Arm DS, use the FPU (-mfpu) setting. This setting is in the same location on the Properties dialog box as the Target (--target) setting. You can find the location of this setting in the Selecting the target processor section in this tutorial.
For example, to generate A32 and T32 instructions for AArch32 state, we specify the option --target=arm-arm-none-eabi. To select the Armv8 application architecture profile, we set the option -march=armv8-a. Then, to enable the Armv8 Floating-point Extension and disable the Cryptographic Extension and the Advanced SIMD extension, we set -mfpu=fp-armv8.
To view a list of all the supported FPU architectures, specify list for the FPU (-mfpu) setting, then build your project. The console shows the list of FPU architectures.
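To confirm the effect of an FPU selection from within the source, one option is to test the ACLE feature macros. The sketch below is an illustration rather than part of the original tutorial. It assumes that the compiler defines __ARM_FP when hardware floating-point is available and __ARM_NEON when Advanced SIMD is available, as described by ACLE. The function names are hypothetical.

```c
/* Hypothetical helper: returns 1 when the compiler is allowed to use
 * hardware floating-point (ACLE defines __ARM_FP in that case), else 0. */
int has_hardware_fp(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 when Advanced SIMD (NEON) is available
 * to the compiler (ACLE defines __ARM_NEON in that case), else 0. */
int has_advanced_simd(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}
```

Under these assumptions, a build with --target=arm-arm-none-eabi -march=armv8-a -mfpu=fp-armv8 would be expected to report hardware floating-point but no Advanced SIMD, and an AArch64 build with -mcpu=name+nofp would be expected to report neither. Treat this as something to verify with your compiler version rather than a guarantee.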
Enabling NEON automatic vectorization
Arm NEON technology is the implementation of the Advanced SIMD architecture extension. It is a 64-bit and 128-bit hybrid SIMD technology targeted at advanced media and signal processing applications and embedded processors.
Specific NEON instructions let you use the NEON unit to perform operations in parallel on multiple lanes of data.
There are various methods of creating code that uses NEON instructions:
- Write assembly language, or use embedded assembly language in C, and use the NEON instructions directly.
- Write in C or C++ using the NEON intrinsics.
- Call a library routine that has been optimized to use NEON instructions.
- Have the compiler use automatic vectorization to optimize loops for NEON.
For additional information, you can visit the Introducing Neon for Armv8-A guide.
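As an illustration of the intrinsics method listed above, the following sketch adds two float arrays four lanes at a time. It is not taken from the original tutorial. The function name add_f32 is hypothetical, while vld1q_f32, vaddq_f32, and vst1q_f32 are intrinsics declared in arm_neon.h. The NEON path is guarded by __ARM_NEON so that the same file still compiles, using only the scalar loop, when NEON is not available.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four 32-bit lanes per iteration in a 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is unavailable. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The scalar tail loop handles array lengths that are not a multiple of four, which is a common pattern when writing intrinsics by hand.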
To enable automatic vectorization, you must target a processor that has a NEON unit.
You must set an optimization level of -O1 or higher to enable the generation of Advanced SIMD instructions directly from C or C++ code. To configure the optimization level in Arm DS:
- Select Project > Properties and expand C/C++ Build, then Settings in the Properties dialog box.
- On the Tool Settings tab, select Arm C Compiler 6 > Optimizations to display the optimization settings.
- Select the Optimization level you require.
- Click Apply and Close to save the settings.
The -fvectorize option allows you to enable the generation of Advanced SIMD instructions directly from C or C++ code at optimization levels -O1 and higher. Depending on the optimization level, the steps to enable automatic vectorization are different:
Level -O2 or higher:
The -fvectorize option is set by default when building for a NEON-capable processor.
Level -O1:
Vectorization is not enabled by default at this level. To set -fvectorize:
- Select your project in the Project Explorer view.
- To display the Properties dialog box, select Project > Properties from the main menu. You can also right-click on your project in the Project Explorer view to select Properties.
- Expand C/C++ Build, then Settings in the Properties dialog box.
- On the Tool Settings tab, select Arm C Compiler 6 > Target to display the code generation settings.
- Select Enable tool specific settings.
- Click the Vectorization (-fvectorize) box to enable.
- Click Apply and Close to save the settings.
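The kind of loop that automatic vectorization targets can be sketched as follows. This example is an illustration and not part of the original tutorial. The function name saxpy and the file name saxpy.c are hypothetical. Whether the compiler actually emits Advanced SIMD instructions depends on the compiler version, the optimization level, and the target.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = alpha * x[i] + y[i] for n floats.
 * The restrict qualifiers tell the compiler that x and y do not overlap,
 * which removes one common obstacle to automatic vectorization. */
void saxpy(float *restrict y, const float *restrict x, float alpha, size_t n)
{
    /* Simple counted loop with independent iterations. */
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Using only the options discussed in this tutorial, such a file could be built with a command along the lines of armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a53 -O1 -fvectorize -c saxpy.c. Inspecting the generated assembly is one way to confirm whether the loop was vectorized.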