Targeting processors, floating-point units, and NEON with Arm DS

Arm Development Studio (Arm DS) tutorial on selecting a specific processor with Arm Compiler 6 to maximize performance, selecting a floating-point unit (FPU), and enabling NEON.



Selecting the target processor

Arm Compiler 6 lets you target either an architecture or a specific processor when generating code:

  • Specifying an architecture provides the greatest code compatibility. The generated code can run on any processor supporting that architecture.
  • Specifying a particular processor provides optimum performance. The compiler can use processor-specific features such as instruction scheduling to generate optimized code for that specific processor.

We consider the following armclang command-line options in this section:

  • The --target command-line option is mandatory and allows you to specify the target triple. The target triple has the form architecture-vendor-OS-abi, for example aarch64-arm-none-eabi. The available triples are limited by the compiler version (see supported triples).
  • Use the -march option to generate code for a specific architecture (for example armv8-a). The supported architectures vary according to the selected target. To see a list of all the supported architectures for the selected target, use -march=list.
  • Use the -mcpu option to generate code for a specific processor (for example cortex-a53). The supported processors vary according to the selected target. To see a list of all the supported processors for the selected target, use -mcpu=list.

Avoid specifying both the architecture (-march) and the processor (-mcpu), because the two settings can conflict. The compiler infers the correct architecture from the processor. For example, -mcpu=cortex-a53 implies -march=armv8-a. We recommend that you understand the mandatory armclang options so that you set these options correctly.

To configure the --target, -march and -mcpu options in Arm DS:

  1. Select your project in the Project Explorer view.
  2. To display the Properties dialog box, select Project > Properties from the main menu. You can also right-click on your project in the Project Explorer view to select Properties.
    Opening Project Properties
  3. Expand C/C++ Build, then Settings in the Properties dialog box.
  4. On the Tool Settings tab, select Arm C Compiler 6 > Target to display the code generation settings.
  5. Select Enable tool specific settings.
  6. Enter a value for Target (--target). In this tutorial, we specify --target=aarch64-arm-none-eabi to generate A64 instructions for AArch64 state.
  7. Enter a value for Architecture (-march) or CPU (-mcpu). Remember to only include a value for one of these two fields. In the following example, we set CPU (-mcpu) to cortex-a53 to build for a Cortex-A53 processor.
    Setting the CPU (-mcpu) option to cortex-a53
  8. Click Apply and Close to save the settings.

You can see a list of all supported architectures by specifying list for the Architecture (-march) setting, then building your project. The console (Window > Show View > Console) shows the list of architecture names.

Listing of -march options available in armclang

You can also see a list of all supported processors by specifying list for the CPU (-mcpu) setting, then building your project.

Listing of -mcpu options available in armclang

If the compiled program is to run on a specific Arm architecture-based processor, select the target processor. For example, to compile code to run on a Cortex-A53 processor use the CPU (-mcpu) setting cortex-a53.

Alternatively, if the compiled program is to run on several different Arm processors, choose the lowest common denominator architecture appropriate for the application. Use the Architecture (-march) setting to set the architecture. For example, if you want the same program to run on both Cortex-A53 and Cortex-A57 processors, use the -march=armv8-a option.
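To check which instruction set state and architecture version a particular combination of --target, -march, and -mcpu produced, one option is to inspect the predefined macros that the Arm C Language Extensions (ACLE) describe. The following is a minimal sketch, not part of the original tutorial: the function names target_state, target_arch_version, and print_target are hypothetical, and the exact value of __ARM_ARCH for a given toolchain version is something to verify rather than assume.

```c
#include <stdio.h>

/* Hypothetical helper: names the instruction set state the compiler
 * targeted, based on predefined macros. On a non-Arm host compiler
 * neither macro is defined, so the fallback branch is used. */
const char *target_state(void)
{
#if defined(__aarch64__)
    return "AArch64";   /* for example --target=aarch64-arm-none-eabi */
#elif defined(__arm__)
    return "AArch32";   /* for example --target=arm-arm-none-eabi */
#else
    return "non-Arm";
#endif
}

/* Hypothetical helper: returns the value of __ARM_ARCH, or 0 when the
 * compiler does not define it. */
int target_arch_version(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* Prints one line describing the target that this file was compiled for. */
void print_target(void)
{
    printf("state=%s arch=%d\n", target_state(), target_arch_version());
}
```

Calling print_target() from a test image built with different Target, Architecture, and CPU settings gives a quick way to compare what the compiler inferred in each case.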

Selecting the target FPU

Each target architecture has a default floating-point unit (FPU) option. However, you can use the -mfpu option to specify a target FPU architecture and override the default option.

Note: The -mfpu option is ignored for AArch64 targets. To override the default FPU for aarch64-arm-none-eabi targets, you must use the -mcpu option instead. For example, to prevent the use of floating-point instructions or floating-point registers for the aarch64-arm-none-eabi target, use the -mcpu=name+nofp option.

There are no software floating-point libraries for targets in AArch64 state. When linking for targets in AArch64 state, armlink uses AArch64 libraries that contain Advanced SIMD and floating-point instructions and registers. The use of the AArch64 libraries applies even if you compile the source with -mcpu=name+nofp+nosimd to prevent the compiler from using Advanced SIMD and floating-point instructions and registers. Therefore, there is no guarantee that the linked image for targets in AArch64 state is entirely free of Advanced SIMD and floating-point instructions and registers.

You can prevent the use of Advanced SIMD and floating-point instructions and registers in images that are linked for targets in AArch64 state. Either re-implement the library functions or create your own library that does not use Advanced SIMD and floating-point instructions and registers.

To configure the -mfpu option in Arm DS, use the FPU (-mfpu) setting. This setting is in the same location on the Properties dialog box as the Target (--target) setting. You can find the location of this setting in the Selecting the target processor section in this tutorial.

For example, to generate A32 and T32 instructions for AArch32 state, we specify the option --target=arm-arm-none-eabi. To select the Armv8 application architecture profile we set the option -march=armv8-a. Then, to enable the Armv8 Floating-point Extension and disable the Cryptographic Extension and the Advanced SIMD extension, we set -mfpu=fp-armv8.

Setting FPU (-mfpu) option to fp-armv8

To view a list of all the supported FPU architectures, specify list for the FPU (-mfpu) setting, then build your project. The console shows the list of FPU architectures.
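Source code can also test at compile time whether the selected FPU settings leave hardware floating-point and Advanced SIMD available. The sketch below relies on the ACLE predefined macros __ARM_FP and __ARM_NEON. The helper names has_hw_fp and has_advanced_simd are hypothetical, and how a particular armclang version sets these macros for each -mfpu value or +nofp feature modifier is assumed to follow ACLE and is worth verifying on your toolchain.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the compiler defines __ARM_FP, which
 * ACLE describes as indicating hardware floating-point support. */
bool has_hw_fp(void)
{
#if defined(__ARM_FP)
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: true when the compiler defines __ARM_NEON, which
 * ACLE describes as indicating Advanced SIMD (NEON) support. */
bool has_advanced_simd(void)
{
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}
```

With the AArch32 example above (-march=armv8-a with -mfpu=fp-armv8), the expectation is that has_hw_fp() reports true and has_advanced_simd() reports false, because that setting enables the Floating-point Extension while disabling Advanced SIMD.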

Enabling NEON automatic vectorization

Arm NEON technology is the implementation of the Advanced SIMD architecture extension. It is a 64-bit and 128-bit hybrid SIMD technology targeted at advanced media and signal processing applications and embedded processors.

Specific NEON instructions let you use the NEON unit to perform operations in parallel on multiple lanes of data.

There are various methods of creating code that uses NEON instructions:

  • Write assembly language, or use embedded assembly language in C, and use the NEON instructions directly.
  • Write in C or C++ using the NEON intrinsics.
  • Call a library routine that has been optimized to use NEON instructions.
  • Have the compiler use automatic vectorization to optimize loops for NEON.

For additional information, you can visit the Introducing Neon for Armv8-A guide.
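As an illustration of the intrinsics approach listed above, the following sketch adds two float arrays four lanes at a time. It is an example written for this explanation, not code from the tutorial: the function name add_f32 is hypothetical, and the scalar fallback is included on the assumption that the same source may also be built for a target without Advanced SIMD.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics, only when Advanced SIMD is available */
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four 32-bit float lanes per iteration in 128-bit vectors. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the remaining elements, or every element when
     * the file is compiled without Advanced SIMD support. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The guard on __ARM_NEON keeps the file buildable when Advanced SIMD is disabled, at the cost of maintaining two code paths.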

To enable automatic vectorization, you must target a processor that has a NEON unit.

You must set an optimization level -O1 or higher to enable the generation of Advanced SIMD instructions directly from C or C++ code. To configure the optimization level in Arm DS:

  1. Select Project > Properties and expand C/C++ Build, then Settings in the Properties dialog box.
  2. On the Tool Settings tab, select Arm C Compiler 6 > Optimizations to display the code generation settings.
  3. Select the Optimization level you require.
    Setting Optimization level to High (-O2)
  4. Click Apply and Close to save the settings.

The -fvectorize option enables the generation of Advanced SIMD instructions directly from C or C++ code at optimization levels -O1 and higher. The steps to enable automatic vectorization depend on the optimization level:

Level -O2 or higher:
The -fvectorize option is set by default when building for a NEON-capable processor.

Level -O1:
The -fvectorize option is not set by default. To set -fvectorize in Arm DS:

  1. Select your project in the Project Explorer view.
  2. To display the Properties dialog box, select Project > Properties from the main menu. You can also right-click on your project in the Project Explorer view to select Properties.
  3. Expand C/C++ Build, then Settings in the Properties dialog box.
  4. On the Tool Settings tab, select Arm C Compiler 6 > Target to display the code generation settings.
  5. Select Enable tool specific settings.
  6. Click the Vectorization (-fvectorize) box to enable.
    Enabling Vectorization (-fvectorize)
  7. Click Apply and Close to save the settings.
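A loop of the following shape is a typical candidate for automatic vectorization. This is a minimal sketch, not taken from the tutorial: the function name saxpy and the file name in the comment are hypothetical, and whether the compiler actually vectorizes the loop depends on the target, the optimization level, and the compiler version.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = alpha * x[i] + y[i] for i in [0, n).
 *
 * One possible command line, assuming the file is named saxpy.c:
 *   armclang --target=aarch64-arm-none-eabi -mcpu=cortex-a53 -O1 -fvectorize -c saxpy.c
 * At -O2 or higher, -fvectorize does not need to be given explicitly.
 *
 * The restrict qualifiers tell the compiler that x and y do not overlap,
 * which removes one common obstacle to vectorizing the loop. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

The loop has a simple counted form, unit-stride accesses, and no dependency between iterations, which are the properties a vectorizer generally looks for.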