
Selecting optimization options

Arm® Compiler performs several optimizations to reduce the code size and improve the performance of your application. Different optimization levels have different optimization goals, and optimizing for one goal affects the others. Choosing an optimization level is therefore always a trade-off between these goals.

Arm Compiler provides various optimization levels to control the different optimization goals. The best optimization level depends on your application and on your optimization goal.

Table 2-8 Optimization goals

Optimization goal                                            Useful optimization levels
Smaller code size                                            -Oz
Faster performance                                           -O2, -O3, -Ofast, -Omax
Better debug experience                                      -O1
Better correlation between source code and generated code   -O0
Faster compile and build time                                -O0
Balanced code size reduction and fast performance            -Os

If you use a higher optimization level for performance, the impact on the other goals is greater, for example a degraded debug experience, increased code size, and increased build time.

If your optimization goal is code size reduction, the other goals are also affected, for example through a degraded debug experience, slower performance, and increased build time.

Optimization level -O0

-O0 disables all optimizations. This optimization level is the default. Using -O0 results in a faster compilation and build time, but produces slower code than the other optimization levels. Code size and stack usage are significantly higher at -O0 than at other optimization levels. The generated code closely correlates to the source code, but significantly more code is generated, including dead code.

Optimization level -O1

-O1 enables the core optimizations in the compiler. This optimization level provides a good debug experience with better code quality than -O0. Stack usage is also lower than at -O0. Arm recommends this option for a good debug experience.

The differences when using -O1, as compared to -O0 are:

  • Optimizations are enabled. This might reduce the fidelity of debug information.
  • Inlining and tail calls are enabled, so backtraces might not show the stack of open function activations that you would expect from reading the source.
  • If the result is not needed, a function with no side-effects might not be called in the expected place, or might be omitted.
  • Values of variables might not be available within their scope after they are no longer used. For example, their stack location might have been reused.
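
As a rough illustration of the last three points, consider the following sketch. The function names square and compute are hypothetical and are not taken from the Arm documentation. The comments describe what an optimizing compiler is permitted to do at -O1, not what Arm Compiler is guaranteed to do for this exact code.

```c
/* Hypothetical helper with no side effects: it only reads its argument. */
static int square(int v)
{
    return v * v;
}

int compute(int a)
{
    int unused = square(a);   /* Result is never read: the call might be omitted. */
    int scaled = a * 4;       /* Last used on the next line; afterwards a debugger */
    int result = scaled + 1;  /* might report 'scaled' as unavailable.             */
    (void)unused;             /* Only silences the unused-variable warning. */
    return square(result);    /* Might be inlined, so 'square' might be missing
                                 from a backtrace taken here. */
}
```

The observable behavior is the same at every optimization level: only the debug view of the function changes.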

Optimization level -O2

-O2 is a higher optimization for performance compared to -O1. It adds few new optimizations, and changes the heuristics for optimizations compared to -O1. This is the first optimization level at which the compiler might generate vector instructions. It also degrades the debug experience, and might result in an increased code size compared to -O1.

The differences when using -O2 as compared to -O1 are:

  • The threshold at which the compiler believes that it is profitable to inline a call site might increase.
  • The amount of loop unrolling that is performed might increase.
  • Vector instructions might be generated for simple loops and for correlated sequences of independent scalar operations.

The creation of vector instructions can be inhibited with the armclang command-line option -fno-vectorize.
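
The following sketch shows the kind of simple loop that is typically a candidate for vector instructions. The function name add_arrays is hypothetical. Whether vector instructions are actually generated is an assumption that depends on the target architecture, the options used, and the compiler's own cost model.

```c
#include <stddef.h>

/* Hypothetical example: element-wise addition of two arrays.
 * The 'restrict' qualifiers promise that the arrays do not overlap,
 * which makes it easier for a compiler to prove vectorization is safe. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];   /* Each iteration is independent of the others. */
    }
}
```

Compiling the same function with -fno-vectorize would be one way to compare the scalar and vectorized forms of the loop.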

Optimization level -O3

-O3 is a higher optimization for performance compared to -O2. This optimization level enables optimizations that require significant compile-time analysis and resources, and changes the heuristics for optimizations compared to -O2. -O3 instructs the compiler to optimize for the performance of generated code and disregard the size of the generated code, which might result in an increased code size. It also degrades the debug experience compared to -O2.

The differences when using -O3 as compared to -O2 are:

  • The threshold at which the compiler believes that it is profitable to inline a call site increases.
  • The amount of loop unrolling that is performed is increased.
  • More aggressive instruction optimizations are enabled late in the compiler pipeline.
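
To picture what loop unrolling means, the sketch below writes the same summation twice: once as a plain loop, and once unrolled by four by hand. This is only a source-level illustration of the kind of transformation involved, not the output of Arm Compiler, and the function names are hypothetical.

```c
#include <stddef.h>

/* Plain loop: one addition and one loop test per element. */
int sum_simple(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Hand-unrolled by four: fewer loop tests and branches, but more code. */
int sum_unrolled(const int *data, size_t n)
{
    int total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < n; i++) {   /* Remainder loop for the leftover elements. */
        total += data[i];
    }
    return total;
}
```

The unrolled version shows the trade-off in miniature: less loop overhead per element in exchange for a larger function body.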

Optimization level -Os

-Os aims to provide high performance without a significant increase in code size. Depending on your application, the performance provided by -Os might be similar to -O2 or -O3.

-Os provides code size reduction compared to -O3. It also degrades the debug experience compared to -O1.

The differences when using -Os as compared to -O3 are:

  • The threshold at which the compiler believes it is profitable to inline a call site is lowered.
  • The amount of loop unrolling that is performed is significantly lowered.

Optimization level -Oz

-Oz aims to provide the smallest possible code size. Arm recommends this option for best code size. This optimization level degrades the debug experience compared to -O1.

The differences when using -Oz as compared to -Os are:

  • The compiler optimizes for code size only and disregards performance optimizations, which might result in slower code.
  • Function inlining is not disabled. There are instances where inlining might reduce code size overall, for example if a function is called only once. The inlining heuristics are tuned to inline only when code size is expected to decrease as a result.
  • Optimizations that might increase code size, such as loop unrolling and loop vectorization, are disabled.
  • Loops are generated as while loops instead of do-while loops.
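
The last point concerns the shape of the generated loop, not how you write your source. As a source-level illustration only, with hypothetical function names, the two functions below compute the same result. The first tests the condition once, at the top of the loop. The second is the rotated form: a guard test followed by a do-while loop, which repeats the condition and so tends to need slightly more code.

```c
/* 'while' shape: the condition appears once, at the top of the loop. */
unsigned count_bits_while(unsigned v)
{
    unsigned count = 0;
    while (v != 0) {
        count += v & 1u;
        v >>= 1;
    }
    return count;
}

/* Rotated shape: a guard test plus a 'do-while' loop.
 * The condition appears twice, which can make the code larger. */
unsigned count_bits_do_while(unsigned v)
{
    unsigned count = 0;
    if (v != 0) {
        do {
            count += v & 1u;
            v >>= 1;
        } while (v != 0);
    }
    return count;
}
```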

Optimization level -Ofast

-Ofast performs the optimizations from level -O3, together with the optimizations performed with the -ffast-math armclang option.

This level also performs other aggressive optimizations that might violate strict compliance with language standards.

This level degrades the debug experience, and might result in increased code size compared to -O3.
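
One reason that -ffast-math style optimizations can depart from strict standard behavior is that they allow the compiler to treat floating-point addition as if it were associative. The sketch below, with hypothetical function names, shows two groupings that are mathematically equal but that can give different results in IEEE 754 double-precision arithmetic.

```c
/* Evaluates (a + b) + c in the order written. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c): the same sum on paper, but not always numerically. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0, and without fast-math optimizations, sum_left returns 1.0 while sum_right returns 0.0, because the 1.0 is lost when it is added to -1e20. A build that is allowed to reassociate the additions might therefore return either value from the same source.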

Optimization level -Omax

-Omax performs maximum optimization, and specifically targets performance optimization. It enables all the optimizations from level -Ofast, together with Link Time Optimization (LTO).

At this optimization level, Arm Compiler might violate strict compliance with language standards. Use this optimization level for the fastest performance.

This level degrades the debug experience, and might result in increased code size compared to -Ofast.

If you want to compile at -Omax and have separate compile and link steps, then you must also include -Omax on your armlink command line.

Examples

The following example shows the code generated at the -O0 optimization level. Compile your source file using:

armclang --target=arm-arm-none-eabi -march=armv7-a -O0 -c -S file.c

Table 2-9 Example code generation with -O0

Source code in file.c:

int dummy()
{
    int x=10, y=20;
    int z;
    z=x+y;
    return 0;
}

Unoptimized output from armclang:

dummy:
    .fnstart
    .pad    #12
    sub     sp, sp, #12
    mov     r0, #10
    str     r0, [sp, #8]
    mov     r0, #20
    str     r0, [sp, #4]
    ldr     r0, [sp, #8]
    add     r0, r0, #20
    str     r0, [sp]
    mov     r0, #0
    add     sp, sp, #12
    bx      lr

The following example shows the code generated at the -O1 optimization level. Compile your source file using:

armclang --target=arm-arm-none-eabi -march=armv7-a -O1 -c -S file.c

Table 2-10 Example code generation with -O1

Source code in file.c:

int dummy()
{
    int x=10, y=20;
    int z;
    z=x+y;
    return 0;
}

Optimized output from armclang:

dummy:
    .fnstart
    movs    r0, #0
    bx      lr

The source file contains mostly dead code, such as int x=10 and z=x+y. At optimization level -O0, the compiler performs no optimization, and therefore generates code for the dead code in the source file. However, at optimization level -O1, the compiler does not generate code for the dead code in the source file.
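
For contrast, here is a hypothetical variant, which is not part of the original example, where the result of the addition is returned and so is no longer dead. An optimizing compiler would still be free to evaluate 10+20 at compile time, but it could not discard the value.

```c
/* Hypothetical variant of dummy(): z is returned, so it is live. */
int dummy_live(void)
{
    int x = 10, y = 20;
    int z;
    z = x + y;
    return z;   /* The caller can observe the result of the addition. */
}
```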
