Arm Optimization Report

Available in Arm Compiler for Linux v19.3+



Arm Optimization Report is a new, beta-quality feature of Arm Compiler for Linux version 19.3 that builds on the llvm-opt-report tool available in open-source LLVM. It makes it easier to see which optimization decisions the compiler makes, inline with your source code.


Arm Optimization Report is a BETA feature that remains under development and is therefore subject to ongoing changes. The information in this tutorial is accurate at the time of publication; however, it might not be fully representative of the final product. Please contact your Arm representative if you experience unexpected results.

Arm Optimization Report helps answer questions regarding unrolling, vectorization, and interleaving:


To answer the questions: Was a loop unrolled? If so, what was the unroll factor?

Unrolling is when a scalar loop is transformed to perform multiple iterations at once, but still as scalar instructions.

The unroll factor is the number of iterations of the original loop that are performed at once. Sometimes, loops with known small iteration counts are completely unrolled, such that no loop structure remains. In completely unrolled cases, the unroll factor is the total scalar iteration count.
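
As a rough illustration of the transformation described above, the following sketch shows a scalar loop next to a hand-unrolled version with an unroll factor of 4. The function names add_scalar and add_unrolled4 are hypothetical, and the code the compiler actually generates may differ in how it handles the remainder.

```c
#include <stddef.h>

/* Original scalar loop: one element per iteration. */
void add_scalar(int *res, const int *d, size_t n) {
    for (size_t i = 0; i < n; i++) {
        res[i] += d[i];
    }
}

/* Hand-unrolled equivalent with an unroll factor of 4 (illustrative only). */
void add_unrolled4(int *res, const int *d, size_t n) {
    size_t i = 0;
    /* Main loop: four original iterations per trip, still scalar instructions. */
    for (; i + 4 <= n; i += 4) {
        res[i]     += d[i];
        res[i + 1] += d[i + 1];
        res[i + 2] += d[i + 2];
        res[i + 3] += d[i + 3];
    }
    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        res[i] += d[i];
    }
}
```

Both functions produce the same result for any n; only the number of loop trips changes.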


To answer the questions: Was a loop vectorized? If so, what was the vectorization factor?

Vectorization is when multiple iterations of a scalar loop are replaced by a single iteration of vector instructions.

The vectorization factor is the number of lanes in the vector unit, and corresponds to the number of scalar iterations performed by each vector instruction.
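
To make the idea of lanes concrete, here is a minimal model in plain C, assuming a vectorization factor of 4 (for example, four 32-bit integers in a 128-bit vector). The helper vec_add is a hypothetical stand-in for a single vector instruction; real vectorized code uses hardware vector instructions rather than an inner loop.

```c
#include <stddef.h>

#define LANES 4  /* assumed vectorization factor: 128-bit vector / 32-bit int */

/* Models one vector instruction: the same add applied to every lane. */
static void vec_add(int *dst, const int *a, const int *b) {
    for (int lane = 0; lane < LANES; lane++) {
        dst[lane] = a[lane] + b[lane];
    }
}

/* Models the vectorized form of:
 *   for (i = 0; i < n; i++) res[i] = res[i] + d[i];            */
void add_vectorized(int *res, const int *d, size_t n) {
    size_t i = 0;
    /* Each trip is one vector iteration covering LANES scalar iterations. */
    for (; i + LANES <= n; i += LANES) {
        vec_add(&res[i], &res[i], &d[i]);
    }
    /* Scalar tail for the remaining n % LANES elements. */
    for (; i < n; i++) {
        res[i] = res[i] + d[i];
    }
}
```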


The true vectorization factor is unknown at compile-time for SVE, because SVE supports scalable vectors.

For this reason, when SVE is enabled, Arm Optimization Report reports a vectorization factor that corresponds to a 128-bit SVE implementation.

If you are working with an SVE implementation with a larger vector width (for example, 256 or 512 bits), the number of scalar iterations performed by each vector instruction increases proportionally.

SVE scaling factor = <true SVE vector width> / 128


To answer the question: What was the interleave count?

Interleaving is a combination of vectorization followed by unrolling; multiple streams of vector instructions are performed in each iteration of the loop.

This combination of vectorization and unrolling information lets you know how many iterations of the original scalar loop are performed in each iteration of the generated code.

 Number of scalar iterations = <unroll factor> x <vectorization factor> x <interleave count> x <SVE scaling factor>
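
The two formulas above can be sketched as small helper functions. The names sve_scaling_factor and scalar_iterations are hypothetical and are not part of any Arm tool. The sketch also assumes that a factor missing from the report (for example, no U annotation, or SVE not enabled) is passed as 1.

```c
/* SVE scaling factor = <true SVE vector width> / 128.
 * Assumption: pass 128 (giving a factor of 1) when SVE is not in use. */
unsigned sve_scaling_factor(unsigned sve_vector_bits) {
    return sve_vector_bits / 128u;
}

/* Number of scalar iterations performed per iteration of the generated loop.
 * Assumption: any factor that does not apply is passed as 1. */
unsigned scalar_iterations(unsigned unroll_factor,
                           unsigned vectorization_factor,
                           unsigned interleave_count,
                           unsigned sve_scale) {
    return unroll_factor * vectorization_factor * interleave_count * sve_scale;
}
```

For instance, a loop reported with a vectorization factor of 4 and an interleave count of 2, with no additional unrolling, on a 256-bit SVE implementation gives scalar_iterations(1, 4, 2, sve_scaling_factor(256)), which is 1 x 4 x 2 x 2 = 16 scalar iterations per generated loop iteration.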


Annotations reference

The annotations Arm Optimization Report uses to annotate the source code are:

Annotation  Description
I           A function was inlined.
U<N>        A loop was unrolled <N> times.
V<F,I>      A loop was vectorized. Each vector iteration performs the equivalent of F*I scalar iterations. The vectorization factor, F, is the number of scalar elements processed in parallel. The interleave count, I, is the number of times the vector loop was unrolled.
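
If you want to post-process the annotations yourself, a small parser along the following lines might be enough. This is only a sketch: the helper names are hypothetical, and it assumes the annotation text has the form shown in the example output later in this tutorial (for example, U16 or V4,1), which might change while the feature is in beta.

```c
#include <stdio.h>

/* Parses a vectorization annotation such as "V4,1".
 * Returns 1 and fills in *f (vectorization factor) and *i (interleave count)
 * on success, or 0 if the text is not a V annotation. */
int parse_vector_annotation(const char *text, unsigned *f, unsigned *i) {
    return sscanf(text, "V%u,%u", f, i) == 2;
}

/* Parses an unroll annotation such as "U16".
 * Returns 1 and fills in *n (unroll factor) on success, or 0 otherwise. */
int parse_unroll_annotation(const char *text, unsigned *n) {
    return sscanf(text, "U%u", n) == 1;
}
```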

How to use Arm Optimization Report

This topic describes how to use Arm Optimization Report.

Before you begin

You must have downloaded and installed Arm Compiler for Linux version 19.3+.


  1. To generate a machine-readable .opt.yaml report, add -fsave-optimization-record to your command line at compile time.

    Arm Compiler generates a <filename>.opt.yaml report, where <filename> is the name of the binary.
  2. To view the <filename>.opt.yaml report as annotated source code, use arm-opt-report:

    arm-opt-report <filename>.opt.yaml
    Annotated source code appears in the terminal.

Example: Interpreting an example output

  1. Create a file, example.c, containing the following source code:

    void bar();
    void foo() { bar(); }

    void Test(int *res, int *c, int *d, int *p, int n) {
      int i;

    #pragma clang loop vectorize(assume_safety)
      for (i = 0; i < 1600; i++) {
        res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
      }

      for (i = 0; i < 16; i++) {
        res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
      }

      foo();

      foo(); bar(); foo();
    }
  2. Compile the source file to an object file:

    $ armclang -O3 -fsave-optimization-record example.c -c -o example.o
    This generates a file, example.opt.yaml, in the same directory as the built object.
    For compilations that create multiple object files, one report is generated for each object file.
  3. View the example.opt.yaml file using arm-opt-report:

    $ arm-opt-report example.opt.yaml
    Annotated source code is displayed in the terminal:

    < example.c
     1          | void bar();
     2          | void foo() { bar(); }
     3          |
     4          | void Test(int *res, int *c, int *d, int *p, int n) {
     5          |   int i;
     6          |
     7          | #pragma clang loop vectorize(assume_safety)
     8     V4,1 |   for (i = 0; i < 1600; i++) {
     9          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    10          |   }
    11          |
    12  U16     |   for (i = 0; i < 16; i++) {
    13          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    14          |   }
    15          |
    16 I        |   foo();
    17          |
    18          |   foo(); bar(); foo();
       I        |   ^
       I        |                 ^
    19          | }

The example Arm Optimization Report output is interpreted as follows:

  • The for loop on line 8
    • was vectorized
    • has a vectorization factor of 4 (there are four 32-bit integer lanes)
    • has an interleave count of 1 (so it was not interleaved)

  • The for loop on line 12 was unrolled 16 times. This means it was completely unrolled, with no remaining loop.

  • All 3 instances of foo() were inlined.

Related information