Arm Optimization Report

Available in Arm Compiler for Linux v19.3+



How to use Arm Optimization Report

This topic describes how to use Arm Optimization Report.

Before you begin

You must have downloaded and installed Arm Compiler for Linux version 19.3+.

Procedure

  1. To generate a machine-readable .opt.yaml report, add -fsave-optimization-record to your compile command line.

    Arm Compiler generates a <filename>.opt.yaml report, where <filename> is the name of the binary.
  2. To view the <filename>.opt.yaml report as annotated source code, use arm-opt-report:

    arm-opt-report <filename>.opt.yaml
    Annotated source code appears in the terminal.

Example: Interpreting the report output

  1. Create a source file, example.c, that contains the following code:

    void bar();
    void foo() { bar(); }
    
    void Test(int *res, int *c, int *d, int *p, int n) {
      int i;
    
    #pragma clang loop vectorize(assume_safety)
      for (i = 0; i < 1600; i++) {
        res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
      }
    
      for (i = 0; i < 16; i++) {
        res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
      }
    
      foo();
    
      foo(); bar(); foo();
    }
  2. Compile the source file into an object file:

    $ armclang -O3 -fsave-optimization-record example.c -c -o example.o
    This generates a file, example.opt.yaml, in the same directory as the built object.
    For compilations that create multiple object files, a separate report is generated for each object file.
  3. View the example.opt.yaml file using arm-opt-report:

    $ arm-opt-report example.opt.yaml
    Annotated source code is displayed in the terminal:

    < example.c
     1          | void bar();
     2          | void foo() { bar(); }
     3          |
     4          | void Test(int *res, int *c, int *d, int *p, int n) {
     5          |   int i;
     6          |
     7          | #pragma clang loop vectorize(assume_safety)
     8     V4,1 |   for (i = 0; i < 1600; i++) {
     9          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    10          |   }
    11          |
    12  U16     |   for (i = 0; i < 16; i++) {
    13          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    14          |   }
    15          |
    16 I        |   foo();
    17          |
    18          |   foo(); bar(); foo();
       I        |   ^
       I        |                 ^
    19          | }

The example Arm Optimization Report output is interpreted as follows:

  • The for loop on line 8 (marked V4,1)
    • was vectorized
    • has a vectorization factor of 4 (there are 4 32-bit integer lanes)
    • has an interleave factor of 1 (so was not interleaved)

  • The for loop on line 12 (marked U16) was unrolled 16 times. Because the loop has exactly 16 iterations, it was completely unrolled, with no remaining loop.

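  • All three calls to foo() were inlined (marked I): the call on line 16 and both calls on line 18, where each caret (^) points at the inlined call.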
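
As a rough illustration of what a vectorization factor of 4 with an interleave factor of 1 means, the following C sketch contrasts the scalar loop body with a hand-written version that processes elements in groups of four. The function names add_where_nonzero_scalar and add_where_nonzero_vf4 are hypothetical and are not produced by the compiler. The real vectorized code uses SIMD instructions rather than an inner loop, and the sketch assumes the trip count is a multiple of 4 (as 1600 is), so no scalar remainder loop is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: the loop body from Test(), one element per iteration. */
void add_where_nonzero_scalar(int *res, const int *d, const int *p, size_t n) {
  for (size_t i = 0; i < n; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }
}

/* Hand-written picture of "V4,1" (hypothetical helper, for illustration only).
 * Each outer iteration handles one group of 4 lanes (vectorization factor 4),
 * and only one such group is processed per outer iteration (interleave
 * factor 1). The inner loop stands in for a single 4-lane vector operation. */
void add_where_nonzero_vf4(int *res, const int *d, const int *p, size_t n) {
  assert(n % 4 == 0); /* assumption of this sketch: no remainder loop */
  for (size_t i = 0; i < n; i += 4) {
    for (size_t lane = 0; lane < 4; lane++) {
      /* Per-lane select: add d where p is nonzero, otherwise add 0. */
      int mask = (p[i + lane] != 0);
      res[i + lane] += mask ? d[i + lane] : 0;
    }
  }
}
```

With a scheme like this, the 1600-iteration loop in the example needs 400 outer iterations. An interleave factor of 2 would instead process two independent groups of four elements in each outer iteration.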
