Arm Code Advisor insight: Early exit prevents vectorization

In this example, an early exit within a loop prevents the compiler from vectorizing that loop. Vectorization replaces a series of serial loop iterations with operations that process several iterations at once. Because the code tests on every iteration whether to break out of the loop, the compiler cannot assume that any group of consecutive iterations will all execute. A vector operation that covers two or more iterations might straddle the intended break point and perform work that the serial loop would have skipped.

Note:
Insights are only supported with Arm Compiler for HPC versions 18.4.2 and earlier.

Insight example

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_SIZE 3000

void do_loop(int *src, int *res, int n, int x);

int main() {
  time_t t;
  srand((unsigned int)time(&t));
  int src[ARRAY_SIZE];
  int res[ARRAY_SIZE];
  for (int j = 0; j < ARRAY_SIZE; j++) {
    src[j] = j;
    res[j] = j * 2;
  }

  for (int i = 0; i < 10000; i++) {
    do_loop(src, res, ARRAY_SIZE, 47);
    do_loop(src, res, ARRAY_SIZE, 232);
    do_loop(src, res, ARRAY_SIZE, 200);
    do_loop(src, res, ARRAY_SIZE, 2000);
    do_loop(src, res, ARRAY_SIZE, 12);
    do_loop(src, res, ARRAY_SIZE, 43);
  }

  return 0;
}

void do_loop(int *src, int *res, int n, int x) {
  int s = rand();
  for (int i = 0; i < n; i++) {
    if (src[i] == x) {
      if (src[i] > 100) {
        // The early exit here prevents the loop from vectorizing.
        break;
      }
    }
    res[i] = s;
  }
}

Solution description

One solution is to confine the early exit to the cases that actually need it, so that the remaining cases can be vectorized.

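In the example solution below, the do_loop function has been rewritten to contain two loops: one loop if x is less than or equal to 100, and another loop if x is greater than 100. The split is valid because the original loop breaks only when src[i] equals x and src[i] is greater than 100, which together require x to be greater than 100. When x is less than or equal to 100, the break can never be taken.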

Only the second loop now contains an early exit. The compiler still cannot vectorize this loop because of the early exit. However, the first loop can now be vectorized. In our example, this means that half of the calls to do_loop now run the vectorized loop, because three of the six calls in main pass an x that is less than or equal to 100 (47, 12, and 43).

Solution example

void do_loop(int *src, int *res, int n, int x) {
  int s = rand();
  if (x <= 100) {
    for (int i = 0; i < n; i++) {
      res[i] = s;
    }
  } else {
    for (int i = 0; i < n; i++) {
      if (src[i] == x) {
        break;
      }
      res[i] = s;
    }
  }
}
