Arm Code Advisor insight: Cannot vectorize due to unsafe memory dependencies

This loop cannot be vectorized, because of unsafe memory dependencies in the loop body. Try using #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop.

Insight example

In the example inc_copy loop, there is a backwards memory dependency between the two arrays A[] and B[]. The first operation updates A[i-1], while a subsequent operation sets B[i] to A[i].

#define ARRAY_SIZE 3000

void inc_copy(int *A, int *B, int n);

int main() {
  int A[ARRAY_SIZE];
  int B[ARRAY_SIZE];
  for (int j = 0; j < ARRAY_SIZE; j++) {
    A[j] = j;
    B[j] = j * 2;
  }

  for (int i = 0; i < 10000; i++) {
    inc_copy(A, B, ARRAY_SIZE);
  }

  return 0;
}

void inc_copy(int *A, int *B, int n) {
  for (int i = 1; i < n; i++) {
    A[i-1] = A[i-1] + 1;
    B[i] = A[i];
  }
}

Running the code sequentially there is no problem, because A[i-1] is always modified in the loop iteration after it was used to set B[n].

That is, the scalar loop does the following:

  • Modify A[0]
  • Set B[1] to A[1]
  • Modify A[1]
  • Set B[2] to A[2]
  • Modify A[2]
  • Set B[3] to A[3]
  • ...

However, if the code were vectorized it would be possible (even likely!) that A[i-1] would be modified before it was used to set B[i]. That is, a 2-lane vectorized loop would do the following:

  • Modify A[0] AND A[1]
  • Set B[1] to A[1] AND B[2] to A[2] (Note that B[1] is set to A[1] after A[1] has been modified!)
  • Modify A[2] AND A[3]
  • Set B[3] to A[3] AND B[4] to A[4] (Note that B[3] is set to A[3] after A[3] has been modified!)
  • ...

Solution description

There are two ways to solve this problem.

The first solution is to reorder the statements in the loop. If B[n] is always set to A[n] before A[n-1] is modified, the backwards dependency is removed.

  • Set B[1] to A[1]
  • Modify A[0]
  • Set B[2] to A[2]
  • Modify A[1]
  • Set B[3] to A[3]
  • Modify A[2]
  • ...

Even when vectorized, any given A[i-1] will always be updated after value has been used to set B[i].

The second solution is to split the conflicting statements into two separate loops. The first loop performs all updates to B[i]. The second loop performs all updates to A[i-1]. Because all the B[] updates are completed before the A[] updates begin, the two loops can both be vectorized successfully.

  • Set B[1] to A[1]
  • Set B[2] to A[2]
  • Set B[3] to A[3]
  • ...
  • Modify A[0]
  • Modify A[1]
  • Modify A[2]
  • ...

Solution example

#define ARRAY_SIZE 3000

void inc_copy1(int *A, int *B, int n);
void inc_copy2(int *A, int *B, int n);

int main() {
  int A[ARRAY_SIZE];
  int B[ARRAY_SIZE];
  for (int j = 0; j < ARRAY_SIZE; j++) {
    A[j] = j;
    B[j] = j * 2;
  }

  for (int i = 0; i < 10000; i++) {
    inc_copy1(A, B, ARRAY_SIZE);
    inc_copy2(A, B, ARRAY_SIZE);
  }

  return 0;
}

// By reordering the conflicting statements, the backward dependency can be
// resolved
void inc_copy1(int *A, int *B, int n) {
  for (int i = 1; i < n; i++) {
    B[i] = A[i];
    A[i-1] = A[i-1] + 1;
  }
}

// After splitting the conflicting statements in two separate loops, both loops
// can be vectorized
void inc_copy2(int *A, int *B, int n) {
  for (int i = 1; i < n; i++)
    B[i] = A[i];

  for (int i = 1; i < n; i++)
    A[i-1] = A[i-1] + 1;
}