Arm Code Advisor insight: Cannot vectorize due to unsafe memory dependencies

In this example, a loop cannot be vectorized because of unsafe memory dependencies in the loop body. Try using #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop.

Note:
Insights are only supported with Arm Compiler for HPC versions 18.4.2 and earlier.

Insight example

In the example inc_copy loop, there is a backwards memory dependency between the two arrays A[] and B[]. The first operation updates A[i-1], while a subsequent operation sets B[i] to A[i].

#define ARRAY_SIZE 3000

void inc_copy(int *A, int *B, int n);

int main() {
int A[ARRAY_SIZE];
int B[ARRAY_SIZE];
for (int j = 0; j < ARRAY_SIZE; j++) {
A[j] = j;
B[j] = j * 2;
}

for (int i = 0; i < 10000; i++) {
inc_copy(A, B, ARRAY_SIZE);
}

return 0;
}

void inc_copy(int *A, int *B, int n) {
for (int i = 1; i < n; i++) {
A[i-1] = A[i-1] + 1;
B[i] = A[i];
}
}

Running the code sequentially there is no problem, because A[i-1] is always modified in the loop iteration after it was used to set B[n].

That is, the scalar loop does the following:

• Modify A
• Set B to A
• Modify A
• Set B to A
• Modify A
• Set B to A
• ...

However, if the code were vectorized it would be possible (even likely!) that A[i-1] would be modified before it was used to set B[i]. That is, a 2-lane vectorized loop would do the following:

• Modify A AND A
• Set B to A AND B to A (Note that B is set to A after A has been modified!)
• Modify A AND A
• Set B to A AND B to A (Note that B is set to A after A has been modified!)
• ...

Solution description

There are two ways to solve this problem.

The first solution is to reorder the statements in the loop. If B[n] is always set to A[n] before A[n-1] is modified, the backwards dependency is removed.

• Set B to A
• Modify A
• Set B to A
• Modify A
• Set B to A
• Modify A
• ...

Even when vectorized, any given A[i-1] will always be updated after value has been used to set B[i].

The second solution is to split the conflicting statements into two separate loops. The first loop performs all updates to B[i]. The second loop performs all updates to A[i-1]. Because all the B[] updates are completed before the A[] updates begin, the two loops can both be vectorized successfully.

• Set B to A
• Set B to A
• Set B to A
• ...
• Modify A
• Modify A
• Modify A
• ...

Solution example

#define ARRAY_SIZE 3000

void inc_copy1(int *A, int *B, int n);
void inc_copy2(int *A, int *B, int n);

int main() {
int A[ARRAY_SIZE];
int B[ARRAY_SIZE];
for (int j = 0; j < ARRAY_SIZE; j++) {
A[j] = j;
B[j] = j * 2;
}

for (int i = 0; i < 10000; i++) {
inc_copy1(A, B, ARRAY_SIZE);
inc_copy2(A, B, ARRAY_SIZE);
}

return 0;
}

// By reordering the conflicting statements, the backward dependency can be
// resolved
void inc_copy1(int *A, int *B, int n) {
for (int i = 1; i < n; i++) {
B[i] = A[i];
A[i-1] = A[i-1] + 1;
}
}

// After splitting the conflicting statements in two separate loops, both loops
// can be vectorized
void inc_copy2(int *A, int *B, int n) {
for (int i = 1; i < n; i++)
B[i] = A[i];

for (int i = 1; i < n; i++)
A[i-1] = A[i-1] + 1;
}