## Arm Code Advisor insight: Cannot vectorize due to unsafe memory dependencies

In this example, a loop cannot be vectorized because of unsafe memory dependencies in the loop body. Try using `#pragma loop distribute(enable)` to allow loop distribution to attempt to isolate the offending operations into a separate loop.

Note:
Insights are only supported with Arm Compiler for HPC versions 18.4.2 and earlier.

## Insight example

In the example `inc_copy` loop, there is a backwards memory dependency between the two arrays `A[]` and `B[]`. The first operation updates `A[i-1]`, while a subsequent operation sets `B[i]` to `A[i]`.

```#define ARRAY_SIZE 3000

void inc_copy(int *A, int *B, int n);

int main() {
int A[ARRAY_SIZE];
int B[ARRAY_SIZE];
for (int j = 0; j < ARRAY_SIZE; j++) {
A[j] = j;
B[j] = j * 2;
}

for (int i = 0; i < 10000; i++) {
inc_copy(A, B, ARRAY_SIZE);
}

return 0;
}

void inc_copy(int *A, int *B, int n) {
for (int i = 1; i < n; i++) {
A[i-1] = A[i-1] + 1;
B[i] = A[i];
}
}```

Running the code sequentially there is no problem, because `A[i-1]` is always modified in the loop iteration after it was used to set `B[n]`.

That is, the scalar loop does the following:

• Modify `A[0]`
• Set `B[1]` to `A[1]`
• Modify `A[1]`
• Set `B[2]` to `A[2]`
• Modify `A[2]`
• Set `B[3]` to `A[3]`
• ...

However, if the code were vectorized it would be possible (even likely!) that `A[i-1]` would be modified before it was used to set `B[i]`. That is, a 2-lane vectorized loop would do the following:

• Modify `A[0]` AND `A[1]`
• Set `B[1]` to `A[1]` AND `B[2]` to `A[2]` (Note that `B[1]` is set to `A[1]` after `A[1]` has been modified!)
• Modify `A[2]` AND `A[3]`
• Set `B[3]` to `A[3]` AND `B[4]` to `A[4]` (Note that `B[3]` is set to `A[3]` after `A[3]` has been modified!)
• ...

## Solution description

There are two ways to solve this problem.

The first solution is to reorder the statements in the loop. If `B[n]` is always set to `A[n]` before `A[n-1]` is modified, the backwards dependency is removed.

• Set `B[1]` to `A[1]`
• Modify `A[0]`
• Set `B[2]` to `A[2]`
• Modify `A[1]`
• Set `B[3]` to `A[3]`
• Modify `A[2]`
• ...

Even when vectorized, any given `A[i-1]` will always be updated after value has been used to set `B[i]`.

The second solution is to split the conflicting statements into two separate loops. The first loop performs all updates to `B[i]`. The second loop performs all updates to `A[i-1]`. Because all the `B[]` updates are completed before the `A[]` updates begin, the two loops can both be vectorized successfully.

• Set `B[1]` to `A[1]`
• Set `B[2]` to `A[2]`
• Set `B[3]` to `A[3]`
• ...
• Modify `A[0]`
• Modify `A[1]`
• Modify `A[2]`
• ...

## Solution example

```#define ARRAY_SIZE 3000

void inc_copy1(int *A, int *B, int n);
void inc_copy2(int *A, int *B, int n);

int main() {
int A[ARRAY_SIZE];
int B[ARRAY_SIZE];
for (int j = 0; j < ARRAY_SIZE; j++) {
A[j] = j;
B[j] = j * 2;
}

for (int i = 0; i < 10000; i++) {
inc_copy1(A, B, ARRAY_SIZE);
inc_copy2(A, B, ARRAY_SIZE);
}

return 0;
}

// By reordering the conflicting statements, the backward dependency can be
// resolved
void inc_copy1(int *A, int *B, int n) {
for (int i = 1; i < n; i++) {
B[i] = A[i];
A[i-1] = A[i-1] + 1;
}
}

// After splitting the conflicting statements in two separate loops, both loops
// can be vectorized
void inc_copy2(int *A, int *B, int n) {
for (int i = 1; i < n; i++)
B[i] = A[i];

for (int i = 1; i < n; i++)
A[i-1] = A[i-1] + 1;
}```