
Coding best practice for auto-vectorization

Describes some best practices to follow to optimize your code for auto-vectorization.

To produce optimal auto-vectorized output, structure your code to give the compiler hints. A well-structured application with hints lets the compiler detect properties of your code, such as independent loop iterations, that it could not otherwise detect. The more of these properties the compiler can detect, the more of your code it can vectorize.

Use restrict

When writing C or C++ code, use the restrict keyword where appropriate. The C99 restrict keyword (or the non-standard C/C++ __restrict__ keyword) tells the compiler that, for the lifetime of the pointer, the memory it points to is not accessed through any other pointer. This allows the compiler to vectorize loops more aggressively, because it can then prove that loop iterations are independent and can be executed in parallel.

Note

C code might use either the restrict or __restrict__ keywords. C++ code must use the __restrict__ keyword.

If the restrict keyword is used incorrectly (that is, if the same memory is also accessed through another pointer), the behavior is undefined, and the results of the optimized code might differ from those of its unoptimized equivalent.
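As a minimal sketch (the function names are hypothetical), the two C functions below differ only in their use of restrict. For the first, the compiler must assume that dst might overlap a or b, so it might vectorize the loop only behind a run-time overlap check, or not at all. For the second, the restrict qualifiers let it assume that no overlap exists.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume that dst might overlap a or b,
   so a store to dst[i] could change a value that a later iteration reads. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* With restrict, the caller promises that, while these pointers are in use,
   memory written through dst is not also accessed through a or b.
   The compiler can then treat the iterations as independent. */
void add_arrays_restrict(float *restrict dst, const float *restrict a,
                         const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Calling add_arrays_restrict with a dst buffer that overlaps a or b would break that promise, which is the incorrect use described above.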

Use pragmas

The compiler supports pragmas. Use them to explicitly indicate that loop iterations are completely independent of each other.

For more information, see Use pragmas to control auto-vectorization.
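The following sketch assumes a Clang-based compiler and uses Clang's #pragma clang loop vectorize(assume_safety), which tells the vectorizer to ignore possible loop-carried memory dependencies. Whether this exact pragma is one of those described on the referenced page is an assumption, and shift_scale is a hypothetical function. The compiler cannot prove on its own that the iterations are independent, because that depends on the run-time value of offset.

```c
#include <stddef.h>

/* Writes a[offset .. offset + n - 1] from a[0 .. n - 1]. Whether a store in
   one iteration is read by a later iteration depends on offset, which the
   compiler cannot know at compile time. */
void shift_scale(float *a, size_t offset, size_t n, float k)
{
    /* Clang-specific: asserts that the iterations are independent. This is
       only correct if the caller guarantees offset == 0 or offset >= n.
       Compilers that do not recognize it typically ignore it, possibly
       with a warning. */
#pragma clang loop vectorize(assume_safety)
    for (size_t i = 0; i < n; i++)
        a[i + offset] = a[i] * k;
}
```

As with restrict, the pragma is a promise: if 0 < offset < n, the iterations are not independent, and a vectorized loop could produce different results.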

Use < to construct loops

Where possible, use < conditions, rather than <= or != conditions, when constructing loops. < conditions help the compiler to prove that a loop terminates before the index variable wraps.
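As an illustration (hypothetical function names), compare these two loops over an unsigned size_t index. With i < n, the loop runs exactly n times. With i <= last, the loop never terminates when last is SIZE_MAX, because i wraps from SIZE_MAX back to 0, so the compiler has to account for that case before it can treat the trip count as known.

```c
#include <stddef.h>

/* Preferred form: the loop runs exactly n times and i never wraps,
   so the trip count is known before the loop starts. */
void scale_lt(float *a, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}

/* Harder to analyze: if last == SIZE_MAX, i <= last is always true and i
   wraps back to 0, so the trip count cannot simply be taken as last + 1
   (that value would itself wrap to 0). */
void scale_le(float *a, size_t last, float k)
{
    for (size_t i = 0; i <= last; i++)
        a[i] *= k;
}
```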

If signed integers are used, the compiler might be able to perform more loop optimizations. The C standard treats signed integer overflow as undefined behavior, so the compiler can assume that a signed index never wraps. In contrast, unsigned integer arithmetic is well defined: the C standard specifies that it wraps around on overflow, so the compiler must allow for an unsigned index wrapping.
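The sketch below shows the difference with a strided access. It assumes a typical 64-bit target with 32-bit int and unsigned int; the function names are hypothetical. With the signed index, the compiler may assume that 2 * i never overflows, so the accessed address advances by a fixed stride. With the unsigned index, 2u * i is defined to wrap, so the compiler must allow for the index jumping back to a small value.

```c
/* Signed index: signed overflow is undefined, so the compiler may assume
   2 * i never overflows and src + 2 * i advances by a fixed stride. */
void take_even_signed(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[2 * i];
}

/* Unsigned index: 2u * i wraps modulo 2^32 for a 32-bit unsigned int, so
   on a 64-bit target the compiler must allow for the computed index
   jumping back to a small value partway through the loop. */
void take_even_unsigned(int *dst, const int *src, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        dst[i] = src[2u * i];
}
```

For small n, both functions produce the same result. The difference lies only in what the compiler is allowed to assume when it analyzes the loop.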

Use the -ffast-math option

The -ffast-math option can significantly improve the performance of generated code, but it does so at the expense of strict compliance with IEEE and ISO standards for mathematical operations. Ensure that your algorithms are tolerant of potential inaccuracies that could be introduced by the use of this option.
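One way -ffast-math helps vectorization is that, in GCC and Clang, it permits the compiler to reassociate floating-point operations. A vectorized sum needs this, because it accumulates several partial sums and combines them at the end. The sketch below imitates a two-lane vectorized reduction by hand to show how that reordering can change the result. The function names and input values are illustrative, and the code assumes IEEE single-precision float evaluated at float precision, as on typical AArch64 and x86-64 targets.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order that IEEE-compliant code preserves. */
float sum_sequential(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-written imitation of a 2-lane vectorized reduction: even and odd
   elements accumulate separately and are combined at the end. */
float sum_two_lanes(const float *x, size_t n)
{
    float even = 0.0f, odd = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += x[i];
        odd += x[i + 1];
    }
    if (i < n)
        even += x[i]; /* leftover element when n is odd */
    return even + odd;
}
```

For the input {1.0e8f, 1.0f, -1.0e8f, 1.0f}, sum_sequential returns 1.0f, because 1.0e8f + 1.0f rounds back to 1.0e8f, while sum_two_lanes returns 2.0f. This is the kind of inaccuracy that your algorithm must tolerate if you use -ffast-math.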
