Check your knowledge

  • What is Neon?

    Neon is the implementation of the Advanced SIMD extension to the Arm architecture. All processors compliant with the Armv8-A architecture (for example, the Cortex-A76 or Cortex-A57) include Neon. From the programmer's point of view, Neon provides 32 additional 128-bit registers, with instructions that operate on 8-, 16-, 32-, or 64-bit lanes within those registers.
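
    To make the idea of lanes concrete, the following is a minimal sketch, not taken from the guide, that adds four 32-bit floats held in one 128-bit register. The helper name add4_f32 is hypothetical. The Neon path assumes the compiler defines the ACLE macro __ARM_NEON and provides arm_neon.h; otherwise a scalar fallback is used.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: adds two arrays of exactly four floats. */
void add4_f32(const float *a, const float *b, float *out) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__ARM_NEON)
    /* One 128-bit Neon register holds four 32-bit float lanes. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    /* A single vector add operates on all four lanes. */
    vst1q_f32(out, vaddq_f32(va, vb));
#else
    /* Scalar fallback so the sketch still builds for non-Neon targets. */
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

    With 8-bit elements the same register would hold sixteen lanes, and with 64-bit elements it would hold two.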

  • How do you enable Neon code generation with Arm Compiler?

    Target AArch64 with --target=aarch64-arm-none-eabi and specify a suitable optimization level: either -O1 together with -fvectorize, or -O2 and higher.
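
    As an illustration, here is a small sketch of a loop that is a typical candidate for auto-vectorization, with a possible build command in the leading comment. The file name vec_add.c and the function vec_add are hypothetical, and the exact command line may need further options for your target.

```c
/*
 * Possible build command (assumed, file name is hypothetical):
 *   armclang --target=aarch64-arm-none-eabi -O2 -c vec_add.c
 * At -O1, add -fvectorize to the command line.
 */
#include <assert.h>

/* Element-wise addition: each iteration is independent of the others. */
void vec_add(const int *a, const int *b, int *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

    Whether the compiler actually emits Neon instructions for this loop depends on the options used; inspecting the generated assembly is the reliable way to confirm it.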

  • Suppose the Arm compiler automatically unrolls a loop to a depth of two. How would you force the compiler to unroll to a depth of four?

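    Place #pragma clang loop interleave_count(4) immediately before the loop. The pragma applies only to that particular loop. For a loop that is not vectorized, #pragma clang loop unroll_count(4) sets the unroll factor in the same way.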
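
    A minimal sketch of where the pragma goes, assuming a hypothetical function named scale. The pragma is a hint placed directly before the loop it controls; the __clang__ guard is an addition here so that other compilers build the code without warnings.

```c
#include <assert.h>

/* Hypothetical example: multiply every element of an array by a factor. */
void scale(float *data, int n, float factor) {
    assert(n >= 0);
    /* Request that four iterations of the vectorized loop be interleaved.
       This affects only the loop that immediately follows. */
#if defined(__clang__)
    #pragma clang loop interleave_count(4)
#endif
    for (int i = 0; i < n; i++) {
        data[i] *= factor;
    }
}
```

    The pragma changes how the loop is compiled, not what it computes, so the results are the same with or without it.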

  • How can you best write source code to assist the compiler optimizations?

    Consider the following function when compiled with the -O1 compiler option:

    float vec_dot(float *vec_A, float *vec_B, int len_vec) {
            float ret = 0;
            int i;
            for (i=0; i<len_vec; i++) {
                    ret += vec_A[i]*vec_B[i];
            }
            return ret;
    }

    You could make the following changes to assist the compiler optimizations:

    • Compile at -O2 or higher, or with -fvectorize.
    • Specify #pragma clang loop vectorize(enable) before the loop as a hint to the compiler.
    • Note that the function only reads the two vectors and never writes through either pointer, so adding the restrict keyword makes no difference here; it does not matter whether the input arrays overlap.
    • Vectorizing the loop increases code size in this case. This may be acceptable depending on hardware limits and the expected input array length.

    Here is the optimized source code:

    float vec_dot(float *vec_A, float *vec_B, int len_vec) {
            float ret = 0;
            int i;
            #pragma clang loop vectorize(enable)
            for (i=0; i<len_vec; i++) {
                    ret += vec_A[i]*vec_B[i];
            }
            return ret;
    }
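
    The point about overlapping inputs can be checked with a small sketch. The wrapper sum_of_squares is hypothetical and passes the same array for both arguments, which is the fully overlapping case. The values in the usage note are small integers, so the result is exact whatever order the additions are performed in.

```c
#include <assert.h>

/* Same function as above; it only reads through vec_A and vec_B. */
float vec_dot(float *vec_A, float *vec_B, int len_vec) {
    float ret = 0;
    int i;
#if defined(__clang__)
    #pragma clang loop vectorize(enable)
#endif
    for (i = 0; i < len_vec; i++) {
        ret += vec_A[i] * vec_B[i];
    }
    return ret;
}

/* Hypothetical caller: both arguments alias the same array. */
float sum_of_squares(float *v, int n) {
    assert(n >= 0);
    return vec_dot(v, v, n);
}
```

    For example, calling sum_of_squares on the array {1, 2, 3, 4} with n equal to 4 computes 1 + 4 + 9 + 16.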