GCC 11 - Tuning for SVE cores
Highlights
- Micro-architectural tuning for Fujitsu A64FX and Arm Neoverse V1 cores
- SVE intrinsics code-gen improvement
- Auto vectorization improvements
- Tune SVE code to specific cores
- -mcpu=a64fx compiles code specifically for Fujitsu A64FX cores and tunes SVE code for the A64FX micro-architecture.
- -mcpu=neoverse-v1 compiles code specifically for Arm Neoverse V1 cores and tunes SVE code for the Neoverse V1 micro-architecture.
- Arm strongly recommends using -mcpu=<core> if the target hardware is known.
- This allows GCC to use all of the available architecture extensions. It also tells GCC to optimize for the core's micro-architecture.
- If code is being built and run on the same machine, the easiest way of getting the best code is to use -mcpu=native.
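One way to see what a given -mcpu setting enabled is to check the ACLE feature macros at compile time. The sketch below is illustrative only: the function name `sve_level` is hypothetical, and the build lines in the comment simply reuse the options named above.

```c
/*
 * Possible build lines (AArch64 GCC 11 or later):
 *   gcc -O3 -mcpu=native      probe.c
 *   gcc -O3 -mcpu=a64fx       probe.c
 *   gcc -O3 -mcpu=neoverse-v1 probe.c
 */

/* Hypothetical helper: reports which SVE level the compiler was allowed
 * to use for this translation unit, based on the ACLE feature macros.
 * Returns 2 for SVE2, 1 for SVE, 0 if neither was enabled. */
int sve_level(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return 2;
#elif defined(__ARM_FEATURE_SVE)
    return 1;
#else
    return 0;
#endif
}
```

Calling `sve_level()` from a small `main` and printing the result is a quick check that the chosen -mcpu value exposes the extensions you expect. It says nothing about micro-architectural tuning, which is only visible in the generated assembly.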
- Use more SVE and SVE2 instructions for auto-vectorization, including:
- FCADD and FCMLA (SVE)
- CADD and CMLA (SVE2)
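The loops below are a minimal sketch of the kind of complex arithmetic these instructions target. The function names `cmla` and `cadd_rot90` are made up for illustration. Whether GCC actually selects FCMLA or FCADD for them depends on the compiler version and options. As an assumption to verify by inspecting the assembly, relaxed floating-point options such as -Ofast may be needed in addition to an SVE-enabled -mcpu.

```c
#include <complex.h>
#include <stddef.h>

/* acc[i] += x[i] * y[i]
 * Complex multiply-accumulate: a candidate for FCMLA. */
void cmla(float complex *restrict acc, const float complex *restrict x,
          const float complex *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += x[i] * y[i];
}

/* out[i] = a[i] + b[i] * I
 * Addition with the second operand rotated by 90 degrees:
 * a candidate for FCADD. */
void cadd_rot90(float complex *restrict out, const float complex *restrict a,
                const float complex *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i] * I;
}
```

The `restrict` qualifiers tell the compiler the arrays do not overlap, which removes one common obstacle to vectorization. The same loops written with integer real and imaginary parts would be the SVE2 (CADD and CMLA) counterpart.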
- Improve SVE instruction selection, including:
- Remove redundant PTEST instructions. This affects both PTESTs in auto-vectorized code and PTESTs created by svptest intrinsic functions. It significantly improves the code generated for calls to the svrdffr and svrdffr_z intrinsics, in cases where they are followed by an svptest of the result.
- Improve the addressing mode choices for the prefetch intrinsic functions (svprfb, svprfh, svprfw and svprfd).
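As a rough illustration of the svrdffr-then-svptest pattern mentioned above, the sketch below scans a NUL-terminated string with first-faulting loads. The function name `count_until_nul` is hypothetical, and the code is not taken from GCC or Arm sources. The SVE path is only compiled when the compiler defines `__ARM_FEATURE_SVE`; otherwise a scalar fallback is used so the example builds anywhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Hypothetical helper: number of bytes before the first NUL in s. */
size_t count_until_nul(const char *s)
{
    const uint8_t *p = (const uint8_t *)s;
    const svbool_t all = svptrue_b8();
    size_t len = 0;

    for (;;) {
        svsetffr();                                /* reset the first-fault register */
        svuint8_t v = svldff1_u8(all, p + len);    /* first-faulting load */
        svbool_t loaded = svrdffr();               /* lanes that were really loaded */
        svbool_t nul = svcmpeq_n_u8(loaded, v, 0); /* NUL bytes among those lanes */

        if (svptest_any(loaded, nul)) {
            /* Count the loaded lanes that precede the first NUL. */
            svbool_t before = svbrkb_z(loaded, nul);
            return len + svcntp_b8(loaded, before);
        }

        /* svrdffr result fed straight into svptest: the case where
         * a redundant PTEST can be removed. */
        if (svptest_last(all, loaded))
            len += svcntb();                       /* every lane loaded */
        else
            len += svcntp_b8(all, loaded);         /* only a prefix loaded */
    }
}
#else
/* Scalar fallback for targets without SVE. */
size_t count_until_nul(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
#endif
```

Comparing the assembly for the SVE path between GCC 10 and GCC 11 is one way to check whether the separate PTEST after RDFFR has been folded away. The exact output depends on the compiler version and options.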
- Improve support for “unpacked” integer vector operations, where integer vector elements are stored in wider containers.
- For example, if a loop is operating on both 32-bit and 64-bit integers, it is sometimes better to operate on 32-bit integers stored in 64-bit containers.
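A loop of the following shape is the sort of case described above: one stream of 64-bit elements and one of 32-bit elements updated in the same iteration. The function name `mixed_width_update` is hypothetical, and whether GCC chooses unpacked 32-bit elements in 64-bit containers here is decided by its cost model, so treat this as a sketch to experiment with rather than a guaranteed trigger.

```c
#include <stddef.h>
#include <stdint.h>

/* Updates a 64-bit array and a 32-bit array in the same loop.
 * With unpacked vectors, both streams can advance by the same number
 * of elements per vector iteration, because each 32-bit element
 * occupies a 64-bit container. */
void mixed_width_update(int64_t *restrict wide, int32_t *restrict narrow,
                        size_t n)
{
    for (size_t i = 0; i < n; i++) {
        wide[i] += 1;
        narrow[i] *= 3;
    }
}
```

Without unpacked vectors, a vector of 32-bit elements holds twice as many elements as a vector of 64-bit elements, so the compiler has to unroll or repack to keep the two streams in step.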