Arm Compiler for Linux: what is new in the 22.0 release?
Arm Compiler for Linux 22.0 is now available with performance improvements and support for new hardware like AWS Graviton 3.
By Ashok Bhat

Arm Compiler for Linux 22.0 is now available with improved compilers and libraries. Arm Compiler for Linux (ACfL) is a combination of Arm C/C++ Compiler (armclang), Arm Fortran Compiler (armflang), and Arm Performance Libraries (ArmPL). In this blog, we explore what is new in this release.
Arm Compilers now based on LLVM 13
Arm Compilers are now based on LLVM 13, and this has resulted in performance improvements.

Many sub-benchmarks of SPEC CPU 2017 improve, with an overall geomean improvement of 2.2% over the previous release, 21.1. The benchmark was run on an AWS c6g.metal instance (Arm Neoverse-N1 cores).
Better tuned for Neoverse-V1 (core in AWS Graviton 3)
Arm Compilers in 22.0 feature a tuned cost model for Neoverse-V1 and many improvements to SVE code generation. These include (1) better use of the SVE gather/scatter feature, (2) aligning loops with padding to make better use of the instruction cache, and (3) optimal use of the SVE splice operation when inserting one element of a vector into another.

The cumulative effect of these optimizations can be seen in the previous graph, which compares SVE code tuned for Neoverse-V1 against Neon code tuned for Neoverse-V1. The benchmark is a set of representative micro-benchmarks used when developing the SVE architecture extension. The compilers in 22.0 (orange bar) outperform those in 21.1 (blue bar). With these improvements, the 22.0 release is ready for the development of HPC applications on AWS Graviton 3.
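To make the gather/scatter item concrete, the following is a minimal sketch of the kind of indexed loops that SVE gather loads and scatter stores are designed for. The function names `gather_scale` and `scatter_copy` are hypothetical and chosen only for illustration. Whether a given compiler version actually vectorizes these loops depends on its cost model and on the flags used (for example an optimization level such as `-O3` together with `-mcpu=neoverse-v1`, which is an assumption here, not a recommendation taken from the release notes).

```c
#include <stddef.h>

/* Indexed read: out[i] = scale * table[idx[i]].
 * The table[idx[i]] access is non-contiguous, which is the access
 * pattern that SVE gather loads target. */
void gather_scale(double *restrict out, const double *restrict table,
                  const int *restrict idx, size_t n, double scale)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = scale * table[idx[i]];
}

/* Indexed write: dst[idx[i]] = src[i].
 * The dst[idx[i]] store is non-contiguous, which is the access
 * pattern that SVE scatter stores target. */
void scatter_copy(double *restrict dst, const double *restrict src,
                  const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```

The `restrict` qualifiers tell the compiler that the arrays do not overlap, which removes one common obstacle to vectorizing loops like these.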
GCC 11 update
The package now ships the GCC 11 series of compilers, with many performance improvements.
Single ArmPL with runtime detection of CPU
Arm Performance Libraries are no longer packaged with separate libraries for SVE and non-SVE cores. We now provide a single library, which contains optimized versions for all supported cores, including SVE. At run time, the library detects the type of core and chooses the optimal routines and configuration. You automatically benefit from the fastest tunings within the library, without needing to re-link against a core-specific library.
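The general idea of run-time dispatch can be sketched as follows. This is not ArmPL's actual mechanism, which the release notes do not describe; it is a minimal illustration that assumes a Linux AArch64 system, where the kernel reports SVE support through `getauxval(AT_HWCAP)`. The names `dot`, `dot_generic`, `dot_sve` and `cpu_has_sve` are hypothetical, and `dot_sve` is only a stand-in that reuses the portable loop.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_SVE */
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Portable fallback kernel. */
static double dot_generic(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical stand-in for an SVE-tuned kernel. A real library would
 * provide a separately optimized implementation here. */
static double dot_sve(const double *x, const double *y, size_t n)
{
    return dot_generic(x, y, n);
}

/* Returns 1 if the kernel reports SVE support, 0 otherwise
 * (always 0 on platforms other than Linux AArch64). */
int cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;
#endif
}

/* Public entry point: picks a kernel on first use, then reuses it. */
double dot(const double *x, const double *y, size_t n)
{
    static dot_fn impl = NULL;
    if (impl == NULL)
        impl = cpu_has_sve() ? dot_sve : dot_generic;
    return impl(x, y, n);
}
```

The caller only ever sees `dot`, so the same binary can run on SVE and non-SVE cores. The lazy initialization shown here is not thread-safe; a production library would select its kernels in a synchronized or load-time step.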
Faster BLAS, LAPACK, and FFT
ArmPL 22.0 comes with further improvements in BLAS and LAPACK routines.
| API | Improvements |
| --- | --- |
| BLAS Level 1 | SVE optimizations for ?COPY, ?SCAL, ?AXPY |
| BLAS Level 2 | Packed and banded functionality; ?TRMV and ?TRSV for large problems |
| BLAS Level 3 | ?TRMM and ?TRSM for large problems |
| LAPACK | ?EEVD (eigenvalue decomposition) for small problems; ?POTRF for multithreaded cases |

The previous graph shows improvements in 22.0 over 21.0 (released in early 2021). The data comes from benchmarks of over 5000 individual cases, covering a wide set of BLAS routines and a selection of important LAPACK routines, at small O(10), medium O(100) and large O(1000) problem sizes, in both serial (1 thread) and parallel (8 threads) execution.
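For readers less familiar with the BLAS Level 1 routines named in the table, the following reference loops show what the double-precision ?SCAL and ?AXPY operations compute. They are plain, unoptimized sketches of the mathematical definition, not ArmPL code. The names `ref_dscal` and `ref_daxpy` are hypothetical, and only positive strides are handled.

```c
#include <stddef.h>

/* ?SCAL semantics: x[i*incx] = alpha * x[i*incx] for i in [0, n). */
void ref_dscal(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}

/* ?AXPY semantics: y[i*incy] = alpha * x[i*incx] + y[i*incy]
 * for i in [0, n). */
void ref_daxpy(size_t n, double alpha, const double *x, size_t incx,
               double *y, size_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
```

Both routines do very little arithmetic per element loaded, so their speed is largely determined by how efficiently the loop moves data, which is why vector-length-aware SVE code can help.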
Improvements in math functions
In 22.0, we have improved the performance of many math functions, both scalar (atan, atan2, atan2f, cos, exp, sin and erf) and vector (atanf, atan2f, cosf, erfcf, expf, logf, pow, sinf and tanf). The following graph shows the impact when the Elefunt benchmark is run on an AWS Graviton 2 (Neoverse N1) system.

Module name changes
The package provides module files to easily load the required compiler or libraries. With the 22.0 release, please use the following module commands.
| Environment | Module load command |
| --- | --- |
| Arm C/C++/Fortran Compilers | module load acfl/22.0 |
| Arm Performance Libraries | module load armpl/22.0 |
| GNU compilers | module load gnu/11.2.0 |
Conclusion
Arm Compiler for Linux 22.0 brings many improvements and changes over the previous 21.x series. We continue to make further improvements and plan to provide the next release, 22.1, in September/October 2022.
Re-use is permitted only for informational and non-commercial or personal use.
