Develop HPC applications on Arm

Developing applications on Arm architecture servers is scarcely different from developing on any other architecture. Most software packages used in HPC work out-of-the-box, and those that don't usually need only minor changes to your makefile. If you are developing new code, or porting and tuning existing code for Arm, there are a number of things you will want to know.

How to run common HPC applications on Arm

If you’d like to add your application or library to the list of known applications or find out what others are doing, visit the Arm HPC community.

About the Armv8-A architecture

Armv8-A (AArch64) is the 64-bit architecture used by server-class hardware and in most of today’s smartphones. The architecture and its instruction set are implemented by Arm cores and our partner cores, so applications compiled on one partner core run transparently on another.

In most cases, developers will not need to have an in-depth knowledge of the Armv8-A architecture and can rely on compilers and other tools to hide the detail, but for those that do:

For those seeking to manually exploit the NEON SIMD vector units present in today’s server hardware, see NEON.

The Armv8-A architecture continues to evolve, adding instructions that accelerate code for specific uses or provide additional security capabilities.
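Because newer instructions are optional, portable code often checks at run time whether the CPU it is running on supports them. The following is a minimal sketch, assuming a Linux system whose C library provides getauxval() and the HWCAP_ASIMD and HWCAP_SVE masks through <sys/auxv.h>. The function names cpu_has_neon, cpu_has_sve and print_cpu_features are hypothetical and chosen for illustration only. On other platforms the sketch simply reports that the features are absent.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP and the HWCAP_* bit masks */
#endif

/* True if the running CPU reports Advanced SIMD (NEON) support.
 * Assumption: non-Linux or non-AArch64 builds fall back to false. */
bool cpu_has_neon(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ASIMD)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
    return false;
#endif
}

/* True if the running CPU reports SVE support.
 * HWCAP_SVE may be missing from older headers, hence the extra guard. */
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Hypothetical helper: print a one-line summary of the detected features. */
void print_cpu_features(void)
{
    printf("NEON: %s, SVE: %s\n",
           cpu_has_neon() ? "yes" : "no",
           cpu_has_sve() ? "yes" : "no");
}
```

An application could call a check like this once at start-up and select a specialized code path only when the corresponding feature is reported.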

Read more about the latest additions to the Armv8-A architecture

Compilers and languages

The common programming languages are well-supported on Arm – with most open-source tools available in packages provided by your Linux distribution. Commercial compilers for C++, C and Fortran are available from Arm in Arm Allinea Studio.

The Arm commercial and GNU open-source compilers are tuned extensively for Arm servers and partner silicon, and are evolving rapidly. The highest performance is achieved using the most recent versions of these tools – which Linux distributions do not normally provide by default. Read about some of this work in GNU GCC 8 and glibc 2.27.

More information about compilers and languages

Math libraries

The common libraries used by applications in HPC are available on Arm. Arm Performance Libraries provide optimized BLAS and FFT implementations that have been extensively tuned for Arm and partner cores, and are available as part of Arm Allinea Studio. Read our recent blog for more information on the latest updates to Arm Performance Libraries.

Explore the wide range of math libraries available on Arm

The GNU libm library (part of libc) is evolving quickly on Arm too – and the version shipped by your distribution may not offer the best performance currently available. If the latest libc and libm cannot be installed, a separate collection of optimized routines is available.

Development tools

The debugging tools that you already know for HPC are also available on Arm.

Explore commercial and open-source debugging tools

A wide range of profiling tools is available to help you tune and optimize your application on Arm.

Explore commercial and open-source profiling tools

Advanced performance

Default compiler optimizations and tuned libraries will be sufficient for most users to get the best performance on Arm, but some applications benefit when the developer gives the compiler additional help.

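Today’s Arm systems have 128-bit NEON SIMD units, which raise floating-point throughput by operating on several values per instruction: each 128-bit register holds four single-precision or two double-precision values.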
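To make the 128-bit width concrete, here is a minimal sketch that adds two float arrays four elements at a time using the NEON intrinsics from <arm_neon.h>. The function name add_f32 is hypothetical. The NEON path is compiled only when the compiler defines __ARM_NEON; otherwise the plain scalar loop handles everything, so the sketch also builds on other architectures.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* float32x4_t, vld1q_f32, vaddq_f32, vst1q_f32 */
#endif

/* Computes out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Each 128-bit NEON register holds four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add lanes and store  */
    }
#endif

    /* Scalar loop: handles the remaining elements, or all of them
     * when NEON is not available at compile time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

In practice a loop this simple is usually vectorized by the compiler without intrinsics; hand-written intrinsics tend to be worthwhile only where automatic vectorization falls short.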

NEON

Whilst the Arm Allinea Studio and GNU compilers both try to vectorize code for the NEON units automatically, they do not always succeed. Both GNU and Arm Allinea Studio compilers provide vectorization reports that describe why optimizations have not been possible.

To generate vectorization reports in Arm Allinea Studio, refer to:

Developers can explicitly help compilers in this task by:

More information about NEON
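As an illustration of helping the compiler, one widely used technique is to declare pointers with the C99 restrict qualifier, which promises that the arrays do not overlap and removes a common obstacle to vectorization. The sketch below is an assumption about what such help can look like, not a list of the techniques this page originally linked to; the function name saxpy is hypothetical. The comments show how a vectorization report could be requested: -fopt-info-vec-missed is a GCC option, and the -Rpass family are Clang/LLVM options that the LLVM-based Arm compiler is assumed to accept as well.

```c
#include <stddef.h>

/*
 * Possible ways to request a vectorization report for this file:
 *   gcc   -O3 -fopt-info-vec-missed -c saxpy.c
 *   clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c saxpy.c
 */

/* Computes y[i] += alpha * x[i] for i in [0, n).
 * The restrict qualifiers tell the compiler that x and y never alias,
 * so it can vectorize without emitting a run-time overlap check. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

The caller must honor the promise: passing overlapping arrays to a function declared this way is undefined behavior.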

Scalable vector extension (SVE) for AArch64

Many future systems will have SVE units to provide vectorization. SVE is a vector extension for the A64 instruction set of the Armv8-A architecture, available in the AArch64 execution state. Unlike other SIMD architectures, SVE does not fix the size of the vector registers; it constrains the size to a multiple of 128 bits, from a minimum of 128 bits up to a maximum of 2048 bits. A CPU vendor can therefore implement the extension with the vector register size that best suits the workloads the CPU is targeting. The design of SVE guarantees that the same program can run on different implementations of the instruction set architecture without the need to recompile the code.
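The vector-length-agnostic style that makes this possible can be sketched with the SVE intrinsics from <arm_sve.h>. The loop below never hard-codes a vector width: svcntd() reports how many 64-bit lanes the hardware provides, and svwhilelt_b64() builds a predicate that switches off the lanes past the end of the array, so no separate tail loop is needed. The function name daxpy is hypothetical, and the SVE path assumes a compiler invoked with SVE enabled (for example -march=armv8-a+sve); without it, the scalar fallback is compiled instead.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>   /* svfloat64_t, svbool_t and the sv* intrinsics */
#endif

/* Computes y[i] += alpha * x[i] for i in [0, n). */
void daxpy(size_t n, double alpha, const double *x, double *y)
{
#if defined(__ARM_FEATURE_SVE)
    const uint64_t len = (uint64_t)n;
    const svfloat64_t valpha = svdup_f64(alpha);  /* alpha in every lane */

    /* svcntd() is the number of 64-bit lanes per vector on this CPU. */
    for (uint64_t i = 0; i < len; i += svcntd()) {
        /* Lanes with index i + lane < len are active; the rest are off. */
        svbool_t pg = svwhilelt_b64(i, len);
        svfloat64_t vx = svld1(pg, &x[i]);
        svfloat64_t vy = svld1(pg, &y[i]);
        vy = svmla_x(pg, vy, vx, valpha);         /* vy + vx * valpha */
        svst1(pg, &y[i], vy);
    }
#else
    /* Scalar fallback when the compiler is not targeting SVE. */
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
#endif
}
```

The same binary then uses two lanes per iteration on a 128-bit implementation and thirty-two on a 2048-bit one, which is the portability property described above.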

Arm has developed tools to enable users to experiment with and prepare for SVE. Compile for SVE with Arm's C/C++ or Fortran compilers, and run SVE binaries on existing Armv8-A hardware with Arm Instruction Emulator.

Learn more about SVE