Release History

This page lists the Arm Allinea Studio release history.

To download and install the latest version of Arm Allinea Studio, go to our downloads page and follow the installation steps given there.

Release version details, along with links to the Release Notes and documentation for Arm C/C++ Compiler, Arm Fortran Compiler, and Arm Performance Libraries, are provided below.

Arm Allinea Studio also includes Arm Forge (Release History) and Arm Performance Reports (Release History). 

For more compatibility information, see our supported platforms topic.

Arm Allinea Studio

Version 19.2

Released: June 07, 2019

  • Arm Allinea Studio: 19.2 June 07, 2019

    What's new in 19.2

    Arm C/C++/Fortran Compiler 19.2

    New features and enhancements:

    • D-866 : The -insights flag is no longer supported.
    • D-771 : Added experimental scheduler improvements that can give performance benefits on large processors, such as ThunderX2.  By default, the scheduler improvements are disabled. To enable them, include the "-mllvm -misched-favour-latency=true" option at compile time.
    • D-612 : The Fortran 2008 ERROR STOP statement is now supported.

    Bug fixes:

    • H-629 : Fixes a problem where -armpl did not locate the correct include directory.
    • H-585 : Fixes a problem where images that were compiled on ThunderX2 platforms, but targeted other platforms, failed when run on those other platforms.
    • H-538 : Fixes a problem that caused a compiler error when using OpenMP Taskloop.
    • H-536 : The performance of the Fortran BACKSPACE statement has been improved.
    • H-527 : Issues using Fortran ISO C-bindings and Arm Fortran Compiler are now resolved.
    • H-525 : Performance of OpenMP ATOMIC in Fortran has been enhanced by using native atomic load/store instructions where possible.
    • H-508 : armflang now correctly displays the source location for vectorisation reports generated by -Rpass, when compiling without using the -g option.
    • H-401 : The compiler now correctly infers template types for SVE datatypes like 'svfloat32_t'.

    Arm Performance Libraries 19.2.0

    New features and enhancements:

    • D-746 : A new library, libastring, is included by Arm Compiler by default. This library provides optimized versions of a number of common string functions, such as memcpy and memset. libastring is also provided for the GCC compiler, and can be found in $ARMPL_DIR/lib.
    • D-676 : A number of FFT performance improvements have been implemented, especially in single-precision.
    • D-673 : libamath performance improvements including vectorized versions of sin, cos, exp, and log, in both single and double precision.
    • D-671 : Half precision interfaces have been added to libarmpl for matrix-matrix multiplication and FFTs.

      The half precision matrix-matrix multiplication function is called hgemm_. This interface follows the usual *GEMM interface with half precision matrices and floating point scalars.

      The naming scheme for the FFTW interfaces has been extended, such that all functions are prefixed fftwh_. An example of how to use these functions would be based upon:
                /* Include the Arm Performance Libraries FFT interface. Make sure you
                   include the header file provided by Arm PL and not the header
                   provided by FFTW3. */
                #include "fftw3.h"

                /* Declare half-precision arrays to be used */
                __fp16 *in;
                fftwh_complex *out;
                fftwh_plan plan;

                /* Plan, execute and destroy */
                plan = fftwh_plan_many_dft_r2c(...);
                fftwh_execute(plan);
                fftwh_destroy_plan(plan);

    Bug fixes:
    • None in this release.

    Arm Forge 19.1

    Arm DDT new features and enhancements:

    • Support for Arm C/C++/Fortran Compiler up to version 19.2.
    • Fixed an issue where GDB 8.1 would not start on an Ubuntu 16.04 system without libmpfr installed.
    • Support for debugging of IBM Spectrum MPI jobs launched with Spindle.
    • GDB 8.1 is now the default DDT debugger.
    • Support for the GDB 7.10.1 debugger has been removed.
    • Memory Debugging support for PMDK.
    • Support for debugging CUDA 10.0 and 10.1 binaries.
    • Remote connect network traffic is now compressed by default so some actions will now be faster when using this feature.

    Arm DDT bug fixes:

    • [FOR-7342] Fixed an issue with memory debugging aligned allocations.
    • [FOR-6659] Clarified information in the user guide about startup issues with OpenMPI 3.0 and 3.1.
    • [FOR-6503] Fixed an issue where variables named "array" in a struct were not evaluated.
    • [FOR-6142] Fixed an issue with memory debugging, where the total number of free calls was double counted when using memkind_realloc.
    • [FOR-6049] Fixed an issue with remote client messages when X11 is not available.
    • [FOR-7236] Fixed an issue where MPI auto-detection did not work with HPE MPT 2.18+.
    • [FOR-7195] Fixed an issue that occurs when output brackets are present in the output file argument.

    Arm MAP new features and enhancements:

    • Support for Arm C/C++/Fortran Compiler up to version 19.2.
    • Improved GUI performance and reduced memory consumption when viewing large .map files.
    • New CPU metrics for Armv8-A platforms.
    • New CPU metrics for IBM Power9 platforms.
    • MAP now displays stacks from Python code on non-main threads.
    • Architecture information is now stored in the generated .map file.
    • CPU metrics on Power and Armv8-A are now available with a standard Arm Forge license.
    • Support for displaying Caliper instrumented regions (https://github.com/LLNL/Caliper) in Arm MAP. Refer to section 32, 'Performance Analysis with Caliper Instrumentation', in the Arm Forge user guide.
    • Section 24.1 in the Arm Forge user guide has been updated to better describe the CPU instruction metrics available on x86_64, Armv8-A and IBM Power 8 and Power 9 platforms.
    • Remote connect network traffic is now compressed by default so some actions will now be faster when using this feature.

    Arm MAP bug fixes:

    • [FOR-6642] Improved unwinding for PGI-compiled binaries on IBM Power systems.
    • [FOR-6414] Fixed an issue that occurs when profiling applications that were statically compiled by the PGI compiler.
    • [FOR-6659] Clarified information in the user guide about startup issues with OpenMPI 3.0 and 3.1.
    • [FOR-5518] Fixed an issue that caused a slowdown of the analysis phase when profiling Python scripts.
    • [FOR-7236] Fixed an issue where MPI auto-detection did not work with HPE MPT 2.18+.
    • [FOR-7195] Fixed an issue that occurs when output brackets are present in the output file argument.

    Arm Performance Reports 19.1

    New features and enhancements:

    • Support for Arm C/C++/Fortran Compiler up to version 19.2.
    • Architecture information is now stored in the generated .map file.
    • New CPU metrics for IBM Power9 and Armv8-A platforms.

    Bug fixes:

    • [FOR-6642] Improved unwinding for PGI-compiled binaries on IBM Power systems.
    • [FOR-6414] Fixed an issue that occurs when profiling applications that were statically compiled by the PGI compiler.
    • [FOR-6659] Clarified information in the user guide about startup issues with OpenMPI 3.0 and 3.1.
    • [FOR-7236] Fixed an issue where MPI auto-detection did not work with HPE MPT 2.18+.
    • [FOR-7195] Fixed an issue that occurs when output brackets are present in the output file argument.
    • Release Note
    • EULA
  • Arm Allinea Studio: 19.1 March 08, 2019

    What's new in 19.1

    New features and enhancements

    Arm C/C++/Fortran Compiler 19.1:
      - D-669 : Arm Compiler for HPC now supports the Fortran 'TRAILZ' intrinsic, which finds the number of trailing zero bits in an integer.  Please refer to the Fortran Reference Guide for more information.

      - D-668 : Arm Compiler for HPC now supports the Fortran 'UNROLL' directive.  This is a hint to the compiler to unroll the preceding loop.  Please refer to the Fortran Reference Guide for more information.

      - D-632 : A new flag -fno-realloc-lhs has been added, for consistency with GNU compilers. Use this flag in place of -Mallocatable=95, which is no longer documented but is still supported. Refer to the Fortran Reference Guide for information about this flag.

      - D-552 : libamath is now the default library used by Arm Compiler for HPC to provide optimized scalar and vector math functions.

                - The compiler will link to libamath by default before libm in order to provide better performing implementations.

                - libamath is also provided for GCC. GCC users must link to the library explicitly to make use of the optimized math functions.

                - Always use the correct build of libamath for the compiler you are using. For example, do not compile and link code with GCC using the version of libamath supplied for Arm Compiler for HPC; use the GCC version instead.

      - D-513 : Arm Compiler for HPC now supports the Huawei Kunpeng 920 CPU.  Tuning for Kunpeng 920-based platforms is automatically selected with the -mcpu=native option, when the compiler is run on a Kunpeng 920-based platform. To select this explicitly, use the -mcpu=tsv110 option.
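As a rough illustration of what the TRAILZ intrinsic listed under D-669 computes, the following C sketch counts the trailing zero bits of a 32-bit value. It is a C analogue written for this page, not the Fortran intrinsic or its implementation in Arm Compiler for HPC; the function name trailz32 is hypothetical. It assumes the Fortran 2008 convention that an input with no bits set yields the bit size of the type.

```c
#include <stdint.h>

/* Hypothetical helper: number of trailing zero bits in a 32-bit value.
   Follows the Fortran 2008 TRAILZ convention of returning the bit size
   (32 here) when no bit is set. */
int trailz32(uint32_t value)
{
    int count = 0;

    if (value == 0) {
        return 32;
    }
    /* Shift right until the lowest set bit reaches bit position 0. */
    while ((value & 1u) == 0) {
        value >>= 1;
        count++;
    }
    return count;
}
```

For example, trailz32(8) evaluates to 3, because 8 is binary 1000.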


    Arm Performance Libraries 19.1.0:
      - D-581 : Improved *GEMV performance.

      - D-552 : libamath is now the default library used by Arm Compiler for HPC to provide optimized scalar and vector math functions.

                - The compiler will link to libamath by default before libm in order to provide better performing implementations.

                - libamath is also provided for GCC. GCC users must link to the library explicitly to make use of the optimized math functions.

                - Always use the correct build of libamath for the compiler you are using. For example, do not compile and link code with GCC using the version of libamath supplied for Arm Compiler for HPC; use the GCC version instead.

      - D-499 : Performance improvements for [SCZ]GEMM, including stabilized performance for ThunderX2 systems configured in SMT > 1 mode.

      - D-498 : Improved MPI FFT parallel scaling.

      - D-497 : Support for Fortran MPI FFTW interface.

      - D-496 : FFT performance improvements, including for input lengths involving large prime factors.

      - D-495 : Single precision real SpMV performance optimizations.

      - D-494 : SpMV support for Compressed Sparse Column (CSC) and Coordinate (COO) sparse matrix formats with both C and Fortran interfaces.

      - D-493 : Added sparse matrix-vector multiplication (SpMV) interfaces for Fortran, including an example.

    Bug fixes

    Arm C/C++/Fortran Compiler 19.1:
      - H-489 : The armflang runtime library no longer exposes symbols that conflict with libnuma.

      - H-464 : A problem that occurs when a shared variable is accessed in a taskloop has now been fixed.

      - H-400 : Fixed an issue where getting the member of sizeless struct rvalue prevented successful compilation.

      - H-397 : RPMs and debs now correctly report what libraries they provide.

      - H-392 : The runtime performance of the Fortran TRANSFER function has been improved.

      - H-296 : Fixed a runtime segmentation fault in subroutines that contain OMP CRITICAL and have one or more ENTRY statements.

      - H-98 : The installation should now be properly relocatable on RPM-based platforms, and will register with the system RPM database if the user has appropriate permissions.

      - H-59 : Fixed an issue where using the DATA statement to assign a value to a Cray pointer caused the compiler to abort with the following message: "Error: integer constant must have integer type".

    Arm Performance Libraries 19.1.0:
      - No fixed issues

    Known Issues

    Arm C/C++/Fortran Compiler 19.1:
      - H-571 : If you have multiple versions of Arm Compiler for HPC installed that depend on the same GCC version, running the uninstall.sh script will fail. Instead, remove the packages manually, using the Package Manager, or modify the uninstall.sh script to prevent removal of the GCC package.

      - H-421 : When the uninstaller is run, it does not remove all of the files. It is safe to remove the remaining files manually.

      - H-411 : There is a regression in SVE vectorization which may result in miscompiles of loops with loop-carried dependencies.

      - H-310 : -fsimdmath is incompatible with a dynamic linker optimization known as 'lazy binding'.  When using -fsimdmath, Arm recommends that you also add '-z now' to the compile/link flags, in order to disable this optimization during linking. For more information, see Vector math routines.

    Arm Performance Libraries 19.1.0:
      - No known issues

     

    • Release Note
    • EULA
  • Arm Allinea Studio: 19.0 November 02, 2018

    What's new in 19.0

    New features and enhancements

    Arm C/C++/Fortran Compiler 19.0:

    • D-545 : Partial support for the Fortran 2008 DO CONCURRENT feature. Support is partial because serial code is generated.
    • D-544 : Support for the Fortran 2008 submodules feature.
    • D-394 : Improvements to the performance of Fortran NINT and DNINT intrinsics.
    • D-393 : Improvements to the performance of Fortran math intrinsics, including the ability to auto-vectorize scalar math intrinsics. To benefit from these improvements, add the new compiler option -armpl to your compile and link arguments, and use optimization level -O2 or higher.
    • D-388 : Arm Compiler for HPC is now based on LLVM 7.0.
    • D-374 : Support for the Fortran 'NOVECTOR' directive, which enables users to disable auto-vectorization on individual loops.
    • D-373 : Support for the Fortran 'VECTOR ALWAYS' directive, which enables a user to request that a loop be auto-vectorized, irrespective of the compiler's internal cost-model, if it is safe to do so.
    • D-329 : A new C/C++ Compiler Reference Manual is available in <install_location>/<package_name>/share.

    Arm Performance Libraries 19.0:

    • D-492 : Various changes to C header files:
      • BLAS, CBLAS and LAPACK function prototypes have been modified to use 'const' where appropriate, for example, for input array pointers and char * specifiers.
      • We now use C-style _Complex numbers instead of our own structure for complex numbers in the armpl.h header. If required, you can use #define to override armpl_singlecomplex_t and armpl_doublecomplex_t to something else that is bitwise-compatible (e.g. C++ std::complex type). This change is bitwise-compatible with the structure we have replaced.
      • Complex number manipulation functions have been removed from the header. You are advised to use standard C-style _Complex operations instead (or those appropriate to any redefinition such as C++ std::complex).
      • cdotc_, cdotu_, zdotc_, zdotu_, cladiv_ and zladiv_ prototypes have been modified to reflect the correct C-to-Fortran calling convention for a given compiler toolchain.
    • D-486 : Libraries tuned for Qualcomm Falkor are no longer provided.
    • D-461 : The GCC version of the library is now compatible with GCC 8.2 (previously 7.1).
    • D-430 : Enhancements to existing libamath functions.
    • D-429 : Support for LAPACK version 3.8.0.
    • D-428 : LAPACK parallel scalability tuning has been performed for the following routines on ThunderX2CN99 systems: *POTRF, *GEQRF, *GETRF.
    • D-426 : The FFT interface documented in the Arm Performance Libraries User Manual versions up to v18.4.0 has been deprecated. Users are instead encouraged to use the FFTW interface within Arm Performance Libraries for best performance. This release also includes optimizations to key FFT kernels.
    • D-425 : Added FFTW MPI single and double precision interfaces in C.
    • D-424 : Execution of advanced and guru FFTW plans is now parallelized.
    • D-423 : Added FFTW guru single and double precision interfaces in C and Fortran.
    • D-422 : Added a new suite of sparse matrix routines in C supporting sparse matrix-vector multiplication supplied in Compressed Sparse Row format, including an optimized double-precision real kernel. Added WAXPBY BLAS extension routine (w = a*x + b*y, for vectors w, x and y and scalars a and b).
    • D-421 : Performance enhancements to parallel DGEMM, especially for small to medium-sized problems.
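The WAXPBY operation listed under D-422 (w = a*x + b*y) can be sketched in plain C as follows. This is an unoptimized reference loop for illustration only: the name waxpby_ref and its signature are assumptions made for this sketch and are not the Arm Performance Libraries interface, which is documented in the Arm Performance Libraries reference material.

```c
#include <stddef.h>

/* Reference (unoptimized) WAXPBY: w[i] = a * x[i] + b * y[i].
   The name waxpby_ref and this signature are illustrative only; they are
   not the Arm Performance Libraries interface. */
void waxpby_ref(size_t n, double a, const double *x,
                double b, const double *y, double *w)
{
    for (size_t i = 0; i < n; i++) {
        w[i] = a * x[i] + b * y[i];
    }
}
```

Unlike AXPBY-style updates that overwrite one of the inputs, the result here is written to a separate vector w, so x and y are left unchanged.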

    Bug fixes

    Arm C/C++/Fortran Compiler 19.0:

    • H-423 : Support, by default, for Fortran 2003 semantics for assignments to allocatable variables.
    • H-407 : In some corner cases, increased memory usage has been observed due to the switch to Fortran 2003 allocatable semantics. This can result in a segfault. In these cases, the recommended workaround is to use the armflang option -Mallocatable=95 during compilation.
    • H-361 : Arm Fortran Compiler now handles the -fsave-optimization-record flag correctly.
    • H-333 : Improvements to DWARF source-level debug information for Fortran.
    • H-130 : Added missing man page for armclang++.
    • H-96 : Fixes an issue with armflang's handling of OpenMP 'threadprivate' module variables.

    Arm Performance Libraries 19.0:

    • No fixed issues

    Refer to the Release Notes for further information about this release.

    • Release Note
    • EULA
  • Arm Allinea Studio: Version 18.4 - latest update 18.4.2 October 10, 2018

    What's new in Version 18.4 - latest update 18.4.2

    Arm Compiler for HPC 18.4 covers the following releases:

    • Arm C/C++/Fortran Compiler and Arm Performance Libraries version 18.4 - released 26th July 2018.
    • Arm C/C++/Fortran Compiler and Arm Performance Libraries version 18.4.1 - released 7th September 2018.
    • Arm C/C++/Fortran Compiler and Arm Performance Libraries version 18.4.2 - released 10th October 2018.

    New features and enhancements

    Arm C/C++/Fortran Compiler 18.4:

    • The -fstack-arrays option is enabled at the -Ofast optimization level.
    • Arm Fortran Compiler now supports the general-purpose ivdep directive, and partially supports the OpenMP-specific omp simd directive. These directives instruct the compiler to ignore memory dependencies and can enable a loop to be vectorized.
    • The Arm Fortran Compiler Reference guide is now available in /opt/arm/<package_name>/share.
    • The new vector procedure call standard has been implemented and is used by the SLEEF math library.

    Arm C/C++/Fortran Compiler 18.4.1:

    • No new features or enhancements.

    Arm C/C++/Fortran Compiler 18.4.2:

    • No new features or enhancements.

    Arm Performance Libraries 18.4:

    • Performance improvements for batched CGEMM and ZGEMM.
    • Performance improvements for small-to-medium-sized SGEMM problems.
    • Significantly less time spent planning FFTW transforms for levels of rigor greater than FFTW_ESTIMATE.
    • Performance enhancements for complex-to-real FFTW transforms, especially multidimensional problems.
    • Libraries for Cortex-A57 and Cavium ThunderX are no longer provided.
    • New functions in libamath: sinf, cosf, sincosf (single precision).
    • Updated functions in libamath: exp, pow, log (double precision).

    Arm Performance Libraries 18.4.1:

    • No new features or enhancements.

    Arm Performance Libraries 18.4.2:

    • D-490 : The version of libamath built has been updated.

    Bug fixes

    Arm C/C++/Fortran Compiler 18.4:

    • H-52: Segfault on large array allocation.
    • H-87: Arm Fortran Compiler not vectorizing a loop.
    • H-92: Fixes for some debug issues related to subroutine arguments.
    • H-105: Flag (-E) to run only the preprocessor does not work in the Fortran compiler.
    • H-115: Problems with direct access I/O in Fortran programs.

    Arm C/C++/Fortran Compiler 18.4.1:

    • H-317: In the previous release, when armflang was used to link objects without compilation, it generated unnecessary warnings about unused compilation flags. These warnings have been removed.
    • H-149: A problem caused by unaligned offsets in stack layout for SVE replicating loads, which caused the compiler to crash, has been resolved.

    Arm C/C++/Fortran Compiler 18.4.2:

    • H-361 : Arm Fortran Compiler now handles the -fsave-optimization-record flag correctly.

    Arm Performance Libraries 18.4:

    • H-126: Some multidimensional FFTW transforms return incorrect results.

    Arm Performance Libraries 18.4.1:

    • No fixed issues

    Arm Performance Libraries 18.4.2:

    • No fixed issues

    Refer to the Release Note for further details.

    • Release Note
    • EULA
  • Arm Allinea Studio: 18.3 May 29, 2018

    What's new in 18.3

    New features and enhancements

    • Arm C/C++/Fortran Compiler 18.3:
      • Support for the Fortran 2008 feature that allows pointers to internal procedures, and internal procedures passed as arguments.
      • Automatic arrays can be allocated on the stack with the -fstack-arrays flag.
    • Arm Performance Libraries 18.3:
      • Support for FFTW wisdom included for the first time.
      • Performance enhancements to FFTW functions: complex-to-complex and real-to-complex functions using both basic and advanced interfaces; some complex-to-real performance differences too.
      • Parallel performance improvements for S/D/C/ZTRSV and S/DTRMV.
      • New library, 'libamath', in the 'lib' directory for each microarchitecture for Arm Compiler builds.  This contains optimized versions of exp, pow and log functions in single and double precision.  For more information on libamath, see Getting started with Arm Performance Libraries.

    Bug fixes

    • Arm HPC Compiler 18.3:
      • H-14 : Fixed two preprocessor issues. Transpose intrinsic is now supported during initialization.
      • H-58 : Fixed failure when a module variable was used to set real kind in two different functions.
      • H-61 : The __ARM_ARCH macro is now defined in the armflang compiler.
      • H-74 : Disabled generation of FMA instructions at -O0 in armflang. This matches armclang behaviour and passes floating-point accuracy tests at -O0.
      • H-86 : Fixed issue with capturing procedure pointers to OpenMP parallel regions, which was preventing the TeaLeaf mini-app from running correctly.
    • Arm Performance Libraries 18.3:
      • H-7: Nested parallelism performance improvements.

    Known Issues

    Arm Compiler 18.3:

    • H-105: The '-E' option to armflang does not work. This will be fixed in the next release.
    • H-114: Debugging arrays with negative lower bounds is not currently supported.

     

    • Release Note
    • EULA
  • Arm Allinea Studio: 18.2 March 22, 2018

    What's new in 18.2

    Arm Compiler for HPC contains the following packages:

    • Arm Compiler v18.2
    • Arm Performance Libraries v18.2
    • GNU GCC 7.1

    New features and enhancements

    Arm C/C++/Fortran Compiler 18.2:

    • License management is now switched on by default. Please refer to Arm Allinea Studio licensing for more information about licensing.

    • SIMD math library 'libsimdmath.so' now provides the same set of functions for targeting Vector Length Agnostic (VLA) SVE instructions as it provides for ARM Advanced SIMD instructions. For example, a loop invoking 'double sin(double)' can be auto-vectorized with calls to a VLA implementation of 'sin', which is provided in 'libsimdmath.so'.
      'libsimdmath.so' has increased coverage of vectorized routines from math.h and GLIBC math.h.
      Please refer to Vector math routines for more information about this feature.

    • Debug information has been added for Fortran adjustable arrays and imported modules.

    Arm Performance Libraries 18.2:

    • FFT performance improvements. Improvements have been made to a selection of FFTW routines in the library. Users should see enhanced performance for a wide range of transform sizes for 1D complex-to-complex transforms in single and double precision via the basic interface. Improvements have also been made to the advanced interfaces for complex-to-complex, real-to-complex and complex-to-real transforms in single and double precision for transforms of any dimensionality. From this release users are advised to target the FFTW interface in Arm Performance Libraries rather than the FFT routines documented in the Arm Performance Libraries Reference Manual.

    • Thread tuning for level 1 BLAS routines *AXPY, *AXPBY, *SCAL, *COPY. Where possible the number of threads used for these routines may be throttled, compared with the number of threads requested, in order to improve performance.

    Refer to the Release Note for details of bug fixes and further information.

    • Release Note
    • EULA
    • Documentation
  • Arm Allinea Studio: 18.1 January 17, 2018

    What's new in 18.1

    Arm Compiler for HPC contains the following packages:

    • Arm Compiler v18.1
    • Arm Performance Libraries v18.1
    • GNU GCC 7.1

    New features and enhancements

    This release contains the following new features and enhancements:

    Arm Compiler 18.1

    Red Hat 7 support is now provided as a single package, rather than having individual packages for each point release.

    Compiler flag documentation (output with --help, the armflang manpage and the online documentation) has been simplified by no longer documenting PGI-style Fortran flags when these flags have an exact GCC-style equivalent flag. Although no longer documented, the PGI-style flags are still supported as in previous releases.

    A new flag -fsimdmath enables vectorization of some scalar libm functions, by automatically replacing calls to these functions with a vectorized form inside of vectorized loops.  These vectorized forms are included in a new library (libsimdmath.so), which is included in the release and automatically linked in during compilation.

    License management for Arm Compiler is available as a default-off feature for beta testing.  If you wish to try this feature in your environment, please contact your Arm representative.

    The OpenMP runtime library (libomp.so) has been improved for platforms supporting the ARMv8.1-a architecture. Two versions of this library are included with the release, with the most appropriate library selected automatically.

    Debug information has now been enabled for module variables. With this change, users can now print/access these variables whilst debugging. We also generate debug information for modules even if they contain variables only.

    Arm Performance Libraries 18.1

    Optimizations for very small double precision real matrix-matrix multiplication, improving DGEMM and DGEMM_BATCH performance. Optimizations for complex Hermitian and symmetric matrix-matrix multiplication for Cavium ThunderX2.

    • Release Note
    • EULA
    • Documentation
  • Arm Allinea Studio: 18.0 November 09, 2017

    What's new in 18.0

    Arm Compiler for HPC contains the following packages:

    • Arm Compiler v18.0
    • Arm Performance Libraries v18.0
    • GNU GCC 7.1

    New features and enhancements

    Arm Compiler 18.0

    • Increased coverage for Fortran 2003 and Fortran 2008. Please see the following page for more details:
       https://developer.arm.com/products/software-development-tools/hpc/arm-fortran-compiler
    • Runtime performance and stability improvements.
    • Tuning for the host platform is now easily done using '-mcpu=native'.
    • Improved user documentation. The Arm Compiler now includes a man-page and has a more accurate and descriptive '--help' command line option.
    • Added support for vector math routines using -fsimdmath.
    • Implemented more features to improve debugging of Fortran applications.
    • -ffp-contract=fast is now the default behavior for Fortran workloads. This allows FP instructions to be fused (e.g. into FMA instructions), and makes Arm Compiler consistent with other Fortran compilers (e.g. gfortran). To maintain consistency with most C/C++ compilers (e.g. Clang and GCC), C/C++ workloads have a more restrictive default of -ffp-contract=on, and only perform this fusion in the presence of an FP_CONTRACT pragma.
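The FP_CONTRACT pragma mentioned in the last item is the standard C mechanism for allowing or forbidding fusion of a multiply and an add into one operation. The sketch below shows where the pragma goes; the function names are hypothetical. Whether a fused instruction is actually emitted depends on the compiler, the target and the options in use, and some compilers ignore this pragma (possibly with a warning), so treat this as an illustration rather than a statement of what Arm Compiler generates.

```c
/* Two ways of writing a*b + c. The pragma is standard C99; the function
   names are hypothetical. */

/* Contraction permitted: a compiler that honours the pragma may emit a
   single fused multiply-add, which rounds once instead of twice. */
double mul_add_contract_on(double a, double b, double c)
{
#pragma STDC FP_CONTRACT ON
    return a * b + c;
}

/* Contraction not permitted: a compiler that honours the pragma keeps the
   multiply and the add as separately rounded operations. */
double mul_add_contract_off(double a, double b, double c)
{
#pragma STDC FP_CONTRACT OFF
    return a * b + c;
}
```

When every intermediate value is exactly representable, for example 2.0 * 3.0 + 1.0, both functions return the same result. Differences only appear when the product needs rounding before the add.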

    Arm Performance Libraries 18.0

    • The Qualcomm Falkor core is added as a new microarchitecture target with specific tunings.
    • New support for the following BLAS extension routines, see the Arm Performance Libraries Reference manual for details:
      • *AXPBY and cblas_*axpby for single and double precision real and complex data.
      • *GEMM_BATCH and cblas_*gemm_batch for single and double precision real and complex data. Examples for SGEMM_BATCH and cblas_zgemm_batch are provided.
      • *GEMM3M and cblas_*gemm3m for single and double precision complex data.
        Note that these *GEMM3M and cblas_*gemm3m routines are included in the API, but currently offer no performance advantages over the regular *GEMM and cblas_*gemm routines.
    • Support for LAPACK version 3.7.1.
    • A change has been made to C prototypes for Fortran BLAS routines in armpl.h. Where strings are passed as arguments it is no longer a requirement in the interface to pass string lengths after the standard options to the BLAS routines. Note we recommend that users include these string lengths in their calls from C directly to the Fortran interface.
    • Various performance improvements.
    • Release Note
    • EULA
    • Documentation