HPCG

Building HPCG with Arm Compiler



Overview

This topic describes how to build HPCG with Arm Compiler for HPC.

The High Performance Conjugate Gradients (HPCG) benchmark complements the High-Performance LINPACK (HPL) benchmark. It exercises computational and data access patterns that more closely match those of real HPC applications.

The version of the code used here parallelizes the ComputeSYMGS stage with OpenMP, using two techniques:

  • Multi-level task dependency graph with data structure reordering.
  • Block multicoloring data structure reordering, and block interleaving to enable vectorization.

The first technique is used on the finest level of the HPCG grid, and the second is used for the coarser levels. 

In addition, the ComputeSPMV, ComputeDotProduct, and ComputeWAXPBY kernels can call Arm Performance Libraries. To enable this, define HPCG_USE_{SPMV,DDOT,WAXPBY}_ARMPL at compilation time, that is, HPCG_USE_SPMV_ARMPL, HPCG_USE_DDOT_ARMPL, and HPCG_USE_WAXPBY_ARMPL.

When Arm Performance Libraries is not used, these kernels still benefit from minor optimizations such as loop unrolling.
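As a minimal sketch, the three macros named above can be turned into preprocessor flags with a short loop. The -D option is the usual compiler syntax for defining a macro. The variable name armpl_defines is hypothetical, and where the resulting flags belong in setup/Make.ARM_TDG_BCOL is an assumption that you should check against the setup file itself:

```shell
# Build one -D flag per kernel that should call Arm Performance Libraries.
# armpl_defines is a hypothetical variable name used only for illustration.
armpl_defines=""
for kernel in SPMV DDOT WAXPBY; do
    # Append the flag, adding a separating space only after the first entry.
    armpl_defines="${armpl_defines:+$armpl_defines }-DHPCG_USE_${kernel}_ARMPL"
done

# Print the flags so that they can be copied into the compiler options.
echo "$armpl_defines"
```

To enable Arm Performance Libraries for only some of the kernels, drop the corresponding names from the loop.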

The following components are used in this build:

 Component                        Form
 HPCG (including optimizations)   https://gitlab.com/arm-hpc/benchmarks/hpcg (use tag: SC18)
 Arm Compiler for HPC             Version 19.1
 Open MPI                         3.1.2
 Linux distribution               RHEL 7.5
 Hardware                         Cavium ThunderX2

Recipes for other versions of the application are available in the GitLab Packages Wiki.

Procedure

  1. Clone the Arm HPCG repository from GitLab, and change into the unpacked hpcg directory:

    git clone -b SC18 https://gitlab.com/arm-hpc/benchmarks/hpcg
    cd hpcg

    Note:

    The Arm repository is a mirror of the HPCG source.

    The setup file for this recipe is:

    setup/Make.ARM_TDG_BCOL

    This setup file invokes the compiler through the mpic++ wrapper script that your MPI installation provides, and sets flags for Arm Compiler for HPC.

  2. Make a directory for the build, change into it, and configure it using the ARM_TDG_BCOL setup:

    mkdir build
    cd build
    ../configure ARM_TDG_BCOL

  3. To build, run make in the build directory:

    make -j

  4. Run a test.

    This version of HPCG requires each of the three problem dimensions to be a power of 2. The default input file, bin/hpcg.dat, does not satisfy this constraint, so the benchmark fails if you use it. Arm therefore recommends that you set the dimensions on the command line. The following example runs a small test case with four MPI processes:

    mpirun -np 4 ./bin/xhpcg 64 32 16

    For larger cases, enable OpenMP and pin threads to cores. For example, to use 56 cores, run 8 MPI tasks with 7 OpenMP threads per task, with each thread bound to its own core:

    export OMP_NUM_THREADS=7
    export OMP_PROC_BIND=close
    export OMP_PLACES=cores
    mpirun -np 8 --map-by socket:PE=7 ./bin/xhpcg 128 128 128
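Because a dimension that is not a power of 2 makes this version of the benchmark fail, it can help to check the dimensions before launching a long run. The following is a minimal sketch of such a pre-flight check. The helper names is_pow2 and check_dims are hypothetical and are not part of HPCG. The script only prints the mpirun command, it does not run it:

```shell
# Return success if $1 is a positive power of two (1, 2, 4, 8, ...).
is_pow2() {
    n=$1
    case "$n" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$n" -gt 0 ] || return 1
    # Halve the value while it is even; a power of two ends at exactly 1.
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Return success only if every dimension passed as an argument is a power of two.
check_dims() {
    for dim in "$@"; do
        if ! is_pow2 "$dim"; then
            echo "dimension $dim is not a power of 2" >&2
            return 1
        fi
    done
}

ranks=8      # MPI tasks
threads=7    # OpenMP threads per task
xhpcg=./bin/xhpcg

# Each thread is bound to its own core, so the core count is ranks x threads.
echo "cores needed: $((ranks * threads))"

if check_dims 128 128 128; then
    echo "mpirun -np $ranks --map-by socket:PE=$threads $xhpcg 128 128 128"
fi
```

With 8 tasks and 7 threads per task, the reported core count is 56, which matches the example above. Adjust ranks and threads together so that their product does not exceed the number of cores available.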