The High Performance Conjugate Gradient (HPCG) benchmark is designed to complement the High Performance LINPACK (HPL) benchmark in order to better characterize HPC systems. While HPL focuses on dense floating-point operations as its metric, HPCG stresses memory access patterns. HPCG centers on sparse matrix-vector multiplication, solving a linear system of equations with the conjugate gradient method. This method is widely used in the scientific computing community, particularly in computational fluid dynamics.
URL: http://www.hpcg-benchmark.org
Source URL: https://gitlab.com/arm-hpc/benchmarks/hpcg
Mirrored from: https://github.com/hpcg-benchmark/hpcg.git
Categories: open-source, benchmark
HPCG 3.1
Build Procedure
Clone the repository
The HPCG repository can be cloned from the Arm GitLab mirror:
git clone https://gitlab.com/arm-hpc/benchmarks/hpcg HPCG
cd HPCG
Configure
The setup directory contains a number of configuration files. It is recommended that you copy an existing setup file and then modify the Makefile rules and flags (e.g. MPI, OpenMP, and any compiler flags).
The files Make.GCC_OMP, Make.Linux_MPI, Make.Linux_Serial, and Make.MPI_GCC_OMP should work for systems with default library installations, but they may not have optimized compiler flags. To use a different compiler, such as the Arm Compiler for Linux, create setup/Make.MPI_Arm_OMP based on setup/Make.MPI_GCC_OMP:
cp setup/Make.MPI_GCC_OMP setup/Make.MPI_Arm_OMP
Then set CXXFLAGS to:
CXXFLAGS = $(HPCG_DEFS) -O3 -mcpu=native -ffp-contract=fast -fsimdmath -fopenmp
Make a directory for the build and configure:
mkdir build
cd build
../configure MPI_Arm_OMP
Build
Run make in the build directory:
make -j
Test
By default, HPCG uses the settings specified in bin/hpcg.dat, which sets the problem size to 104^3. For testing purposes, a smaller problem size can be specified on the command line. It is important to set the number of OpenMP threads through the environment variable OMP_NUM_THREADS. For example, on 16 cores with one OpenMP thread per MPI task:
OMP_NUM_THREADS=1 mpirun -np 16 ./bin/xhpcg 32 24 16
Optimization
By using different build directories, it is easy to track how various optimizations affect the performance of HPCG. The figure of merit is GFLOP/s. This is reported after each run in a unique file named HPCG-Benchmark_{Version}_{Date}_{Time}.txt.
Compiler Flags
The first step in optimizing HPCG is adjusting the compiler flags in the configuration file. For example, using GCC 11.2 in serial mode, HPCG converges correctly with:
CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -Ofast -mcpu=native -funroll-loops
Modifying source code
HPCG encourages modifying the source code. For each numerical routine there are two files: a reference version, which should not be changed, and a second version that can be modified as much as desired to achieve better performance. For example, the dot product is computed in ComputeDotProduct.cpp and ComputeDotProduct_ref.cpp. If no optimizations are made, ComputeDotProduct.cpp simply calls the reference routine:
int ComputeDotProduct(const local_int_t n, const Vector & x, const Vector & y,
    double & result, double & time_allreduce, bool & isOptimized) {
  // This line and the next two lines should be removed and your version of ComputeDotProduct should be used.
  isOptimized = false;
  return ComputeDotProduct_ref(n, x, y, result, time_allreduce);
}
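As a sketch of the kind of change HPCG invites, an OpenMP reduction could replace the call to the reference routine. The Vector struct below is a simplified stand-in for HPCG's actual type (assumed here to hold a length and a values array); the signature mirrors the real routine, but this is illustrative, not HPCG's code:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for HPCG's types (assumption: the real Vector
// carries a local length and an array of values).
typedef int local_int_t;
struct Vector {
    local_int_t localLength;
    std::vector<double> values;
};

// A hand-optimized dot product: OpenMP parallel reduction over the
// local vector entries.
int ComputeDotProduct(const local_int_t n, const Vector& x, const Vector& y,
                      double& result, double& time_allreduce, bool& isOptimized) {
    assert(x.localLength >= n && y.localLength >= n);
    double local_result = 0.0;
    #pragma omp parallel for reduction(+ : local_result)
    for (local_int_t i = 0; i < n; ++i)
        local_result += x.values[i] * y.values[i];
    // In the MPI build, an MPI_Allreduce on local_result would go here,
    // with its duration accumulated into time_allreduce.
    result = local_result;
    time_allreduce = 0.0;
    isOptimized = true;   // report that the optimized path was taken
    return 0;
}
```

Setting isOptimized to true tells the HPCG harness to report the run as using an optimized kernel rather than the reference implementation.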
Daniel Ruiz from Arm wrote two blog posts describing how HPCG can be optimized for Arm HPC systems, located here and here.
The source code with all of Daniel's modifications, based on HPCG version 3.0, can be cloned here.
Adding Performance Libraries
Because HPCG solves a matrix-based problem, there are ample opportunities to replace naive operations with optimized library calls. The HPCG_for_Arm repository described above has options for using the Arm Performance Libraries for most basic linear algebra operations.
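As a sketch of how that might look in the configuration file, the Arm Compiler for Linux can enable the Arm Performance Libraries with the -armpl flag, after which BLAS calls such as cblas_ddot can replace hand-written loops. The exact flags depend on your toolchain version, so treat this fragment as an assumption to verify against your compiler's documentation:

```
# Hypothetical setup/Make.MPI_Arm_OMP fragment: -armpl (Arm Compiler for
# Linux) makes the Arm Performance Libraries available at compile and link time.
CXXFLAGS = $(HPCG_DEFS) -O3 -mcpu=native -ffp-contract=fast -fopenmp -armpl
```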