4 Running with real programs

This section will take you through compiling and running your own programs.

Arm Performance Reports is designed to run on unmodified production executables, so in general no preparation step is necessary. However, there is one important exception: statically linked applications require additional libraries at the linking step.

4.1 Preparing a program for profiling

In most cases you do not need to recompile your program to use it with Performance Reports, although in some cases it may need to be relinked, as explained in section 4.1.1 Linking.

CUDA programs

When compiling CUDA kernels do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:


    nvcc device.cu -c -o device.o -g -lineinfo -O3

Arm®v8 (AArch64) machines

Unwind information is not always compiled in by default on this platform. For accurate results, programs that are not compiled with debug information (-g) should at least be compiled with the -fasynchronous-unwind-tables flag or the -funwind-tables flag, preferably the former.
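As a rough illustration of this rule, the sketch below appends -fasynchronous-unwind-tables to a set of compiler flags only when neither -g nor an unwind-tables flag is already present. The function name ensure_unwind_flags is hypothetical, and the check only recognizes the exact flag spellings named above (it does not handle variants such as -g3).

```shell
# Hypothetical helper: print the given compiler flags, adding
# -fasynchronous-unwind-tables when no debug or unwind flag is present.
ensure_unwind_flags() {
    case " $* " in
        *" -g "*|*" -fasynchronous-unwind-tables "*|*" -funwind-tables "*)
            printf '%s\n' "$*" ;;
        *)
            printf '%s\n' "$* -fasynchronous-unwind-tables" ;;
    esac
}

# Example use in a build script:
#   CFLAGS=$(ensure_unwind_flags -O3)
```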

4.1.1 Linking

To collect data from your program, Performance Reports uses two small profiler libraries, map-sampler and map-sampler-pmpi. These must be linked with your program. On most systems Performance Reports can do this automatically without any action by you. This is done via the system's LD_PRELOAD mechanism, which allows an extra library to be loaded into your program when it starts.

Note

Although these libraries contain the word 'map' they are used for both Arm Performance Reports and Arm MAP.

This automatic linking when starting your program only works if your program is dynamically-linked. Programs may be dynamically-linked or statically-linked, and for MPI programs this is normally determined by your MPI library. Most MPI libraries are configured with --enable-dynamic by default, and mpicc/mpif90 produce dynamically-linked executables that Performance Reports can automatically collect data from.
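To find out which case applies to an existing executable, one option is to inspect the description printed by the standard file utility, which reports ELF binaries as "dynamically linked" or "statically linked". The sketch below is only an illustration: classify_linkage is a hypothetical helper, and it reads the description from standard input so that it can be tried without a real binary.

```shell
# Hypothetical helper: read a `file` description on stdin and report
# whether it describes a dynamically or statically linked executable.
classify_linkage() {
    desc=$(cat)
    case "$desc" in
        *"dynamically linked"*) printf 'dynamic\n' ;;
        *"statically linked"*)  printf 'static\n' ;;
        *)                      printf 'unknown\n' ;;
    esac
}

# Example (wave_c is the example program used elsewhere in this guide):
#   file ./wave_c | classify_linkage
```

A result of static suggests that the manual linking steps described later in this section are needed.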

The map-sampler-pmpi library is a temporary file that is either precompiled and copied, or compiled at runtime, into the directory ~/.allinea/wrapper.

If your home directory will not be accessible by all nodes in your cluster you can change where the map-sampler-pmpi library will be created by altering the shared directory as described in G.1.3 No shared home directory.

The temporary library will be created in the .allinea/wrapper subdirectory of this shared directory.

For Cray X-Series systems the shared directory is not applicable. Instead, map-sampler-pmpi is copied into a hidden .allinea subdirectory of the current working directory.

If Performance Reports warns you that it could not pre-load the sampler libraries, this often means that your MPI library was not configured with --enable-dynamic, or that the LD_PRELOAD mechanism is not supported on your platform. You now have three options:

  1. Try compiling and linking your code dynamically. On most platforms this allows Performance Reports to use the LD_PRELOAD mechanism to automatically insert its libraries into your application at runtime.
  2. Link MAP's map-sampler and map-sampler-pmpi libraries with your program at link time manually.

    See 4.1.2 Dynamic linking on Cray X-Series systems, or 4.1.3 Static linking and 4.1.4 Static linking on Cray X-Series systems.

  3. Finally, it may be that your system supports dynamic linking but you have a statically-linked MPI. You can try to recompile the MPI implementation with --enable-dynamic, or find a dynamically-linked version on your system and recompile your program using that version. This will produce a dynamically-linked program that Performance Reports can automatically collect data from.

4.1.2 Dynamic linking on Cray X-Series systems

If the LD_PRELOAD mechanism is not supported on your Cray X-Series system, you can try to dynamically link your program explicitly with the Performance Reports sampling libraries.

Compiling the Arm MPI Wrapper Library

First you must compile the Arm MPI wrapper library for your system using the make-profiler-libraries --platform=cray --lib-type=shared command.

Note

Performance Reports also uses this library.


    user@login:~/myprogram$ make-profiler-libraries --platform=cray --lib-type=shared

    Created the libraries in /home/user/myprogram:
    libmap-sampler.so (and .so.1, .so.1.0, .so.1.0.0)
    libmap-sampler-pmpi.so (and .so.1, .so.1.0, .so.1.0.0)

    To instrument a program, add these compiler options:
    compilation for use with MAP - not required for Performance Reports:
    -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
    linking (both MAP and Performance Reports):
    -dynamic -L/home/user/myprogram -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

    Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your
    program.

    Before running your program (interactively or from a queue), set
    LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=/home/user/myprogram:$LD_LIBRARY_PATH
    map ...
    or add -Wl,-rpath=/home/user/myprogram when linking your program.

Linking with the Arm MPI Wrapper Library

    mpicc -G2 -o hello hello.c -dynamic -L/home/user/myprogram \
        -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

PGI Compiler

When linking OpenMP programs you must pass the -Bdynamic command line argument to the compiler when linking dynamically.

When linking C++ programs you must pass the -pgc++libs command line argument to the compiler when linking.

4.1.3 Static linking

If you compile your program statically, that is, your MPI uses a static library or you pass the -static option to the compiler, then you must explicitly link your program with the Arm sampler and MPI wrapper libraries.

Compiling the Arm MPI Wrapper Library

First you must compile the Arm MPI wrapper library for your system using the make-profiler-libraries --lib-type=static command.

Note

Performance Reports also uses this library.


    user@login:~/myprogram$ make-profiler-libraries --lib-type=static

    Created the libraries in /home/user/myprogram:
    libmap-sampler.a
    libmap-sampler-pmpi.a

    To instrument a program, add these compiler options:
    compilation for use with MAP - not required for Performance Reports:
    -g (and -O3 etc.)
    linking (both MAP and Performance Reports):
    -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
    If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
    these must appear *after* the Arm sampler and MPI wrapper libraries in
    the link line. There's a comprehensive description of the link ordering
    requirements in the 'Preparing a Program for Profiling' section of either
    userguide-forge.pdf or userguide-reports.pdf, located in
    /opt/arm/forge/doc/.

Linking with the Arm MPI Wrapper Library

The -Wl,@/home/user/myprogram/allinea-profiler.ld syntax tells the compiler to look in /home/user/myprogram/allinea-profiler.ld for instructions on how to link with the Arm sampler. Usually this is sufficient, but not in all cases. The rest of this section explains how to manually add the Arm sampler to your link line.

PGI Compiler

When linking C++ programs you must pass the -pgc++libs command line argument to the compiler when linking.

The PGI compiler must be 14.9 or later. Using earlier versions of the PGI compiler will fail with an error such as "Error: symbol 'MPI_F_MPI_IN_PLACE' can not be both weak and common" due to a bug in the PGI compiler's weak object support.

If you do not have access to PGI compiler 14.9 or later, try compiling and linking the Arm MPI wrapper as a shared library, as described in 4.1.2 Dynamic linking on Cray X-Series systems. Omit the option --platform=cray if you are not on a Cray.

Cray

When linking C++ programs you may encounter a conflict between the Cray C++ runtime and the GNU C++ runtime used by the Performance Reports libraries with an error similar to the one below:


    /opt/cray/cce/8.2.5/CC/x86-64/lib/x86-64/libcray-c++-rts.a(rtti.o): In function '__cxa_bad_typeid':
    /ptmp/ulib/buildslaves/cfe-82-edition-build/tbs/cfe/lib_src/rtti.c:1062: multiple definition of '__cxa_bad_typeid'
    /opt/gcc/4.4.4/snos/lib64/libstdc++.a(eh_aux_runtime.o):/tmp/peint/gcc/repackage/4.4.4c/BUILD/snos_objdir/x86_64-suse-linux/libstdc++-v3/libsupc++/../../../../xt-gcc-4.4.4/libstdc++-v3/libsupc++/eh_aux_runtime.cc:46: first defined here

You can resolve this conflict by removing -lstdc++ and -lgcc_eh from allinea-profiler.ld.

-lpthread

When linking, -Wl,@allinea-profiler.ld must go before the -lpthread command line argument, if present.

Manual Linking

When linking your program you must add the path to the profiler libraries (-L/path/to/profiler-libraries), and the libraries themselves (-lmap-sampler-pmpi, -lmap-sampler).

The MPI wrapper library (-lmap-sampler-pmpi) must go:

  1. After your program's object (.o) files.
  2. After your program's own static libraries, for example -lmylibrary.
  3. After the path to the profiler libraries (-L/path/to/profiler-libraries).
  4. Before the MPI's Fortran wrapper library, if any. For example -lmpichf.
  5. Before the MPI's implementation library, usually -lmpi.
  6. Before the Arm sampler library -lmap-sampler.

The sampler library -lmap-sampler must go:

  1. After the Arm MPI wrapper library.
  2. After your program's object (.o) files.
  3. After your program's own static libraries, for example -lmylibrary.
  4. After -Wl,--undefined,allinea_init_sampler_now.
  5. After the path to the profiler libraries -L/path/to/profiler-libraries.
  6. Before -lstdc++, -lgcc_eh, -lrt, -lpthread, -ldl, -lm and -lc.

For example:


    mpicc hello.c -o hello -g -L/users/ddt/allinea \
        -lmap-sampler-pmpi \
        -Wl,--undefined,allinea_init_sampler_now \
        -lmap-sampler -lstdc++ -lgcc_eh -lrt \
        -Wl,--whole-archive -lpthread \
        -Wl,--no-whole-archive \
        -Wl,--eh-frame-hdr \
        -ldl \
        -lm

    mpif90 hello.f90 -o hello -g -L/users/ddt/allinea \
        -lmap-sampler-pmpi \
        -Wl,--undefined,allinea_init_sampler_now \
        -lmap-sampler -lstdc++ -lgcc_eh -lrt \
        -Wl,--whole-archive -lpthread \
        -Wl,--no-whole-archive \
        -Wl,--eh-frame-hdr \
        -ldl \
        -lm
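The ordering rules listed under Manual Linking are easy to get wrong in a long link line. The sketch below checks a few of them: that both sampler libraries are present, that -lmap-sampler-pmpi comes before -lmap-sampler and before -lmpi, and that -lmap-sampler comes before -lpthread. The function name check_link_order is hypothetical, and it covers only this subset of the rules, so treat it as a sanity check rather than a full validator.

```shell
# Hypothetical helper: check part of the documented link ordering.
# Arguments are the words of a link line; prints "ok" on success.
check_link_order() {
    i=0; pmpi=0; sampler=0; mpi=0; pthread=0
    for arg in "$@"; do
        i=$((i + 1))
        case "$arg" in
            -lmap-sampler-pmpi) pmpi=$i ;;
            -lmap-sampler)      sampler=$i ;;
            # Record only the first occurrence of these two libraries.
            -lmpi)     if [ "$mpi" -eq 0 ]; then mpi=$i; fi ;;
            -lpthread) if [ "$pthread" -eq 0 ]; then pthread=$i; fi ;;
        esac
    done
    if [ "$pmpi" -eq 0 ] || [ "$sampler" -eq 0 ]; then
        echo "missing -lmap-sampler-pmpi or -lmap-sampler" >&2
        return 1
    fi
    if [ "$pmpi" -gt "$sampler" ]; then
        echo "-lmap-sampler-pmpi must come before -lmap-sampler" >&2
        return 1
    fi
    if [ "$mpi" -gt 0 ] && [ "$pmpi" -gt "$mpi" ]; then
        echo "-lmap-sampler-pmpi must come before -lmpi" >&2
        return 1
    fi
    if [ "$pthread" -gt 0 ] && [ "$sampler" -gt "$pthread" ]; then
        echo "-lmap-sampler must come before -lpthread" >&2
        return 1
    fi
    echo "ok"
}

# Example:
#   check_link_order hello.o -L/users/ddt/allinea -lmap-sampler-pmpi \
#       -lmap-sampler -lmpi -lpthread
```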

4.1.4 Static linking on Cray X-Series systems

Compiling the MPI Wrapper Library

On Cray X-Series systems use make-profiler-libraries --platform=cray --lib-type=static instead:


    Created the libraries in /home/user/myprogram:
    libmap-sampler.a
    libmap-sampler-pmpi.a

    To instrument a program, add these compiler options:
    compilation for use with MAP - not required for Performance Reports:
    -g (or -G2 for native Cray Fortran) (and -O3 etc.)
    linking (both MAP and Performance Reports):
    -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
    If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
    these must appear *after* the Arm sampler and MPI wrapper libraries in
    the link line. There's a comprehensive description of the link ordering
    requirements in the 'Preparing a Program for Profiling' section of either
    userguide-forge.pdf or userguide-reports.pdf, located in
    /opt/arm/forge/doc/.

Linking with the MPI Wrapper Library


    cc hello.c -o hello -g -Wl,@allinea-profiler.ld

    ftn hello.f90 -o hello -g -Wl,@allinea-profiler.ld

4.1.5 Dynamic and static linking on Cray X-Series systems using the modules environment

If your system has the Arm module files installed, you can load them and build your application as usual. See section 4.1.6 map-link modules installation on Cray X-Series.

  1. module load reports or ensure that make-profiler-libraries is in your PATH.
  2. module load map-link-static or module load map-link-dynamic.
  3. Recompile your program.

4.1.6 map-link modules installation on Cray X-Series

To facilitate dynamic and static linking of user programs with the Arm MPI Wrapper and Sampler libraries, Cray X-Series system administrators can integrate the map-link-dynamic and map-link-static modules into their module system. Templates for these modules are supplied as part of the Arm Performance Reports package.

Copy the files share/modules/cray/map-link-* into a dedicated directory on the system.

For each of the two module files copied:

  1. Find the line starting with conflict and correct the prefix to refer to the location the module files were installed, for example, arm/map-link-static. The correct prefix depends on the subdirectory (if any) under the module search path the map-link-* modules were installed.
  2. Find the line starting with set MAP_LIBRARIES_DIRECTORY "NONE" and replace "NONE" with a user writable directory accessible from the login and compute nodes.

After installation, you can verify whether the prefix has been set correctly with 'module avail'. The prefix shown by this command for the map-link-* modules should match the prefix set in the 'conflict' line of the module sources.
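Step 2 of the list above can be scripted when several module files need the same change. The following is a minimal sketch, assuming the module file contains the line exactly as quoted in step 2, with whitespace between the words. The function name set_map_libraries_dir and the directory used in the example are hypothetical, and the directory must not contain the characters | or &, which are special to sed here.

```shell
# Hypothetical helper: replace "NONE" in the MAP_LIBRARIES_DIRECTORY line
# of a copied map-link-* module file with the given directory.
set_map_libraries_dir() {
    modulefile=$1
    libdir=$2
    tmp=$(mktemp) || return 1
    # Write the edited text to a temporary file, then copy it back so the
    # module file keeps its original permissions.
    sed "s|^set[[:space:]]\{1,\}MAP_LIBRARIES_DIRECTORY[[:space:]]\{1,\}\"NONE\"|set MAP_LIBRARIES_DIRECTORY \"$libdir\"|" \
        "$modulefile" > "$tmp" && cat "$tmp" > "$modulefile"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (both paths are placeholders):
#   set_map_libraries_dir /path/to/modules/arm/map-link-static /lustre/shared/map-libs
```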

4.2 Express Launch mode

Arm Performance Reports can be launched by typing its command name in front of an existing mpiexec command:


   $ perf-report mpiexec -n 256 examples/wave_c 30

This startup method is called Express Launch and is the simplest way to get started. If your MPI is not yet supported in this mode, you will see an error message like this:


    $ 'MPICH 1 standard' programs cannot be started using Express Launch syntax (launching with an mpirun command).

    Try this instead:
    perf-report --np=256 ./wave_c 20

    Type perf-report --help for more information.

This is referred to as Compatibility Mode, in which the mpiexec command is not included and the arguments to mpiexec are passed via a --mpiargs="args here" parameter.

One advantage of Express Launch mode is that it is easy to modify existing queue submission scripts to run your program under one of the Arm Performance Reports products.

Normal redirection syntax may be used to redirect standard input and standard output.

4.2.1 Compatible MPIs

The following lists the MPI implementations supported by Express Launch:

  • BlueGene/Q
  • Bullx MPI
  • Cray X-Series (MPI/SHMEM/CAF)
  • Intel MPI
  • MPICH 2
  • MPICH 3
  • Open MPI (MPI/SHMEM)
  • Oracle MPT
  • Open MPI (Cray XT/XE/XK)
  • Cray XT/XE/XK (UPC)

4.3 Compatibility Launch mode

Compatibility Mode must be used if Arm Performance Reports does not support Express Launch mode for your MPI, or, for some MPIs, if it is not able to access the compute nodes directly (for example, using ssh).

To use Compatibility Mode replace the mpiexec command with the perf-report command. For example:


   mpiexec --np=256 ./wave_c 20

This would become:


   perf-report --np=256 ./wave_c 20

Only a small number of mpiexec arguments are supported by perf-report (for example, -n and -np). Other arguments must be passed using the --mpiargs="args here" parameter.

For example:


   mpiexec --np=256 --nooversubscribe ./wave_c 20

Becomes:

   perf-report --mpiargs="--nooversubscribe" --np=256 ./wave_c 20
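The rewrite shown in these examples can be captured in a small helper that prints the Compatibility Mode command for a given process count, an optional string of extra mpiexec arguments, and the program with its arguments. This is only a sketch: compat_command is a hypothetical name, and it prints the command rather than running it.

```shell
# Hypothetical helper: print the Compatibility Mode equivalent of
#   mpiexec --np=NP [EXTRA_MPI_ARGS] PROGRAM [ARGS...]
# Usage: compat_command NP "EXTRA_MPI_ARGS" PROGRAM [ARGS...]
compat_command() {
    np=$1
    mpiargs=$2
    shift 2
    if [ -n "$mpiargs" ]; then
        printf 'perf-report --mpiargs="%s" --np=%s %s\n' "$mpiargs" "$np" "$*"
    else
        printf 'perf-report --np=%s %s\n' "$np" "$*"
    fi
}

# Example:
#   compat_command 256 "--nooversubscribe" ./wave_c 20
```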

Normal redirection syntax may be used to redirect standard input and standard output.

4.4 Generating a performance report

Make sure the Arm Performance Reports module for your system has been loaded:


    $ perf-report --version
    Arm Performance Reports
    Copyright (c) 2002-2017 Arm Limited (or its affiliates). All rights reserved.
    ...

If this command cannot be found consult the site documentation to find the name of the correct module.

Once the module is loaded, you can simply add the perf-report command in front of your existing mpiexec command-line:


    perf-report mpiexec -n 4 examples/wave_c

If your program is submitted through a batch queuing system, then modify your submission script to load the Arm module and add the 'perf-report' line in front of the mpiexec command you want to generate a report for.

The program runs as usual, although startup and shutdown may take a few minutes longer while Arm Performance Reports generates and links the appropriate wrapper libraries before running and collects the data at the end of the run. The runtime of your code (between MPI_Init and MPI_Finalize) should not be affected by more than a few percent at most.

After the run finishes, a performance report is saved to the current working directory, using a name based on the application executable:


    $ ls -lrt wave_c*
    -rwx------ 1 mark mark 403037 Nov 14 03:21 wave_c
    -rw------- 1 mark mark 1911 Nov 14 03:28 wave_c_4p_2013-11-14_03-27.txt
    -rw------- 1 mark mark 174308 Nov 14 03:28 wave_c_4p_2013-11-14_03-27.html

Note that both .txt and .html versions are automatically generated.

You can include a short description of the run or other notes on configuration and compilation settings by setting the environment variable ALLINEA_NOTES before running perf-report:


   $ ALLINEA_NOTES="Run with inp421.dat and mc=1" perf-report mpiexec -n 512 ./parEval.bin --use-mc=1 inp421.dat

The string in the ALLINEA_NOTES environment variable is included in all report files produced.

4.5 Specifying output locations

By default, performance reports are placed in the current working directory using an auto-generated name based on the application executable name, for example:


    wave_f_16p_2013-11-18_23-30.html
    wave_f_2p_8t_2013-11-18_23-30.html

This name is formed from the executable name, the size of the job, the date, and the time. If you are using OpenMP, the value of OMP_NUM_THREADS is also included in the name after the size of the job. If necessary, the name is made unique by adding a _1, _2, … suffix.
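The naming pattern can be reproduced in a script, for example to predict which file to look for after a run. The sketch below is inferred from the example names in this section and is not the tool's own implementation. The function name report_basename is hypothetical, and it does not handle the uniqueness suffix.

```shell
# Hypothetical helper: build a report base name from the executable name,
# process count, optional OpenMP thread count, and a date_time stamp.
report_basename() {
    exe=$1; procs=$2; threads=$3; stamp=$4
    name="${exe}_${procs}p"
    if [ -n "$threads" ]; then
        name="${name}_${threads}t"
    fi
    printf '%s_%s\n' "$name" "$stamp"
}

# Examples:
#   report_basename wave_f 16 "" 2013-11-18_23-30
#   report_basename wave_f 2 8 "$(date +%Y-%m-%d_%H-%M)"
```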

You can specify a different location for output files using the --output argument:

  • --output=my-report.txt will create a plain text report in the file my-report.txt in the current directory.
  • --output=/home/mark/public/my-report.html will create an HTML report in the file /home/mark/public/my-report.html.
  • --output=my-report will create a plain text report in the file my-report.txt and an HTML report in the file my-report.html, both in the current directory.
  • --output=/tmp will create reports with names based on the application executable name in /tmp/, for example, /tmp/wave_f_16p_2013-11-18_23-30.txt and /tmp/wave_f_16p_2013-11-18_23-30.html.
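The four cases above can be summarized as a small decision procedure. The sketch below mirrors the behavior described in the list and is not taken from the tool itself. The function name resolve_output is hypothetical, and its second argument stands in for the auto-generated default name.

```shell
# Hypothetical helper: print the report files implied by an --output value.
# Usage: resolve_output OUTPUT_VALUE DEFAULT_BASENAME
resolve_output() {
    out=$1
    default=$2
    if [ -d "$out" ]; then
        # Existing directory: both formats, named after the executable.
        printf '%s\n' "${out%/}/$default.txt" "${out%/}/$default.html"
        return
    fi
    case "$out" in
        # Explicit extension: only that format.
        *.txt|*.html) printf '%s\n' "$out" ;;
        # No extension: both formats with the given base name.
        *)            printf '%s\n' "$out.txt" "$out.html" ;;
    esac
}

# Example:
#   resolve_output /tmp wave_f_16p_2013-11-18_23-30
```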

4.6 Support for DCIM systems

Performance Reports includes support for Data Center Infrastructure Management (DCIM) systems.

You can output all the metrics generated by Performance Reports to a script using the --dcim-output argument. By default, the pr-dcim script is called and the collected metrics are sent to Ganglia (a system monitoring tool).

The pr-dcim script looks for a gmetric implementation as part of the Ganglia software, and calls it once for each metric.

4.6.1 Customizing your DCIM script

The default pr-dcim script is located in installation-directory/performance-reports/ganglia-connector/pr-dcim.

However, you can use your own custom script by specifying the ALLINEA_DCIM_SCRIPT environment variable.

This option is recommended if you are using a System Monitoring tool other than Ganglia.

Such a script must accept the following arguments. Each argument can be specified once per metric:

  • -V{METRIC}={VALUE} (mandatory) specifies that the metric METRIC has the value VALUE.
  • -U{METRIC}={UNITS} (optional) specifies that the metric METRIC is expressed in UNITS.
  • -T{METRIC}={TITLE} (optional) specifies that the metric METRIC has title TITLE.
  • -t{METRIC}={TYPE} (optional) specifies that the metric METRIC has TYPE data type.
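A custom script therefore needs to split each argument into its option letter, metric name, and value. The following is a minimal sketch of that parsing step, assuming each argument arrives as a separate command-line word in the forms listed above. The function name handle_dcim_args and the metric names in the example are hypothetical, and a real script would forward the parsed values to your monitoring tool instead of printing them.

```shell
# Hypothetical parser for the argument forms listed above.
# Prints one "KIND METRIC VALUE" line per recognized argument.
handle_dcim_args() {
    for arg in "$@"; do
        case "$arg" in
            -V*=*) kind=value ;;
            -U*=*) kind=units ;;
            -T*=*) kind=title ;;
            -t*=*) kind=type ;;
            *)     continue ;;     # ignore anything unrecognized
        esac
        body=${arg#-?}             # strip the dash and option letter
        metric=${body%%=*}         # text before the first '='
        value=${body#*=}           # text after the first '='
        printf '%s %s %s\n' "$kind" "$metric" "$value"
    done
}

# Example with made-up metric names:
#   handle_dcim_args -Vcpu_time=42 -Ucpu_time=Seconds -tcpu_time=double
```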

4.6.2 Customizing the gmetric location

You can specify the path to your gmetric implementation by using the ALLINEA_GMETRIC environment variable.

Your gmetric version must accept the following command line arguments:

  • -n {NAME} (mandatory) specifies the name of the metric (starts with com.allinea).
  • -t {TYPE} (mandatory) specifies the type of the metric (for example, double or int32).
  • -v {VALUE} (mandatory) specifies the value of the metric.
  • -g {GROUP} (optional) specifies which groups the metric belongs to (for example allinea).
  • -u {UNIT} (optional) specifies the unit of the metric. For example, %, Watts, Seconds, and so on.
  • -T {TITLE} (optional) specifies the title of the metric.
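If you write your own gmetric replacement, it has to accept exactly this option set. The sketch below shows one way to parse it with the POSIX getopts builtin and reject calls that omit a mandatory option. The function name log_gmetric is hypothetical, and it only prints what it received. To use something like it via ALLINEA_GMETRIC, the same logic would need to live in an executable script.

```shell
# Hypothetical stand-in that accepts the gmetric options listed above
# and prints them instead of sending them anywhere.
log_gmetric() {
    OPTIND=1
    name=""; mtype=""; value=""; group=""; unit=""; title=""
    while getopts "n:t:v:g:u:T:" opt; do
        case "$opt" in
            n) name=$OPTARG ;;
            t) mtype=$OPTARG ;;
            v) value=$OPTARG ;;
            g) group=$OPTARG ;;
            u) unit=$OPTARG ;;
            T) title=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # -n, -t and -v are mandatory.
    if [ -z "$name" ] || [ -z "$mtype" ] || [ -z "$value" ]; then
        echo "usage: -n NAME -t TYPE -v VALUE [-g GROUP] [-u UNIT] [-T TITLE]" >&2
        return 1
    fi
    printf '%s type=%s value=%s group=%s unit=%s title=%s\n' \
        "$name" "$mtype" "$value" "$group" "$unit" "$title"
}

# Example with a made-up metric name:
#   log_gmetric -n com.allinea.example -t double -v 1.5 -u Seconds
```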

4.7 Enable and disable metrics

--enable-metrics=METRICS
--disable-metrics=METRICS

These options allow you to specify comma-separated lists which explicitly enable or disable metrics for which data is to be collected. If the metrics specified cannot be found, an error message is displayed and Performance Reports exits. Metrics which are always enabled or disabled cannot be explicitly disabled or enabled. A metrics source library which has all its metrics disabled, either in the XML definition or via --disable-metrics, will not be loaded. Metrics which can be explicitly enabled or disabled can be listed using the --list-metrics option.
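Because both options take a comma-separated list, a script that collects metric names one at a time has to join them before passing them on. The following sketch shows one way to do this. The function name join_metrics and the metric names are placeholders; use names reported by --list-metrics on your system.

```shell
# Hypothetical helper: join its arguments into a comma-separated list.
join_metrics() {
    old_ifs=$IFS
    IFS=,
    joined="$*"        # "$*" joins arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

# Example (placeholder metric names):
#   perf-report --disable-metrics="$(join_metrics metric_a metric_b)" \
#       mpiexec -n 4 examples/wave_c
```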
