
E MPI distribution notes and known issues

This appendix has brief notes on many of the MPI distributions supported by Arm DDT and Arm MAP.

Advice on settings and problems particular to each distribution is given here. Note that MAP supports fewer MPI distributions than DDT. See C Supported platforms for more details.

E.1 Berkeley UPC

Only the MPI transport is supported. Programs must be compiled with the -tv flag, for example:


  upcc hello.c -o hello -g -tv

E.2 Bull MPI

Bull MPI 1, Bull MPI 2 and Bull X-MPI are supported. For Bull X-MPI, select either the Open MPI or Open MPI (Compatibility) MPI implementation: choose Open MPI if ssh is allowed on your system, or Open MPI (Compatibility) if it is not.

From the MPI implementations list, select Bull MPI or Bull MPI 1 for Bull MPI 1, or Bull MPI 2 for Bull MPI 2. In the mpirun arguments box of the Run window you may also wish to specify the partition to use by adding the following:


   -p partition_name

You should ensure that prun, the command used to launch jobs, is in your PATH before starting DDT.

E.3 Cray MPT

This section only applies when using aprun. For srun ('Native' SLURM mode) see section E.16 SLURM.

DDT and MAP have been tested with Cray XT 5/6, XE6, XK6/7, and XC30 systems. DDT is able to launch and support debugging jobs in excess of 700,000 cores.

A number of template files for launching applications from within the queue, using Arm's job submission interface, are included in the distribution. These may require some minor editing to cope with local differences on your batch system.

To attach to a running job on a Cray system, the MOM nodes (that is, the nodes where aprun is launched) must be reachable via ssh from the node where DDT is running, for example a login node. DDT must connect to these nodes in order to launch debugging daemons on the compute nodes. You can either specify the aprun host manually in the attach dialog when scanning for jobs, or configure a hosts list containing all MOM nodes.

Preloading of the memory debugging libraries is not supported with aprun.

If the program is dynamically linked, MAP can preload the sampling libraries with aprun (this requires aprun/ALPS 4.1 or later). Otherwise, Arm's sampling libraries must be linked with the application before running on this platform. Preloading is not supported in MPMD mode. See 16.2.2 Linking for a step-by-step guide.

E.3.1 Using DDT with Cray ATP (the Abnormal Termination Process)

DDT is compatible with the Cray ATP system, which is enabled by default on some XE systems. This runtime addition to applications automatically gathers the stacks of crashing processes, and can be used to let DDT attach to a job before it is cleaned up after a crash.

To debug after a crash when an application is run with ATP but without a debugger, set the ATP_HOLD_TIME environment variable before launching the job. Even for a large petascale system, a value of 5 is sufficient, giving 5 minutes for the attach to complete.
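For example, in a bash-like shell you might set the variable before launching with aprun (the 5-minute value follows the guidance above; the program name is taken from the example output below):

   export ATP_HOLD_TIME=5
   aprun -n 1200 ./atploop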

The following example shows the typical output of an ATP session:


  n10888@kaibab:~> aprun -n 1200 ./atploop
  Application 1110443 is crashing. ATP analysis proceeding...
  Stack walkback for Rank 23 starting:
  _start@start.S:113
  __libc_start_main@libc-start.c:220
  main@atploop.c:48
  __kill@0x4b5be7
  Stack walkback for Rank 23 done
  Process died with signal 11: 'Segmentation fault'
  View application merged backtrace tree file 'atpMergedBT.dot'
  with 'statview'
  You may need to 'module load stat'.

atpFrontend: Waiting 5 minutes for debugger to attach...

To debug the application at this point, launch DDT.

DDT can attach using the Attaching dialogs described in Section 5.9 Attaching to running programs, or, given the PID of the aprun process, the debugging set can be specified from the command line.

For example, to attach to the entire job:


   ddt --attach-mpi=12772

If only a particular subset of processes is required, the subset notation can also be used to select particular ranks:


   ddt --attach-mpi=12772 --subset=23,100-112,782,1199

E.4 HP MPI

Select HP MPI as the MPI implementation.

A number of HP MPI users have reported a preference for using mpirun -f jobconfigfile instead of mpirun -np 10 a.out on their particular systems. It is possible to configure DDT to support this by using its support for batch (queuing) systems.

The role of the queue template file is analogous to that of the -f jobconfigfile.

If your job config file normally contains:


   -h node01 -np 2 a.out
   -h node02 -np 2 a.out

Then your template file should contain:


   -h node01 -np PROCS_PER_NODE_TAG /usr/local/ddt/bin/ddt-debugger
   -h node02 -np PROCS_PER_NODE_TAG /usr/local/ddt/bin/ddt-debugger

Also the Submit Command box should be filled with the following:


   mpirun -f

Select the Template uses NUM_NODES_TAG and PROCS_PER_NODE_TAG radio button. After this has been configured by clicking OK, you will be able to start jobs. Note that the Run button is replaced with Submit, and that the number of processes box is replaced by Number of Nodes.

E.5 IBM PE

Ensure that poe is in your path, and select IBM PE as the MPI implementation.

A sample LoadLeveler script, which starts debugging jobs on POE systems, is included in the {installation-directory}/templates directory.

To attach to already running POE jobs, SSH access to the compute nodes is required. Without SSH, DDT has no way to connect to the ranks running on the nodes.

Known issue: IBM PE 2.1 and newer currently do not provide the debugging interface required for MPI message queue debugging.

E.6 Intel MPI

Select Intel MPI from the MPI implementation list. DDT and MAP have been tested with Intel MPI 4.1.x, 5.0.x and later.

Pay attention to the changes to the mpivars.sh script introduced with Intel MPI 5.0. You can pass it an argument to choose between the debug and release versions of the MPI libraries. The default, if you omit the argument, is the release version, but message queue debugging does not work with this version; the debug version must be used explicitly.
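For example, a minimal sketch of sourcing the script with the debug libraries selected (the installation path is a placeholder in the same style as the paths below, and the exact argument name and script location are described in Intel's documentation, so check them for your installation):

   source {path to intel install directory}/intel64/bin/mpivars.sh debug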

DDT also supports the Intel Message Checker tool that is included in the Intel Trace Analyser and Collector software. A plugin for the Intel Trace Analyser and Collector version 7.1 is provided in DDT's plugins directory. Once you have installed the Intel Trace Analyser and Collector, you should make sure that the following directories are in your LD_LIBRARY_PATH:


   {path to intel install directory}/itac/7.1/lib
   {path to intel install directory}/itac/7.1/slib
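For example, in a bash-like shell (replace the placeholder with your actual installation path):

   export LD_LIBRARY_PATH={path to intel install directory}/itac/7.1/lib:{path to intel install directory}/itac/7.1/slib:$LD_LIBRARY_PATH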

The Intel Message Checker only works if you are using the Intel MPI. Make sure Intel's mpiexec is in your path, and that your application was compiled against Intel's MPI, then launch DDT, check the plugin checkbox and debug your application as usual. If one of the above steps has been missed out, DDT may report an error and say that the plugin could not be loaded.

Once you are debugging with the plugin loaded, DDT will automatically pause the application whenever Intel Message Checker detects an error. The Intel Message Checker log can be seen in the standard error (stderr) window.

Note that the Intel Message Checker aborts the job after one error by default. You can modify this by adding -genv VT_CHECK_MAX_ERRORS 0 to the mpirun arguments box in the Run window. See Intel's documentation for more details on this and other environment variable modifiers.

Attach dialog: DDT cannot automatically discover existing running MPI jobs that use Intel MPI if the processes are started using the mpiexec command (which uses the MPD process starting daemon). To attach to an existing job you will need to list all potential compute nodes individually in the dialog.

Please note that the mpiexec method of starting MPI processes is deprecated by Intel; you are encouraged to use mpirun or mpiexec.hydra (which use the newer, scalable Hydra process starting daemon). All processes started by either mpirun or mpiexec.hydra are discovered automatically by Arm DDT.

If you use Spectrum LSF as the workload manager in combination with Intel MPI and you see, for example, one of the following errors:

  • <target program> exited before it finished starting up. One or more processes were killed or died without warning
  • <target program> encountered an error before it initialised the MPI environment. Thread 0 terminated with signal SIGKILL

or if the job is otherwise killed during launching or attaching, then you may need to set/export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1 before executing the job. See the Using IntelMPI under LSF quick guide and Resolve the problem of the Intel MPI job …hang in the cluster articles for more details.
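For example, in a bash-like shell before submitting the job:

   export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1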

E.7 MPC

DDT supports MPC version 2.5.0 and upwards. MPC is not supported by MAP.

In order to debug an MPC program, a script needs to be added to the MPC installation. This script can be obtained from the Download MPC script link and should be saved into the bin/mpcrun_opt subdirectory of your MPC framework installation.

E.7.1 MPC in the Run window

When the MPC framework is selected as the MPI implementation, there is an additional field in the MPI configuration within the Run window:

Number of MPC Tasks: The number of tasks that you wish to debug. MPC uses threads to split these tasks over the number of processes specified.

Also, the mpirun arguments field is replaced with the field:

mpcrun arguments: (optional): The arguments that are passed to mpcrun. This should be used for arguments to mpcrun not covered by the number of MPC tasks and number of processes fields.

An example usage is to override the default threading model specified in the MPC configuration by entering --multithreading=pthreads for POSIX threads or --multithreading=ethreads for user-level threads.

The documentation for these arguments can be found at http://mpc.paratools.com/UsersGuide/Running. This field is only displayed if the selected MPI implementation is the MPC framework.

Note

The OpenMP options are not available in the Run window, as MPC uses the number of tasks to determine the number of OpenMP threads rather than OMP_NUM_THREADS.

E.7.2 MPC on the command line

There are two additional command-line arguments to DDT when using MPC that can be used as an alternative to configuration in the GUI.

--mpc-task-nb The total number of MPC tasks to be created.

--mpc-process-nb The total number of processes to be started by mpcrun.
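For example, a hypothetical command line debugging ./my_program with 8 MPC tasks split over 2 processes (the program name and counts are illustrative only):

   ddt --mpc-task-nb=8 --mpc-process-nb=2 ./my_program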

E.8 MPICH 1 p4

Choose MPICH 1 Standard as the MPI implementation.

E.9 MPICH 1 p4 mpd

This daemon-based distribution passes only a limited set of arguments and environment variables to the job programs. If the daemons do not start with the correct environment, the environment passed to the ddt-debugger backend daemons may be insufficient for them to start.

It should be possible to avoid these problems if .bashrc or .tcshrc/.cshrc are correct.

However, if unable to resolve these problems, you can pass HOME and LD_LIBRARY_PATH, plus any other environment variables that you need.

This is achieved by adding -MPDENV- HOME={homedir} LD_LIBRARY_PATH={ld-library-path} to the Arguments area of the Run window.

Alternatively, from the command line you may simply write:


   ddt {program-name} -MPDENV- HOME=$HOME LD_LIBRARY_PATH=$LD_LIBRARY_PATH

Your shell will then fill in these values for you.

Choose MPICH 1 Standard as the MPI implementation.

E.10 MPICH 2

If you see the error undefined reference to MPI_Status_c2f while building the MAP libraries then you need to rebuild MPICH 2 with Fortran support. See 16.2.2 Linking for more information on linking.

E.11 MPICH 3

MPICH 3.0.3 and 3.0.4 do not work with Arm Forge due to a defect in MPICH. MPICH 3.1 addresses this and is supported.

There are two MPICH 3 modes: Standard and Compatibility. If the standard mode does not work on your system select MPICH 3 (Compatibility) as the MPI Implementation on the System Settings page of the Options window.

E.12 MVAPICH 2

Known issue: If memory debugging is enabled in DDT, it interferes with the on-demand connection system used by MVAPICH2 above a threshold process count, and applications will fail to start. The default value of this threshold is 64. To work around this issue, set the environment variable MV2_ON_DEMAND_THRESHOLD to the maximum job size you expect on your system; DDT will then work with memory debugging enabled for all jobs. This setting should not be made a system-wide default, as it may increase startup times and memory consumption for jobs.
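For example, if the largest job you expect on your system is 4096 processes (an illustrative value), you might set:

   export MV2_ON_DEMAND_THRESHOLD=4096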

MVAPICH 2 now offers mpirun_rsh instead of mpirun as a scalable launcher binary. To use this with DDT, go to File → Options (Arm Forge → Preferences on Mac OS X), open the System page, check Override default mpirun path and enter mpirun_rsh. You should also add -hostfile <hosts>, where <hosts> is the name of your hosts file, to the mpirun_rsh arguments field in the Run window.

To enable message queue support, MVAPICH 2 must be compiled with the flags --enable-debug --enable-sharedlib. These are not set by default.
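A minimal configure sketch for a source build of MVAPICH 2 (the install prefix is a placeholder, only the two flags above are taken from this guide, and your build may need further site-specific options):

   ./configure --prefix=/opt/mvapich2 --enable-debug --enable-sharedlib
   make && make install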

E.13 Open MPI

Arm Forge has been tested with Open MPI 1.6.x, 1.8.x, 1.10.x and 2.0.x. Select Open MPI from the list of MPI implementations.

Open MPI 2.1.3 works with Arm Forge. Previous versions of Open MPI 2.1.x do not work due to a bug in the Open MPI debug interface.

There are three different Open MPI choices in the list of MPI implementations to choose from in Arm Forge when debugging or profiling for Open MPI.

  • Open MPI - the job is launched with a custom 'launch agent' that, in turn, launches the Arm daemons.
  • Open MPI (Compatibility) - mpirun launches the Arm daemons directly. This startup method does not take advantage of Arm's scalable tree.
  • Open MPI for Cray XT/XE/XK/XC - for Open MPI running on Cray XT/XE/XK/XC systems. This method is fully able to use Arm's scalable tree infrastructure.

    To launch with aprun (instead of mpirun) simply type the following on the command line:

    
      ddt --mpi="OpenMPI (Cray XT/XE/XK)" --mpiexec aprun [arguments]
      # or
      map --mpi="OpenMPI (Cray XT/XE/XK)" --mpiexec aprun [arguments]

The following section lists some known issues:

  • Early versions of Open MPI 1.8 do not properly support message queue debugging. This is fixed in Open MPI 1.8.6.
  • If you are using the 1.6.x series of Open MPI configured with the --enable-orterun-prefix-by-default flag then DDT requires patch release 1.6.3 or later due to a defect in earlier versions of the 1.6.x series.
  • The version of Open MPI packaged with Ubuntu has the Open MPI debug libraries stripped. This prevents the Message Queues feature of DDT from working.
  • With Open MPI 1.3.4 and Intel Compiler v11, the default build will optimize away a vital call during the startup protocol which means the default Open MPI start up will not work. If this is your combination, either update your Open MPI, or select Open MPI (Compatibility) instead as the DDT MPI Implementation.
  • On Infiniband systems, Open MPI and CUDA can conflict in a manner that results in failure to start processes, or a failure for processes to be debuggable. To enable CUDA interoperability with Infiniband, set the CUDA environment variable CUDA_NIC_INTEROP to 1.
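For example, in a bash-like shell before launching the job:

   export CUDA_NIC_INTEROP=1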

E.14 Platform MPI

Platform MPI 9.x is supported, but only with the mpirun command. Currently mpiexec is not supported.

E.15 SGI MPT / SGI Altix

For SGI use one of the following configurations:

  • If using SGI MPT 2.10+, select SGI MPT (2.10+, batch) as the MPI implementation.
  • If using SGI MPT 2.08+, select SGI MPT (2.08+, batch) as the MPI implementation.
  • If using an older version of SGI MPT (2.07 or before) select SGI MPT as the MPI implementation.

If you are using SGI MPT with PBS or SLURM and would normally use mpiexec_mpt to launch your program, you will need to use the pbs-sgi-mpt.qtf queue template file and select SGI MPT (Batch) as the MPI implementation.

If you are using SGI MPT with SLURM and would normally use mpiexec_mpt to launch your program, you will need to use srun --mpi=pmi2 directly.
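For example, a debugging session might be started with express launch syntax similar to the following (this assumes express launch with srun is configured on your system; the process count and program name are illustrative):

   ddt srun --mpi=pmi2 -n 8 ./my_program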

mpiexec_mpt from versions of SGI MPT prior to 2.10 may prevent MAP from starting when preloading the Arm profiler and MPI wrapper libraries. Arm recommends you explicitly link your programs against these libraries to work around this problem.

Preloading the Arm profiler and MPI wrapper libraries is not supported in express launch mode. Arm recommends you explicitly link your programs against these libraries to work around this problem.

Some SGI systems cannot compile programs on the batch nodes (for example, because the gcc package is not installed). If this applies to your system you must explicitly compile the Arm MPI wrapper library using the make-profiler-libraries command and then explicitly link your programs against the Arm profiler and MPI wrapper libraries.
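A minimal sketch of building the wrapper library on a node that can compile (the output directory is a placeholder; the command prints the exact linking instructions to follow):

   cd /path/to/profiler-libs
   make-profiler-libraries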

The mpio.h header file shipped with SGI MPT 2.09 and SGI MPT 2.10 contains a mismatch between the declaration of MPI_File_set_view and some other similar functions and their PMPI equivalents, for example PMPI_File_set_view. This prevents MAP from generating the MPI wrapper library. Please contact SGI for a fix.

SGI MPT 2.09 requires the MPI_SUPPORT_DDT environment variable to be set to 1 to avoid startup issues when debugging with DDT, or profiling with MAP.
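For example, in a bash-like shell before launching:

   export MPI_SUPPORT_DDT=1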

E.16 SLURM

To start MPI programs using the srun command instead of your MPI's usual mpirun command (or equivalent), select SLURM (MPMD) as the MPI Implementation on the System Settings page of the Options window.

While this option will work with most MPIs, it will not work with all. On Cray, 'Hybrid' SLURM mode (that is, SLURM + ALPS) is not supported. Instead, you must start your program with Cray's aprun. See Section E.3 Cray MPT.

SLURM may be used as a job scheduler with DDT and MAP through the use of a queue template file. See templates/slurm.qtf in the Arm Forge installation for an example and section A.2 Integration with queuing systems for more information on how to customize the template.

E.17 Spectrum MPI

Spectrum MPI 10.2 is supported for IBM Power (PPC64le little-endian) with the mpirun and mpiexec commands. Spectrum MPI 10.2 is additionally supported with the jsrun (PMIx mode) command.

When using jsrun 1.1.0 [Apr 16, 2018] or earlier, Arm Forge cannot correctly detect the MPI rank for each of your processes. A workaround for DDT can be found in section 8.17 Assigning MPI ranks.
