When compiling the program that you wish to debug, you must add the debug flag to your compile command. For most compilers this is -g.
It is also advisable to turn off compiler optimizations as these can make debugging appear strange and unpredictable. If your program is already compiled without debug information, you will need to recompile the files that you are interested in with the debug flag.
The Welcome Page allows you to choose what kind of debugging you want to do. For example, you can:
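As a minimal sketch, a compile step with debug information might look like the following. The source file name hello.c and the compiler command cc are assumptions for illustration. -g is the debug flag named above, and -O0 is the usual way to disable optimizations with GCC-compatible compilers; check your compiler documentation if it differs.

```shell
# Hypothetical example: build hello.c with debug information and without
# optimizations. The file name and the compiler command are placeholders.
CC="${CC:-cc}"
CFLAGS="-g -O0"   # -g: debug information, -O0: no optimizations

# A tiny stand-in program so the example is self-contained.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF

# Compile only if a C compiler is available on this machine.
if command -v "$CC" >/dev/null 2>&1; then
    "$CC" $CFLAGS -o hello hello.c
fi
```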
- Run a program from DDT and debug it.
- Debug a program you launch manually (for example, on the command line).
- Attach to an already running program.
- Open core files generated by a program that crashed.
- Connect to a remote system and accept a Reverse Connect request.
If you click the Run button on the Welcome Page you see the window above. The settings are grouped into sections. Click the Details… button to expand a section. The settings in each section are described below.
Application: The full path name to your application. If you specified one on the command line, this is filled in. You may browse for an application by clicking on the Browse button.
Many MPIs have problems working with directory and program names containing spaces. You are advised to avoid the use of spaces in directory and file names.
Arguments: (optional) The arguments passed to your application. These are automatically filled if you entered some on the command line.
Avoid using quote characters such as ' and ", as these may be interpreted differently by DDT and your command shell. If you must use these and cannot get them to work as expected, please contact Arm support at email@example.com.
stdin file: (optional) This allows you to choose a file to be used as the standard input (stdin) for your program. DDT automatically adds arguments to mpirun to ensure your input file is used.
If you only have a single process license or have selected none as your MPI Implementation the MPI options will be missing. The MPI options are not available when DDT is in single process mode. See section 5.4 Debugging single-process programs for more details about using DDT with a single process.
Number of processes: The number of processes that you wish to debug. DDT supports hundreds of thousands of processes but this is limited by your license.
Number of nodes: This is the number of compute nodes that you wish to use to run your program.
Processes per node: This is the number of MPI processes to run on each compute node.
Implementation: The MPI implementation to use. If you are submitting a job to a queue the queue settings will also be summarized here. You may change the MPI implementation by clicking on the Change… button.
The choice of MPI implementation is critical to correctly starting DDT. Your system will normally use one particular MPI implementation. If you are unsure as to which to pick, try generic, consult your system administrator or Arm support. A list of settings for common implementations is provided in Appendix E MPI distribution notes and known issues.
If your desired MPI command is not in your PATH, or you wish to use an MPI run command that is not your default one, you can configure this using the Options window (See section A.5.1 System).
mpirun arguments: (optional) The arguments that are passed to mpirun or your equivalent, usually prior to your executable name in normal mpirun usage. You can place machine file arguments, if necessary, here. For most users this box can be left empty. You can also specify mpirun arguments on the command line (using the --mpiargs command line argument) or using the ALLINEA_MPIRUN_ARGUMENTS environment variable if this is more convenient.
You should not enter the -np argument as DDT will do this for you.
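The two non-GUI ways of passing mpirun arguments mentioned above might be used as in the following sketch. The machine file option and its path are hypothetical and depend on your MPI, and ./wave_c with the argument 20 is only a placeholder program. The commands are printed rather than executed so the sketch can be tried on a machine without DDT.

```shell
# Hypothetical machine file argument; the exact option depends on your MPI.
MPI_ARGS="-machinefile /home/user/hosts"

# Option 1: the environment variable named in this section.
export ALLINEA_MPIRUN_ARGUMENTS="$MPI_ARGS"

# Option 2: the --mpiargs command line argument. -np is deliberately not
# part of the mpirun arguments; the process count is given with --np.
DDT_CMD="ddt --np=4 --mpiargs=\"$MPI_ARGS\" ./wave_c 20"
echo "$DDT_CMD"
```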
If your license supports it, you may also debug GPU programs by enabling CUDA support. For more information on debugging CUDA programs, please see section 15 CUDA GPU debugging.
Track GPU Allocations: Tracks CUDA memory allocations made using cudaMalloc, and similar methods. See 12.2 CUDA memory debugging for more information.
Detect invalid accesses (memcheck): Turns on the CUDA-MEMCHECK error detection tool. See 12.2 CUDA memory debugging for more information.
The DDT configuration depends on the UPC compiler used.
DDT can debug applications compiled with GCC UPC 4.8 with TLS disabled. See section F.5 GNU.
To run a UPC program in DDT you have to select the MPI implementation "GCC libupc SMP (no TLS)".
To run a Berkeley UPC program in DDT you have to compile the program using the -tv flag and then select the same MPI implementation used in the Berkeley compiler build configuration.
The Berkeley compiler must be built using the MPI transport.
See section F.3 Berkeley UPC compiler.
Python debugging in DDT has the following limited support:
- Debugging Python scripts running under the CPython interpreter (version 2.7 only).
- Decoding the stack to show Python frames, function names and line numbers.
- Displaying Python local and global variables when a Python frame is selected.
- Stopping on breakpoints and exceptions in native libraries that were invoked from Python code.
- Debugging MPI programs written in Python using mpi4py.
The main use case that this feature is intended for is debugging a mixed C, C++, Fortran and Python program that crashes somewhere in native code. If this native code was invoked from a Python function, then you can examine the Python stack and local variables that led to the crash. No other Python debugging features are supported, for example breakpoints, stepping, evaluating Python variables and the current line window.
You must have the debug symbols for Python available to DDT on your system. One way to do this is to install the Python debug symbols package. You may need to enable additional debug repositories in your package manager.
Python debugging depends on GDB 7.12.1, so if GDB 7.6.2 is the selected debugger this will need to be changed. Go to File → Options → System and set the Debugger field to Automatic (recommended).
To debug Python scripts, start the Python interpreter that will execute the script under DDT. To get line level resolution, rather than function level resolution, you must also insert %allinea_python_debug% before your script when passing arguments to Python. To run the demo in the examples folder, first change into the examples folder, and then run the following steps. The demo requires that mpi4py is installed.
- Press Play/Continue
- Open the 'Stacks' view and select a Python frame to see the Python local variables
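As a rough sketch of how the interpreter might be started under DDT, the following builds two candidate command lines. The script name my_script.py is hypothetical, and the use of mpirun -n 2 for the mpi4py case is an assumption about your MPI launcher. Only the placement of %allinea_python_debug% before the script comes from the description above. The commands are printed, not executed.

```shell
# Hypothetical script name; replace with your own script.
SCRIPT="my_script.py"

# Single process: start the Python interpreter under DDT, inserting
# %allinea_python_debug% before the script for line level resolution.
PY_CMD="ddt python %allinea_python_debug% $SCRIPT"

# mpi4py case (assumption: mpirun is your launcher and is supported by
# Express Launch): prefix the usual launch line with ddt.
PY_MPI_CMD="ddt mpirun -n 2 python %allinea_python_debug% $SCRIPT"

echo "$PY_CMD"
echo "$PY_MPI_CMD"
```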
The optional Environment Variables section should contain additional environment variables that should be passed to mpirun or its equivalent. These environment variables may also be passed to your program, depending on which MPI implementation your system uses. Most users will not need to use this box.
On some systems it may be necessary to set environment variables for the DDT backend itself. For example, if /tmp is unusable on the compute nodes you may wish to set TMPDIR to a different directory. You can specify such environment variables in /path/to/ddt/lib/environment. Enter one variable per line and separate the variable name and value with =, for example, TMPDIR=/work/user.
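As an illustration of the file format, the following sketch appends a TMPDIR setting, one NAME=value pair per line. It writes to a scratch file rather than the real /path/to/ddt/lib/environment so that it can be tried safely; the scratch file name is an assumption.

```shell
# Stand-in for /path/to/ddt/lib/environment (the real path depends on
# where DDT is installed on your system).
ENV_FILE="./ddt-environment.example"

# One variable per line, name and value separated by "=".
printf '%s\n' 'TMPDIR=/work/user' >> "$ENV_FILE"

cat "$ENV_FILE"
```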
The optional Plugins section allows you to enable plugins for various third-party libraries, such as the Intel Message Checker or Marmot. See section 14 Using and writing plugins for more information.
Click Run to start your program, or Submit if working through a queue. See section A.2 Integration with queuing systems. This runs your program through the debug interface you selected and allows your MPI implementation to determine which nodes to start which processes on.
If you have a program compiled with Intel ifort or GNU g77 you may not see your code and the highlighted line when DDT starts. This is because those compilers create a pseudo MAIN function, above the top level of your code. To fix this you can either open your Source Code window and add a breakpoint in your code, then run to that breakpoint, or you can use the Step into function to step into your code.
When your program starts, DDT attempts to determine the MPI world rank of each process. If this fails, the following error message is displayed:
This means that the number DDT shows for each process may not be the MPI rank of the process. To correct this you can tell DDT to use a variable from your program as the rank for each process.
See section 8.17 Assigning MPI ranks for details.
To end your current debugging session select the End Session menu option from the File menu. This closes all processes and stops any running code. If any processes remain you may have to clean them up manually using the kill command, or a command provided with your MPI implementation.
Each of the Arm Forge products can be launched by typing its name in front of an existing mpiexec command:
This startup method is called Express Launch and is the simplest way to get started. If your MPI is not yet supported in this mode, you will see an error message like this:
'MPICH 1 standard' programs cannot be started using Express Launch syntax (launching with an mpirun command).
Try this instead:
ddt --np=256 ./wave_c 20
Type ddt --help for more information.
This is referred to as Compatibility Mode, in which the mpiexec command is not included and the arguments to mpiexec are passed via a --mpiargs="args here" parameter.
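The difference between the two startup modes can be sketched as follows. The program ./wave_c, its argument 20 and the process count are taken from the message above; the mpiexec -n launch line is an assumption about what the existing command looks like. The commands are printed rather than executed.

```shell
# Existing launch line (assumed form).
LAUNCH="mpiexec -n 256 ./wave_c 20"

# Express Launch: the product name goes in front of the existing command.
EXPRESS="ddt $LAUNCH"

# Compatibility Mode: mpiexec is not included. The process count is given
# with --np; any other mpiexec arguments would go in --mpiargs="...".
COMPAT="ddt --np=256 ./wave_c 20"

echo "$EXPRESS"
echo "$COMPAT"
```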
One advantage of Express Launch mode is that it is easy to modify existing queue submission scripts to run your program under one of the Arm Forge products. This works best for Arm DDT with Reverse Connect, ddt --connect, for interactive debugging or in offline mode (ddt --offline). See 3.3 Reverse Connect for more details.
If you cannot use Reverse Connect and wish to use interactive debugging from a queue, you may need to configure DDT to generate job submission scripts for you. More details on this can be found in 5.10 Starting a job in a queue and A.2 Integration with queuing systems.
The following lists the MPI implementations currently supported by Express Launch:
- bullx MPI
- Cray X-Series (MPI/SHMEM/CAF)
- Intel MPI
- MPICH 2
- MPICH 3
- Open MPI (MPI/SHMEM)
- Oracle MPT
- Open MPI (Cray XT/XE/XK)
- Cray XT/XE/XK (UPC)
In Express Launch mode, the Run dialog has a restricted number of options:
When using SGI MPT, MPICH 1 Standard or the MPMD variants of MPICH 2, MPICH 3 or Intel MPI, DDT will allow mpirun to start all the processes, then attach to them while they're inside MPI_Init.
This method is often faster than the generic method, but requires the remote-exec facility in DDT to be correctly configured if processes are being launched on a remote machine. For more information on remote-exec, please see section A.4 Connecting to remote programs (remote-exec).
If DDT is running in the background (for example, ddt &) then this process may get stuck (some SSH versions cause this behavior when asking for a password). If this happens to you, go to the terminal and use the fg or similar command to make DDT a foreground process, or run DDT again, without using "&".
If DDT cannot find a password-free way to access the cluster nodes then you will not be able to use the specialized startup options. Instead, you can use generic, although startup may be slower for large numbers of processes.
In addition to the MPI implementations listed above, DDT requires password-free access to the compute nodes for all MPI implementations except Blue Gene/Q and Cray MPT when explicitly starting by attaching.
Users with single-process licenses will immediately see the Run dialog that is appropriate for single-process applications.
Users with multi-process licenses can uncheck the MPI check box to run a single process program.
Select the application, either by typing the file name in, or by clicking the Browse button and selecting it in the file browser. Arguments can be typed into the supplied box.
Click Run to start your program.
If you have a program compiled with Intel ifort or GNU g77 you may not see your code and the highlighted line when DDT starts. This is because those compilers create a pseudo MAIN function, above the top level of your code. To fix this you can either open your Source Code window and add a breakpoint in your code and then play to that breakpoint, or you can use the Step Into function to step into your code.
When running an OpenMP program, set the Number of OpenMP threads value to the number of threads you require. DDT will run your program with the OMP_NUM_THREADS environment variable set to the appropriate value.
There are several important points to keep in mind while debugging OpenMP programs:
- Parallel regions created with #pragma omp parallel (C) or !$OMP PARALLEL (Fortran) will usually not be nested in the Parallel Stack View under the function that contained the #pragma. Instead they will appear under a different top-level item. The top-level item is often in the OpenMP runtime code, and the parallel region appears several levels down in the tree.
- Some OpenMP libraries only create the threads when the first parallel region is reached. It is possible you may only see one thread at the start of the program.
- You cannot step into a parallel region. Instead, check the Step threads together box and use the Run to here command to synchronize the threads at a point inside the region. These controls are discussed in more detail in their own sections of this document.
- You cannot step out of a parallel region. Instead, use Run to here to leave it. Most OpenMP libraries work best if you keep the Step threads together box ticked until you have left the parallel region. With the Intel OpenMP library, this means you will see the Stepping Threads window and will have to click Skip All once.
- Leave Step threads together off when you are outside a parallel region, as OpenMP worker threads usually do not follow the same program flow as the main thread.
- To control threads individually, use the Focus on Thread control. This allows you to step and play one thread without affecting the rest. This is helpful when you want to work through a locking situation or to bring a stray thread back to a common point. The Focus controls are discussed in more detail in their own section of this document.
- Shared OpenMP variables may appear twice in the Locals window. This is one of the many unfortunate side-effects of the complex way OpenMP libraries interfere with your code to produce parallelism. One copy of the variable may have a nonsense value; this is usually easy to recognize. The correct values are shown in the Evaluate and Current Line windows.
- Parallel regions may be displayed as a new function in the stack views. Many OpenMP libraries implement parallel regions as automatically-generated "outline" functions, and DDT shows you this. To view the value of variables that are not used in the parallel region, you may need to switch to thread 0 and change the stack frame to the function you wrote, rather than the outline function.
- Stepping often behaves unexpectedly inside parallel regions. Reduction variables usually require some sort of locking between threads, and may even appear to make the current line jump back to the start of the parallel region. If this happens, step over several times and you will see the current line come back to the correct location.
- Some compilers optimize parallel loops regardless of the options you specified on the command line. This has many strange effects, including code that appears to move backwards as well as forwards, and variables that have nonsense values because they have been optimized out by the compiler.
- The thread IDs displayed in the Process Group Viewer and Cross-Thread Comparison window will match the value returned by omp_get_thread_num() for each thread, but only if your OpenMP implementation exposes this data to DDT. GCC's support for OpenMP (GOMP) needs to be built with TLS enabled for our thread IDs to match the return value of omp_get_thread_num(), whereas your system GCC most likely has this option disabled. The same thread IDs will be displayed as tooltips for the threads in the thread viewer, but only if your OpenMP implementation exposes this data.
If you are using DDT with OpenMP and would like to tell us about your experiences, please contact Arm support at firstname.lastname@example.org, with the subject title OpenMP feedback.
DDT can only launch MPI programs and scalar (single process) programs itself. The Manual Launch (Advanced) button on the Welcome Page allows you to debug multi-process and multi-executable programs. These programs do not necessarily need to be MPI programs. You can debug programs that use other parallel frameworks, or both the client and the server from a client/server application in the same DDT session.
You must run each program you want to debug manually using the ddt-client command, similar to debugging with a scalar debugger like the GNU debugger (gdb). However, unlike a scalar debugger, you can debug more than one process at the same time in the same DDT session, as long as your license permits it. Each program you run will show up as a new process in the DDT window.
For example, to debug both client and server in the same DDT session:
- Click the Manual Launch (Advanced) button.
- Select 2 processes
- Click the Listen button.
- At the command line run:
The server process appears as process 0 and the client as process 1 in the DDT window.
After you have run the initial programs you may add extra processes to the DDT session, for example extra clients could be added, using ddt-client in the same way.
If you check Start debugging after the first process connects you do not need to specify how many processes you want to launch in advance. You can start debugging after the first process connects and add extra processes later as above.
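The command-line step of the client/server example might be sketched as follows. The executable names ./server and ./client are hypothetical. By default the sketch only prints the commands (a dry run); setting DRY_RUN=0 makes it execute them, which only makes sense after DDT is listening.

```shell
# Print the command by default; execute it when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the server first, in the background, then the client.
# Both are launched through ddt-client so that they connect to DDT.
run ddt-client ./server &
run ddt-client ./client
wait
```

Extra clients can be added to the session later with further ddt-client invocations of the same form.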
The easiest way to debug MPMD programs is by using Express Launch to start your application.
To use Express Launch, simply prefix your normal MPMD launch line with ddt, for example:
For more information on Express Launch, and compatible MPI implementations, see section 5.2.
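A sketch of such a prefixed MPMD launch line is shown here. The executables ./master and ./worker and the process counts are hypothetical, and the colon-separated mpirun syntax is assumed to be what your MPI uses for MPMD jobs. The command is printed rather than executed.

```shell
# Hypothetical MPMD launch line: one master rank and three worker ranks.
MPMD_LAUNCH="mpirun -n 1 ./master : -n 3 ./worker"

# Express Launch form: the same line prefixed with ddt.
MPMD_DDT="ddt $MPMD_LAUNCH"

echo "$MPMD_DDT"
```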
If you are using Open MPI, MPICH 2, MPICH 3 or Intel MPI, DDT can be used to debug multiple program, multiple data (MPMD) programs. To start an MPMD program in DDT:
- MPICH 2 and Intel MPI only: Select the MPMD variant of the MPI Implementation on the System page of the Options window, for example, for MPICH 2 select MPICH 2 (MPMD).
- Click the Run button on the Welcome Page.
- Select one of the MPMD programs in the Application box. It does not matter which executable you choose.
- Enter the total number of processes for the MPMD job in the Number of processes box.
- Enter an MPMD style command line in the mpirun Arguments box in the MPI section of the Run window, for example:
- Click the Run button.
If you are using Open MPI in Compatibility mode, for example, because you do not have SSH access to the compute nodes, then replace:
in the mpirun Arguments / appfile with this:
DDT allows you to open one or more core files generated by your application.
To debug using core files, click the Open Core Files button on the Welcome Page. This opens the Open Core Files window, which allows you to select an executable and a set of core files. Click OK to open the core files and start debugging them.
While DDT is in this mode, you cannot play, pause or step, because there is no process active. You are, however, able to evaluate expressions and browse the variables and stack frames saved in the core files.
DDT can attach to running processes on any machine you have access to, whether they are from MPI or scalar jobs, even if they have different executables and source pathnames. Clicking the Attach to a Running Program button on the Welcome Page shows DDT's Attach Window:
There are two ways to select the processes you want to attach to: you can either choose from a list of automatically detected MPI jobs (for supported MPI implementations) or manually select from a list of processes.
DDT can automatically detect MPI jobs started on the local host for selected MPI implementations. This also applies to other hosts you have access to, if an Attach Hosts File is configured. See section A.5.1 System for more details.
The list of detected MPI jobs is shown on the Automatically-detected MPI jobs tab of the Attach Window. Click the header for a particular job to see more information about that job. Once you have found the job you want to attach to simply click the Attach button to attach to it.
You may want to attach only to a subset of ranks from your MPI job. You can choose this subset using the Attach to ranks box on the Automatically-detected MPI jobs tab of the Attach Window. You may change the subset later by selecting the File → Change Attached Processes… menu item.
You can manually select which processes to attach to from a list of processes using the List of all processes tab of the Attach Window. If you want to attach to a process on a remote host see section A.4 Connecting to remote programs (remote-exec) first.
Initially the list of processes is blank while DDT scans the nodes, provided in your node list file, for running processes. When all the nodes have been scanned (or have timed out) the window appears as shown above. Use the Filter box to find the processes you want to attach to. On non-Linux platforms you also need to select the application executable you want to attach to. Ensure that the list shows all the processes you wish to debug in your job, and no extra/unnecessary processes. You may modify the list by selecting and removing unwanted processes, or alternatively selecting the processes you wish to attach to and clicking on Attach to Selected Processes. If no processes are selected, DDT uses the whole visible list.
On Linux you may use DDT to attach to multiple processes running different executables. When you select processes with different executables the application box changes to read Multiple applications selected. DDT creates a process group for each distinct executable.
With some supported MPI implementations (for example, Open MPI) DDT shows MPI processes as children of the mpirun (or equivalent) command, as shown in the following figure. Clicking the mpirun command automatically selects all the MPI child processes.
Some MPI implementations (such as MPICH 1) create forked (child) processes that are used for communication, but are not part of your job. To avoid displaying and attaching to these, make sure the Hide Forked Children box is ticked. DDT's definition of a forked child is a child process that shares the parent's name. Some MPI implementations create your processes as children of each other. If you cannot see all the processes in your job, try clearing this checkbox and selecting specific processes from the list.
Once you click on the Attach to Selected/Listed Processes button, DDT uses remote-exec to attach a debugger to each process you selected and proceeds to debug your application as if you had started it with DDT. When you end the debug session, DDT detaches from the processes rather than terminating them. This allows you to attach again later if you wish.
DDT examines the processes it attaches to and tries to discover the MPI_COMM_WORLD rank of each process. If you have attached to two MPI programs, or a non-MPI program, then you may see the following message:
If there is no rank, for example, if you have attached to a non-MPI program, then you can ignore this message and use DDT as normal. If there is, then you can easily tell DDT what the correct rank for each process is via the Use as MPI Rank button in the Cross-Process Comparison Window. See section 8.17 Assigning MPI ranks for details.
Note that the stdin, stderr and stdout (standard input, error and output) are not captured by DDT if used in attaching mode. Any input/output continues to work as it did before DDT attached to the program, for example, from the terminal or perhaps from a file.
To attach to remote hosts in DDT, click the Choose Hosts button in the attach dialog. This displays the list of hosts to be used for attaching.
From here you can add and remove hosts, as well as unchecking hosts that you wish to temporarily exclude.
The hosts list is initially populated from the attach Hosts File, which can be configured from the Options window: File → Options (Arm Forge → Preferences on Mac OS X).
Each remote host is then scanned for processes, and the result displayed in the attach window. If you have trouble connecting to remote hosts, please see section A.4 Connecting to remote programs (remote-exec).
As an alternative to starting DDT and using the Welcome Page, DDT can instead be instructed to attach to running processes from the command-line.
To do so, you need to specify a list of hostnames and process identifiers (PIDs). If a hostname is omitted then localhost is assumed.
The list of hostnames and PIDs can be given on the command-line using the --attach option:
Another command-line possibility is to specify the list of hostnames and PIDs in a file and use the --attach-file option:
mark@holly:~$ cat /home/mark/ddt/examples/hello.list
mark@holly:~$ ddt --attach-file=/home/mark/ddt/examples/hello.list
In both cases, if just a number is specified for a hostname:PID pair, then localhost: is assumed.
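The two forms can be sketched as follows. The host name node5 and the PIDs are hypothetical. The comma separator between pairs on the command line and the one-pair-per-line file layout are assumptions; check ddt --help for the exact syntax on your installation. The ddt commands are printed rather than executed.

```shell
# Hypothetical list: PID 11057 on localhost and PID 11352 on node5.
ATTACH_LIST="11057,node5:11352"
ATTACH_CMD="ddt --attach=$ATTACH_LIST"

# The same processes written to a file for --attach-file,
# assuming one hostname:PID pair (or bare PID) per line.
LIST_FILE="./hello.list"
printf '%s\n' "$ATTACH_LIST" | tr ',' '\n' > "$LIST_FILE"
ATTACH_FILE_CMD="ddt --attach-file=$LIST_FILE"

echo "$ATTACH_CMD"
echo "$ATTACH_FILE_CMD"
```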
In most cases you can debug a job simply by putting ddt --connect in front of the existing mpiexec or equivalent command in your job script. If a GUI is running on the login node or it is connected to it via the remote client, then a message is displayed prompting you with the option to debug the job when it starts.
If DDT has been configured to be integrated with a queue/batch environment, as described in section A.2 Integration with queuing systems, then you may use DDT to submit your job directly from the GUI. In this case, a Submit button is presented on the Run Window, instead of the ordinary Run button. Clicking Submit from the Run Window will display the queue status until your job starts. DDT will execute the display command every second and show you the standard output. If your queue display is graphical or interactive then you cannot use it here.
If your job does not start or you decide not to run it, click on Cancel Job. If the regular expression you entered for getting the job id is invalid or if an error is reported then DDT will not be able to remove your job from the queue. In this case it is strongly recommended that you check the job has been removed before submitting another as it is possible for a forgotten job to execute on the cluster and either waste resources or interfere with other debug sessions.
On some systems a custom 'mpirun' replacement is used to start jobs, such as mpiexec. DDT normally uses whatever the default for your MPI implementation is, so for MPICH 1 it would look for mpirun and not mpiexec. This section explains how to configure DDT to use a custom mpirun command for job start up.
There are typically two ways you might want to start jobs using a custom script, and DDT supports them both. Firstly, you might pass all the arguments on the command-line, like this:
There are several key variables in this line that DDT can fill in for you:
- The number of processes (4 in the above example).
- The name of your program (/home/mark/program/chains.exe).
- One or more arguments passed to your program (/tmp/mydata).
Everything else, like the name of the command and the format of its arguments remains constant. To use a command like this in DDT, you adapt the queue submission system described in the previous section. For this mpiexec example, the settings are as shown here:
As you can see, most of the settings are left blank. There are some differences between the Submit Command in DDT and what you would type at the command-line:
- The number of processes is replaced with NUM_PROCS_TAG.
- The name of the program is replaced by the full path to ddt-debugger.
- The program arguments are replaced by PROGRAM_ARGUMENTS_TAG.
Note, it is not necessary to specify the program name here. DDT takes care of that during its own startup process. The important thing is to make sure your MPI implementation starts ddt-debugger instead of your program, but with the same options.
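The three replacements listed above can be illustrated with a small sketch that rewrites a command line into a Submit Command. The starting command (mpiexec with -n) is an assumed reconstruction from the values in the list, and the ddt-debugger install path is hypothetical. Your configuration may need further tags, such as the DDT_DEBUGGER_ARGUMENTS_TAG used in the template file example in this section.

```shell
# Command as it might be typed at the command line (assumed form).
ORIGINAL="mpiexec -n 4 /home/mark/program/chains.exe /tmp/mydata"

# Hypothetical full path to ddt-debugger on your system.
DDT_DEBUGGER="/opt/ddt/bin/ddt-debugger"

# Apply the three replacements: process count, program name, arguments.
SUBMIT=$(printf '%s\n' "$ORIGINAL" | sed \
    -e 's| -n 4 | -n NUM_PROCS_TAG |' \
    -e "s|/home/mark/program/chains.exe|$DDT_DEBUGGER|" \
    -e 's|/tmp/mydata|PROGRAM_ARGUMENTS_TAG|')

echo "$SUBMIT"
```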
The second way you might start a job using a custom mpirun replacement is with a settings file:
Where myfile.nodespec might contain something similar to the following:
DDT can automatically generate simple configuration files like this every time you run your program. To do so, you need to specify a template file. For the above example, the template file myfile.ddt would contain the following:
comp00 comp01 comp02 comp03 : DDTPATH_TAG/bin/ddt-debugger DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG
This follows the same replacement rules described above and in detail in section A.2 Integration with queuing systems. The options settings for this example might be:
Note the Submit Command and the Submission Template File in particular. DDT will create a new file and append it to the submit command before executing it. In this case what would actually be executed might be mpiexec -config /tmp/ddt-temp-0112 or similar. Therefore, any argument like -config must be last on the line, because DDT will add a file name to the end of the line. Other arguments, if there are any, can come first.
It is recommended that you read the section on queue submission, as there are many features described there that might be useful to you if your system uses a non-standard start up command.
If you do use a non-standard command, please contact Arm support at email@example.com.
The usual way of debugging a program with Arm DDT in a queue/batch environment is to use Reverse Connect and let it connect back from inside the queue to the GUI. See 3.3 Reverse Connect for more details.
To do this, replace your usual program invocation with an Arm DDT --connect command such as the following:
The following could also be used:
In these examples MPIEXEC is the MPI launch command, NPROCS is the number of processes to start, PROGRAM is the program to run, and ARGUMENTS are the arguments to the program.
The --once argument tells Arm DDT to exit when the session ends.
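As a sketch of the job script change, the following builds the usual invocation and its Reverse Connect counterpart from the placeholders described above. The concrete values (mpiexec, 4 processes, ./wave_c 20) are hypothetical, and the -n option is an assumption about your MPI launch command. The commands are printed rather than executed.

```shell
# Hypothetical values for the placeholders.
MPIEXEC="mpiexec"
NPROCS=4
PROGRAM="./wave_c"
ARGUMENTS="20"

# Usual program invocation in the job script.
USUAL="$MPIEXEC -n $NPROCS $PROGRAM $ARGUMENTS"

# Reverse Connect: ddt --connect goes in front of the usual invocation.
REVERSE="ddt --connect $USUAL"

echo "$USUAL"
echo "$REVERSE"
```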
The alternative to Reverse Connect for debugging a program in a queue/batch environment is to configure Arm DDT to submit the program to the queue for you. See section 5.10 Starting a job in a queue.
Some users may wish to start Arm DDT itself from a job script that is submitted to the queue/batch environment. To do this:
- Configure Arm DDT with the correct MPI implementation.
- Disable queue submission in the Arm DDT options.
- Create a job script that starts Arm DDT using a command such as:
Or the following:
In these examples MPIEXEC is the MPI launch command, NPROCS is the number of processes to start, PROGRAM is the program to run, and ARGUMENTS are the arguments to the program.
- Submit the job script to the queue. The --once argument tells DDT to exit when the session ends.
This is typically used for debugging embedded devices only. This should be considered as an expert mode and would not normally be used to debug an application running on a server or workstation.
To prepare for using this mode, you must first start a gdbserver on the target device. Please see https://sourceware.org/gdb/onlinedocs/gdb/Server.html for further details as invocation may be system dependent.
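For orientation, two common gdbserver invocations are sketched here, following the GDB documentation linked above. The port number, program name and PID are hypothetical, and your target may need different options. The commands are printed rather than executed, since they must be run on the target device.

```shell
# Hypothetical TCP port for the debugger connection.
PORT=2345

# Launch a program under gdbserver, listening on the chosen port.
GDBSERVER_LAUNCH="gdbserver :$PORT ./app"

# Or attach gdbserver to an already running process (PID 1234 here).
GDBSERVER_ATTACH="gdbserver --attach :$PORT 1234"

echo "$GDBSERVER_LAUNCH"
echo "$GDBSERVER_ATTACH"
```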
You may then attach to a running application either via the command line or the user interface.
To attach via the command line use:
Note that the arguments are not optional.
To attach via the user interface, select the Attach dialog on DDT's welcome page. Select the GDB Server tab and substitute the appropriate settings.
If the gdbserver has been used to launch an application, then it will have been stopped before starting the user code. In this case, add a breakpoint in the main function using the Add Breakpoint button, and then play until this is reached. After this point is reached, source code will be displayed.