15 Offline debugging

Offline debugging is a mode of running Arm DDT in which an application is run, under the control of the debugger, but without user intervention and without a user interface.

This mode is useful in many situations, for example when access to a machine is not immediately available and may not become available during the working day. The application can run with features such as tracepoints and memory debugging enabled, and Arm DDT produces a report at the end of the execution.

15.1 Using offline debugging

To launch Arm DDT in this mode, specify the --offline argument. Optionally, supply an output filename with the --output=<filename> argument (the examples below use the short form, -o). A filename with a .html or .htm extension produces an HTML report; any other filename produces a plain text report. If no output filename is given, DDT generates an HTML output file in the current working directory and reports the name of that file upon completion.


   ddt --offline mpiexec -n 4 myprog arg1 arg2
   ddt --offline -o myjob.html mpiexec -n 4 myprog arg1 arg2
   ddt --offline -o myjob.txt mpiexec -n 4 myprog arg1 arg2
   ddt --offline -o myjob.html --np=4 myprog arg1 arg2
   ddt --offline -o myjob.txt --np=4 myprog arg1 arg2
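
The extension rule for the output filename can be sketched as a small shell function. This is only an illustration of the rule as described above: report_format is a hypothetical helper, not part of Arm DDT, and the case-sensitive match on the extension is an assumption.

```shell
# Hypothetical helper (not part of Arm DDT): predict the report format that
# the extension rule described above implies for a given output filename.
# Assumption: the extension is matched case-sensitively.
report_format() {
    case "$1" in
        *.html | *.htm) echo "html" ;;
        *)              echo "text" ;;
    esac
}

# Show the predicted format for a few candidate filenames.
for name in myjob.html myjob.htm myjob.txt myjob.log; do
    printf '%s -> %s\n' "$name" "$(report_format "$name")"
done
```

A check like this can be useful in a job script that post-processes the report and needs to know which kind of file to expect.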

Additional arguments can be used to set breakpoints, at which the stack of the stopping processes will be recorded before they are continued. You can also set tracepoints at which variable values will be recorded. Additionally, expressions can be set to be evaluated on every program pause.

Settings are taken from your current Arm DDT configuration file unless overridden on the command line.

The command line options most relevant to this mode of running are:

  • --session=SESSIONFILE - run in offline mode with the settings saved by the Save Session option in the Arm DDT File menu.
  • --processes=NUMPROCS or -n NUMPROCS - run with NUMPROCS processes
  • --mem-debug[=(fast|balanced|thorough|off)] - enable and configure memory debugging
  • --snapshot-interval=MINUTES - write a snapshot of the program's stack and variables to the offline log file every MINUTES minutes.

    See section 15.4 Run-time job progress reporting.

  • --trace-at=LOCATION[,N:M:P][,VAR1,VAR2,...] [if CONDITION] - set a tracepoint at LOCATION, beginning recording after the N'th visit of each process to the location, and recording every M'th subsequent pass until it has been triggered P times. The values of the variables VAR1, VAR2, and so on are recorded. The if clause allows you to specify a boolean CONDITION that must be satisfied for the tracepoint to trigger.

    Example:

    
       --trace-at=main.c:22,-:2:-,x

    This records x on every second pass of line 22.

  • --break-at=LOCATION[,N:M:P][if CONDITION] - set a breakpoint at LOCATION (either file:line or function), optionally starting after the N'th pass, triggering every M passes and stopping after it has been triggered P times. The if clause allows you to specify a boolean CONDITION that must be satisfied for the breakpoint to trigger. When using the if clause the value of this argument should be quoted.

    The stack traces of the paused processes are recorded before the processes are made to continue. The recorded traces contain the variables of one process from the set of processes that have paused.

    Examples:

    
       --break-at=main
       --break-at=main.c:22
       --break-at=main.c:22 --break-at=main.c:34

  • --evaluate=EXPRESSION[;EXPRESSION2][;...] - set one or more expressions to be evaluated on every program pause. Multiple expressions should be separated by a semicolon and enclosed in quotes. If shell special characters are present the value of this argument should also be quoted.

    Examples:

    
       --evaluate=i
       --evaluate="i; (*addr) / x"
       --evaluate=i --evaluate="i * x"

  • --offline-frames=(all|none|n) - specify how many frames to collect variables for, where n is a positive integer. The default value is all.

    Examples:

    
       --offline-frames=all
       --offline-frames=none
       --offline-frames=1337
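
The quoting advice for --break-at and --evaluate can be checked without running Arm DDT. The sketch below uses a hypothetical helper, show_args, that prints each argument it receives on its own line; the program name, the condition, and the exact spacing between the location and the if clause are assumptions made for illustration.

```shell
# Hypothetical helper (not part of Arm DDT): print each argument on its own
# line, in brackets, so that the effect of shell quoting is visible.
show_args() {
    printf '[%s]\n' "$@"
}

# Quoted: the condition and the expression list each arrive as ONE argument.
# Assumption: a space separates the location from the "if" clause.
show_args ddt --offline -o myjob.html \
    --break-at="main.c:22 if i > 10" \
    --evaluate="i; (*addr) / x" \
    mpiexec -n 4 ./myprog

# Unquoted, the shell would treat ">" as a redirection, ";" as a command
# separator, and "*" as a glob, so the full value would never reach ddt.
```

Once the printed arguments look right, replacing show_args with nothing leaves the real command line.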

The application will run to completion, or to the end of the job.

When errors occur, for example an application crash, the stack backtrace of the crashing processes is recorded in the offline output file. In offline mode, Arm DDT always acts as if the user had clicked Continue whenever the Continue option would have been available in an equivalent "online" debugging session.

15.1.1 Reading a file for standard input

In offline mode, normal redirection syntax can be used to read data from a file as a source for the executable's standard input.

Examples:


   cat <input-file> | ddt --offline -o myjob.html ...
   ddt --offline -o myjob.html ... < <input-file>

15.1.2 Writing a file from standard output

Normal redirection can also be used to write data to a file from the executable's standard output:


   ddt --offline -o myjob.html ... > <output-file>

15.2 Offline report output (HTML)

The output file is divided into four sections: Messages, Tracepoints, Memory Leak Report, and Output. During a run, the corresponding parts of the log output (error messages, tracepoint data, memory leak data, and program output) are held in separate files, and at the end of the job Arm DDT merges them into one file. If the Arm DDT process is terminated abruptly, for example by the job scheduler, these separate files remain and the final single HTML report may not be created. Note that the Memory Leak Report section is only created when memory debugging is enabled.

Figure 89: Offline Mode HTML output

Timestamps are recorded with the contents of the offline log, so even though the file is organized into four sections, the order of events can still be identified from the timestamps.

The Messages section contains the following:

  • Error messages: for example, if Arm DDT's Memory Debugging detects an error, the message and the stack trace at the time of the error are recorded for each offending process.
  • Breakpoints: a message listing the stopped processes, and another with the Stacks, Current Stack and Locals at this point.
  • Additional Information: after an error or a breakpoint has stopped execution, an additional information message is recorded. This message can contain the stacks, current stack and local variables for the stopped processes at this point of the execution.
    • The Stacks table displays the parallel stacks from all paused processes. Additionally, for every top-most frame the variables (locals and arguments) are displayed by default. You can use the --offline-frames command line option to display the variables for more frames or for none. If --offline-frames=none is specified, no variables are displayed in the Stacks table; instead, a Locals table shows the variables for the current process. Clicking on a function expands the code snippet and variables in one go. If the stop was caused by an error or crash, the stack of the responsible thread or process is listed first.
    • The Current Stack table shows the stack of the current process.
    • The Locals table (if --offline-frames=none) and the Variables column of the Stacks table show the variables across the paused processes. The text highlighting scheme is the same as for the Local variables in the GUI. The Locals table shows the local variables of the current process, whereas the Variables column shows the locals for a representative process that triggered the stop in that frame. In either case, a sparkline for each variable shows the distribution of values across the processes.

The Tracepoints section contains the output from tracepoints, similar to that shown in the tracepoints window in an online debugging session. This includes sparklines displaying the variable distribution.

The Memory Leak Report section displays a graphical representation of the largest memory allocations that were not freed by the end of the program:

Figure 90: Memory leak report

Each row corresponds to the memory still allocated at the end of a job on a single rank. If multiple MPI ranks are being debugged, only those with the largest number of memory allocations are shown. You can configure the number of MPI ranks shown with --leak-report-top-ranks=X.

The memory allocations on each rank are grouped by the source location that allocated them. Each colored segment corresponds to one location, identified in the legend. Clicking on a segment reveals a table of all call paths leading to that location along with detailed information about the individual memory allocations:

Figure 91: Memory leak report detail

By default all locations that contribute less than 1% of the total allocations are grouped together into the "Other" item in the legend.

This limit can be configured by setting the ALLINEA_LEAK_REPORT_MIN_SEGMENT environment variable to a percentage. For example, with ALLINEA_LEAK_REPORT_MIN_SEGMENT=0.5, only locations that account for less than 0.5% of the total allocated bytes are grouped together.

In addition, only the eight largest locations are shown by default. This can be configured with the --leak-report-top-locations=Y command-line option.

The raw data may also be exported by clicking the export link.

You may find the following command line options useful:

Option                           Description
--leak-report-top-ranks=X        Limit the memory leak report to the top X ranks (default 8, implies --mem-debug)
--leak-report-top-locations=Y    Limit the memory leak report to the top Y locations in each rank (default 8, implies --mem-debug)
--leak-report-top-call-paths=Z   Limit the memory leak report to the top Z call paths to each allocating function (default 8, implies --mem-debug)
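
As an illustration, the environment variable and the leak report options can be combined in a job script along the following lines. This is a sketch only: the option values, the report name leaks.html, the process count, and the program name are arbitrary assumptions, and the command is printed with echo rather than executed.

```shell
# Group only locations below 0.5% of the total into the "Other" item.
export ALLINEA_LEAK_REPORT_MIN_SEGMENT=0.5

# The --leak-report-* options imply --mem-debug, so it is not repeated here.
# The values 4, 12, and 3 are arbitrary examples.
leak_opts="--leak-report-top-ranks=4 --leak-report-top-locations=12 --leak-report-top-call-paths=3"

# Dry run: print the command instead of executing it. Remove "echo" to launch.
# $leak_opts is deliberately unquoted so that it splits into separate options.
echo ddt --offline -o leaks.html $leak_opts mpiexec -n 8 ./myprog
```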

Output from the application is written to the Output section. With most MPIs this output cannot be attributed to a particular process, but on those MPIs that do support it, Arm DDT reports which processes generated the output.

Where possible, identical output in the Output and Tracepoints sections is merged if it is received in close proximity and in the same order.

15.3 Offline report output (plain text)

Unlike the offline report in HTML mode, the plain text mode does not split the tracepoint, breakpoint, memory leak, and application output into separate sections.

Lines in the offline plain text report are identified as messages, standard output, error output, and tracepoints, as described in section 15.2 Offline report output (HTML).

For example, a simple report could look like the following:


message (0-3): Process stopped at breakpoint in main (hello.c:97). 
message (0-3): Stacks
message (0-3): Processes Function
message (0-3): 0-3 main (hello.c:97)
message (0-3): Stack for process 0
message (0-3): #0 main (argc=1, argv=0x7fffffffd378, \
environ=0x7fffffffd388) at /home/ddt/examples/hello.c:97
message (0-3): Local variables for process 0 \
(ranges shown for 0-3)
message (0-3): argc: 1 argv: 0x7fffffffd378 beingWatched: 0 \
dest: 7 environ: 0x7fffffffd388 i: 0 message: ",!\312\t" \
my_rank: 0 (0-3) p: 4 source: 0 status: t2: 0x7ffff7ff7fc0 \
tables: tag: 50 test: x: 10000 y: 12
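
Because every line of the plain text report starts with its kind, standard text tools can pick out events of interest. The following sketch counts breakpoint stops and lists where they occurred. It writes a few lines copied from the example above to a temporary file so that it is self-contained; it relies only on the line format shown in that example, which is an assumption about other reports.

```shell
# Write a few lines in the format shown above to a temporary file so the
# example is self-contained; in practice, point "report" at your own file.
report=$(mktemp)
cat > "$report" <<'EOF'
message (0-3): Process stopped at breakpoint in main (hello.c:97).
message (0-3): Stacks
message (0-3): Processes Function
message (0-3): 0-3 main (hello.c:97)
message (0-3): Stack for process 0
EOF

# Count the breakpoint stops recorded in the report.
stops=$(grep -c 'Process stopped at breakpoint' "$report")
echo "breakpoint stops: $stops"

# List where each stop happened (the text between "in " and the final period).
locations=$(sed -n 's/^message ([^)]*): Process stopped at breakpoint in \(.*\)\.$/\1/p' "$report")
echo "$locations"

rm -f "$report"
```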

15.4 Run-time job progress reporting

In offline mode, Arm DDT can be instructed to compile a snapshot of a job, including its stacks and variables, and update the session log with that information. This includes writing the HTML log file, which otherwise is only written once the session has completed.

Snapshots can be triggered periodically via a command-line option, or at any point in the session by sending a signal to the Arm DDT front-end.

15.4.1 Periodic snapshots

Snapshots can be triggered periodically throughout a debugging session with the command-line option --snapshot-interval=MINUTES. For example, to log a snapshot every three minutes:


   ddt --offline -o log.html --snapshot-interval=3 \
       mpiexec -n 8 ./myprog

15.4.2 Signal-triggered snapshots

Snapshots can also be triggered by sending a SIGUSR1 signal to the DDT front-end process (called ddt.bin in process lists), regardless of whether or not the --snapshot-interval command-line option was specified. For example, after running the following:


   ddt --offline -o log.html mpiexec -n 8 ./myprog

A snapshot can be triggered by running (in another terminal):


   # Find the PID of the DDT front-end:
   pgrep ddt.bin
   # Example output:
   #   18032
   #   18039

   # Use pstree to identify the parent if there are multiple PIDs:
   pstree -p

   # Trigger the snapshot:
   kill -USR1 18032
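
The manual steps above could be wrapped in a small shell function. This is a sketch under stated assumptions: ddt_snapshot is a hypothetical helper, not part of Arm DDT, and it assumes that the oldest ddt.bin process owned by the current user is the front-end. If in doubt, confirm the PID with pstree as described above.

```shell
# Hypothetical helper (not part of Arm DDT): send SIGUSR1 to the oldest
# process with the given name (default: ddt.bin) owned by the current user.
# Assumption: the oldest matching process is the DDT front-end.
ddt_snapshot() {
    name=${1:-ddt.bin}
    # pgrep: -o selects the oldest match, -x requires an exact name match,
    # -u restricts the search to processes owned by the given user ID.
    pid=$(pgrep -o -x -u "$(id -u)" "$name") || {
        echo "no running process named $name" >&2
        return 1
    }
    kill -USR1 "$pid" && echo "sent SIGUSR1 to $name (PID $pid)"
}

# Usage, from another terminal while the offline session is running:
#   ddt_snapshot
```

The function returns a non-zero status when no matching process is found, so it can be used safely from a monitoring script.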