Contents

I Arm Forge
1 Introduction to Arm Forge
1.1 Arm DDT
1.1.1 Related information
1.2 Arm MAP
1.2.1 Related information
1.3 Online resources
2 Installation
2.1 Linux installation
2.1.1 Graphical install
2.1.2 Text-mode install
2.2 Mac installation
2.3 Windows installation
2.4 License files
2.5 Workstation and evaluation licenses
2.6 Supercomputing and other floating licenses
2.7 Architecture licensing
2.7.1 Using multiple architecture licenses
3 Connecting to a remote system
3.1 Remote connections dialog
3.2 Remote launch settings
3.2.1 Remote script
3.3 Reverse Connect
3.3.1 Overview
3.3.2 Usage
3.3.3 Connection details
3.4 Treeserver or general debugging ports
3.5 Using X forwarding or VNC
4 Starting
II DDT
5 Getting started
5.1 Running a program
5.1.1 Application
5.1.2 MPI
5.1.3 OpenMP
5.1.4 CUDA
5.1.5 Memory debugging
5.1.6 Environment variables
5.1.7 Plugins
5.2 Express Launch
5.2.1 Run dialog box
5.3 remote-exec required by some MPIs
5.4 Debugging single-process programs
5.5 Debugging OpenMP programs
5.6 Manual launching of multi-process non-MPI programs
5.7 Debugging MPMD programs
5.7.1 Debugging MPMD programs without Express Launch
5.7.2 Debugging MPMD programs in Compatibility mode
5.8 Opening core files
5.9 Attaching to running programs
5.9.1 Automatically detected MPI jobs
5.9.2 Attaching to a subset of an MPI job
5.9.3 Manual process selection
5.9.4 Configuring attaching to remote hosts
5.9.5 Using DDT command-line arguments
5.10 Starting a job in a queue
5.11 Using custom MPI scripts
5.12 Starting DDT from a job script
5.13 Attaching via gdbserver
5.14 UPC
5.14.1 GCC UPC
5.14.2 Berkeley UPC
5.15 Numactl
5.15.1 MPI and SLURM
5.15.2 Non-MPI Programs
5.16 Python debugging
5.16.1 Overview
5.16.2 Prerequisites
5.16.3 Running
6 Overview
6.1 Saving and loading sessions
6.2 Source code
6.2.1 Viewing
6.2.2 Editing
6.2.3 Rebuilding and restarting
6.2.4 Committing changes
6.3 Project Files
6.3.1 Application and external code
6.4 Finding lost source files
6.5 Finding code or variables
6.5.1 Find Files or Functions
6.5.2 Find
6.5.3 Find in Files
6.6 Go To Line
6.7 Navigating through source code history
6.8 Static analysis
6.9 Version control information
7 Controlling program execution
7.1 Process control and process groups
7.1.1 Detailed view
7.1.2 Summary view
7.2 Focus control
7.2.1 Overview of changing focus
7.2.2 Process group viewer
7.2.3 Breakpoints
7.2.4 Code viewer
7.2.5 Parallel stack view
7.2.6 Playing and stepping
7.2.7 Step threads together
7.2.8 Stepping threads window
7.3 Starting, stopping and restarting a program
7.4 Stepping through a program
7.5 Stop messages
7.6 Setting breakpoints
7.6.1 Using the source code viewer
7.6.2 Using the Add Breakpoint window
7.6.3 Pending breakpoints
7.6.4 Conditional breakpoints
7.7 Suspending breakpoints
7.8 Deleting a breakpoint
7.9 Loading and saving breakpoints
7.10 Default breakpoints
7.11 Synchronizing processes
7.12 Setting a watchpoint
7.13 Tracepoints
7.13.1 Setting a tracepoint
7.13.2 Tracepoint output
7.14 Version control breakpoints and tracepoints
7.15 Examining the stack frame
7.16 Align stacks
7.17 Viewing stacks in parallel
7.17.1 Overview
7.17.2 The Parallel Stack View in detail
7.18 Browsing source code
7.19 Simultaneously viewing multiple files
7.20 Signal handling
7.20.1 Custom signal handling (signal dispositions)
7.20.2 Sending signals
8 Viewing variables and data
8.1 Sparklines
8.2 Current line
8.3 Local variables
8.4 Arbitrary expressions and global variables
8.4.1 Fortran intrinsics
8.4.2 Changing the language of an expression
8.4.3 Macros and #defined constants
8.5 Help with Fortran modules
8.6 Viewing complex numbers in Fortran
8.7 C++ STL support
8.8 Custom pretty printers
8.8.1 Example
8.9 Viewing array data
8.10 UPC support
8.11 Changing data values
8.12 Viewing numbers in different bases
8.13 Examining pointers
8.14 Multi-dimensional arrays in the Variable View
8.15 Multi-dimensional array viewer (MDA)
8.15.1 Array expression
8.15.2 Filtering by value
8.15.3 Distributed arrays
8.15.4 Advanced: how arrays are laid out in the data table
8.15.5 Auto Update
8.15.6 Comparing elements across processes
8.15.7 Statistics
8.15.8 Export
8.15.9 Visualization
8.16 Cross-process and cross-thread comparison
8.17 Assigning MPI ranks
8.18 Viewing registers
8.19 Process details
8.20 Disassembler
8.21 Interacting directly with the debugger
9 Program input and output
9.1 Viewing standard output and error
9.2 Saving output
9.3 Sending standard input
10 Logbook
10.1 Usage
10.2 Annotation
10.3 Comparison window
11 Message queues
11.1 Viewing the message queues
11.2 Interpreting the message queues
11.3 Deadlock
12 Memory debugging
12.1 Enabling memory debugging
12.2 CUDA memory debugging
12.3 Configuration
12.3.1 Static linking
12.3.2 Available checks
12.3.3 Changing settings at run time
12.4 Pointer error detection and validity checking
12.4.1 Library usage errors
12.4.2 View pointer details
12.4.3 Cross-process comparison of pointers
12.4.4 Writing beyond an allocated area
12.4.5 Fencepost checking
12.4.6 Suppressing an error
12.5 Current memory usage
12.5.1 Detecting leaks when using custom allocators/memory wrappers
12.6 Memory Statistics
13 Using and writing plugins
13.1 Supported plugins
13.2 Installing a plugin
13.3 Using a plugin
13.4 Writing a plugin
13.5 Plugin reference
14 CUDA GPU debugging
14.1 Licensing
14.2 Preparing to debug GPU code
14.3 Launching the application
14.4 Controlling GPU threads
14.4.1 Breakpoints
14.4.2 Stepping
14.4.3 Running and pausing
14.5 Examining GPU threads and data
14.5.1 Selecting GPU threads
14.5.2 Viewing GPU thread locations
14.5.3 Understanding kernel progress
14.5.4 Source code viewer
14.6 GPU devices information
14.7 Attaching to running GPU applications
14.8 Opening GPU core files
14.9 Known issues / limitations
14.9.1 Debugging multiple GPU processes
14.9.2 Thread control
14.9.3 General
14.9.4 Pre sm_20 GPUs
14.9.5 Debugging multiple GPU processes on Cray limitations
14.10 GPU language support
14.10.1 Cray OpenACC
14.10.2 PGI Accelerators and CUDA Fortran
14.10.3 IBM XLC/XLF with offloading OpenMP
15 Offline debugging
15.1 Using offline debugging
15.1.1 Reading a file for standard input
15.1.2 Writing a file from standard output
15.2 Offline report output (HTML)
15.3 Offline report output (plain text)
15.4 Run-time job progress reporting
15.4.1 Periodic snapshots
15.4.2 Signal-triggered snapshots
III MAP
16 Getting started
16.1 Express Launch
16.1.1 Run dialog box
16.2 Preparing a program for profiling
16.2.1 Debugging symbols
16.2.2 Linking
16.2.3 Dynamic linking on Cray X-Series systems
16.2.4 Static linking
16.2.5 Static linking on Cray X-Series systems
16.2.6 Dynamic and static linking on Cray X-Series systems using the modules environment
16.2.7 map-link modules installation on Cray X-Series
16.3 Profiling a program
16.3.1 Application
16.3.2 Duration
16.3.3 Metrics
16.3.4 MPI
16.3.5 OpenMP
16.3.6 Environment variables
16.3.7 Profiling
16.3.8 Profiling only part of a program
16.3.8.1 C
16.3.8.2 Fortran
16.4 remote-exec required by some MPIs
16.5 Profiling a single-process program
16.6 Sending standard input
16.7 Starting a job in a queue
16.8 Using custom MPI scripts
16.9 Starting MAP from a job script
16.10 Numactl
16.11 MAP environment variables
17 Program output
17.1 Viewing standard output and error
17.2 Displaying selected processes
17.3 Restricting output
17.4 Saving output
18 Source code
18.1 Viewing
18.2 OpenMP programs
18.3 GPU programs
18.4 Dealing with complexity: code folding
18.5 Editing
18.6 Rebuilding and restarting
18.7 Committing changes
19 Selected lines view
19.1 Limitations
19.2 GPU profiling
20 Stacks view
21 OpenMP Regions view
22 Functions view
23 Project Files view
24 Metrics View
24.1 CPU instructions
24.1.1 Per-line CPU instructions
24.2 Perf metrics
24.3 CPU time
24.4 I/O
24.5 Memory
24.6 MPI
24.7 Detecting MPI imbalance
24.8 Accelerator
24.9 Energy
24.9.1 Requirements
24.10 Lustre
24.11 Zooming
24.12 Viewing totals across processes and nodes
24.13 Custom metrics
25 PAPI metrics
25.1 Installation
25.2 PAPI config file
25.3 PAPI overview metrics
25.4 PAPI cache misses
25.5 PAPI branch prediction
25.6 PAPI floating-point
26 Main-thread, OpenMP and Pthread view modes
26.1 Main thread only mode
26.2 OpenMP mode
26.3 Pthread mode
27 Processes and cores view
28 Running MAP from the command line
28.1 Profiling MPMD programs
28.1.1 Profiling MPMD programs without Express Launch
29 Exporting profiler data in JSON format
29.1 JSON format
29.2 Activities
29.2.1 Description of categories
29.2.2 Categories available in main_thread activity
29.2.3 Categories available in openmp and pthreads activities
29.3 Metrics
29.4 Example JSON output
30 GPU profiling
30.1 Kernel analysis
30.2 Compilation
30.3 Performance impact
30.4 Customizing GPU profiling behavior
30.5 Known issues
31 Python profiling
31.1 Procedure
31.2 Results
31.3 Example: Profiling a simple Python script
31.4 Next steps
31.5 Related information
31.6 Known Issues
IV Appendix
A Configuration
A.1 Configuration files
A.1.1 Sitewide configuration
A.1.2 Startup scripts
A.1.3 Importing legacy configuration
A.1.4 Converting legacy sitewide configuration files
A.1.5 Using shared home directories on multiple systems
A.1.6 Using a shared installation on multiple systems
A.2 Integration with queuing systems
A.3 Template tutorial
A.3.1 The template script
A.3.2 Configuring queue commands
A.3.3 Configuring how job size is chosen
A.3.4 Quick restart
A.4 Connecting to remote programs (remote-exec)
A.5 Optional configuration
A.5.1 System
A.5.2 Job submission
A.5.3 Code viewer settings
A.5.4 Appearance
B Getting support
C Supported platforms
C.1 DDT
C.2 MAP
D Known issues
D.1 MAP
D.2 XALT Wrapper
D.3 MPICH 3
D.4 Open MPI
D.5 CUDA
D.6 SLURM
D.7 PGI compilers
D.8 64-bit Arm/Power platforms
D.9 F1 user guide
D.10 See also
E MPI distribution notes and known issues
E.1 Berkeley UPC
E.2 Bull MPI
E.3 Cray MPT
E.3.1 Using DDT with Cray ATP (the Abnormal Termination Process)
E.4 HP MPI
E.5 IBM PE
E.6 Intel MPI
E.7 MPC
E.7.1 MPC in the Run window
E.7.2 MPC on the command line
E.8 MPICH 1 p4
E.9 MPICH 1 p4 mpd
E.10 MPICH 2
E.11 MPICH 3
E.12 MVAPICH 2
E.13 Open MPI
E.14 Platform MPI
E.15 SGI MPT / SGI Altix
E.16 SLURM
E.17 Spectrum MPI
F Compiler notes and known issues
F.1 AMD OpenCL compiler
F.2 Arm Fortran compiler
F.3 Berkeley UPC compiler
F.4 Cray compiler environment
F.4.1 Compile scalar programs on Cray
F.5 GNU
F.5.1 GNU UPC
F.6 IBM XLC/XLF
F.7 Intel compilers
F.8 Pathscale EKO compilers
F.9 Portland Group compilers
G Platform notes and known issues
G.1 Cray
G.2 GNU/Linux systems
G.2.1 General
G.2.2 SUSE Linux
G.2.3 Attaching
G.3 Intel Xeon
G.3.1 Enabling RAPL energy and power counters when profiling
G.4 Intel Xeon Phi (Knights Landing)
G.5 NVIDIA CUDA
G.5.1 CUDA known issues
G.6 Arm
G.6.1 Arm®v8 (AArch64) known issues
G.7 POWER8 and POWER9 (POWER 64-bit)
G.7.1 Supported features
G.7.2 Known issues
G.8 Mac OS X
H General troubleshooting and known issues
H.1 General troubleshooting
H.1.1 Problems starting the GUI
H.1.2 Problems reading this document
H.2 Starting a program
H.2.1 Starting scalar programs
H.2.2 Starting scalar programs with aprun
H.2.3 Starting scalar programs with srun
H.2.4 Starting multi-process programs
H.2.5 No shared home directory
H.2.6 DDT or MAP cannot find your hosts or the executable
H.2.7 The progress bar does not move and Arm Forge times out
H.3 Attaching
H.3.1 The system does not allow connecting debuggers to processes (Fedora, Ubuntu)
H.3.2 The system does not allow connecting debuggers to processes (Fedora, Red Hat)
H.3.3 Running processes do not show up in the attach window
H.4 Source Viewer
H.4.1 No variables or line number information
H.4.2 Source code does not appear when you start Arm Forge
H.4.3 Code folding does not work for OpenACC/OpenMP pragmas
H.5 Input/Output
H.5.1 Output to stderr is not displayed
H.5.2 Unwind errors
H.6 Controlling a program
H.6.1 Program jumps forwards and backwards when stepping through it
H.6.2 DDT may stop responding when using the Step Threads Together option
H.7 Evaluating variables
H.7.1 Some variables cannot be viewed when the program is at the start of a function
H.7.2 Incorrect values printed for Fortran array
H.7.3 Evaluating an array of derived types, containing multiple-dimension arrays
H.7.4 C++ STL types are not pretty printed
H.8 Memory debugging
H.8.1 The View Pointer Details window says a pointer is valid but does not show you which line of code it was allocated on
H.8.2 mprotect fails error when using memory debugging with guard pages
H.8.3 Allocations made before or during MPI_Init show up in Current Memory Usage but have no associated stack back trace
H.8.4 Deadlock when calling printf or malloc from a signal handler
H.8.5 Program runs more slowly with Memory Debugging enabled
H.9 MAP specific issues
H.9.1 My compiler is inlining functions
H.9.2 Tail call optimization
H.9.3 MPI wrapper libraries
H.9.4 Thread support limitations
H.9.5 No thread activity while blocking on an MPI call
H.9.6 I am not getting enough samples
H.9.7 I just see main (external code) and nothing else
H.9.8 MAP is reporting time spent in a function definition
H.9.9 MAP is not correctly identifying vectorized instructions
H.9.10 Linking with the static MAP sampler library fails with an undefined reference to __real_dlopen
H.9.11 Linking with the static MAP sampler library fails with FDE overlap errors
H.9.12 MAP adds unexpected overhead to my program
H.9.13 MAP takes an extremely long time to gather and analyze my OpenBLAS-linked application
H.9.14 MAP over-reports MPI, Input/Output, accelerator or synchronization time
H.9.15 MAP collects very deep stack traces with boost::coroutine
H.10 Obtaining support
I Queue template script syntax
I.1 Queue template tags
I.2 Defining new tags
I.3 Specifying default options
I.4 Launching
I.4.1 Using AUTO_LAUNCH_TAG
I.4.2 Using ddt-mpirun
I.4.3 MPICH 1 based MPI
I.4.4 Scalar programs
I.5 Using PROCS_PER_NODE_TAG
I.6 Job ID regular expression
I.7 Arm IPMI Energy Agent
I.7.1 Requirements