The following known issues affect MAP.
- I/O metrics are not available on some systems, including Cray systems.
- CPU instruction metrics are only available on x86_64 systems.
- Thread activity is not sampled whilst a process is inside an MPI call with a duration spanning multiple samples. This can appear as 'uncategorized' (white) time in the Application activity bar when in the Pthread View. The uncategorized time coincides with long-running MPI calls.
- MAP does not support code that spawns new processes, such as fork, exec and MPI_Comm_spawn. In these cases MAP will only profile the original process.
The XALT wrapper is known to cause several issues when used in conjunction with Arm Forge, such as:
- MPI programs cannot be debugged due to a hang during startup.
- Error messages are reported relating to the permissions on qstat.
Message queue debugging does not work in Open MPI 1.8.1 to 1.8.5. This issue is fixed in Open MPI 1.8.6.
The following versions of Open MPI do not work with Arm Forge because of bugs in the Open MPI debug interface:
- Open MPI 2.1.0 to 2.1.2.
- Open MPI 3.0.0 when compiled with the Arm Compiler for HPC on Arm®v8 (AArch64) systems.
- Open MPI 3.0.x when compiled with some versions of the GNU compiler on Arm®v8 (AArch64) systems.
- Open MPI 3.x when compiled with some versions of IBM XLC/XLF or PGI compilers on IBM Power (PPC64le little-endian, POWER8, or POWER9) systems.
- Open MPI 3.1.0 and 3.1.1.
- Open MPI 3.x with any version of PMIx < 2.
To use Open MPI 3.x with the GNU compiler on IBM Power systems, you might need to configure the Open MPI build with CFLAGS=-fasynchronous-unwind-tables. This fixes a startup bug where Arm Forge is unable to step out of MPI_Init into your main function. The startup bug is a result of a lack of debug information and optimization in the Open MPI library. If you already configure with -g then you do not need to add this extra flag. An example configure command is:
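A minimal sketch of such a command, assuming a hypothetical install prefix and an unpacked Open MPI 3.x source tree as the working directory. Only the CFLAGS=-fasynchronous-unwind-tables argument comes from the text above. The prefix and the guard around the call are illustrative assumptions:

```shell
# Hypothetical install prefix (an assumption); change it to suit your system.
PREFIX="${PREFIX:-$HOME/opt/openmpi-3}"

# Run this from the top of an unpacked Open MPI 3.x source tree.
# CFLAGS is passed as a configure argument, as described above.
if [ -x ./configure ]; then
    ./configure --prefix="$PREFIX" CFLAGS=-fasynchronous-unwind-tables
else
    echo "No ./configure here; run this from the Open MPI source directory." >&2
fi
```

After configure completes, build and install Open MPI as usual, then rebuild your program against the new installation.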
The following known issues affect CUDA:
- To debug or profile a CUDA program, compile the program with a version of the CUDA toolkit that matches the version of the installed CUDA driver. For example, if the CUDA 7.5 driver is installed, then you must use the CUDA 7.5 toolkit to compile your program. Compiling with mismatched CUDA toolkit and CUDA driver versions will cause errors when debugging or profiling.
To force DDT to use a particular version of the CUDA debugger, set the ALLINEA_FORCE_CUDA_VERSION environment variable to a version number. For example, ALLINEA_FORCE_CUDA_VERSION=7.5 for CUDA 7.5. This may cause issues due to CUDA version incompatibilities.
- GPU profiling is only supported when using a CUDA 8.0 toolkit with a CUDA 8.0 driver.
- Cray CCE 8.1.2 OpenACC and earlier releases fail to generate debug information for local variables in accelerated regions. Install CCE 8.1.3.
- When debugging a CUDA application, adding watchpoints on either host or kernel code is not supported.
- When debugging a CUDA application, using the Step threads together box and Run to here to step into OpenMP regions is not supported. Breakpoints can be used to stop at the desired line.
- Stepping multiple warps simultaneously (e.g. those in the same block or kernel) is not supported in CUDA 9.x. Individual warps can be stepped sequentially to achieve the same effect.
- When CUDA is set to Detect invalid accesses (memcheck), placing breakpoints in CUDA kernels is not supported.
- A driver issue in CUDA 9.1 prevents DDT from debugging CUDA GPU applications on Cray machines using Cray MPT (aprun). As a workaround, launch the CUDA application outside of DDT and attach to it.
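The ALLINEA_FORCE_CUDA_VERSION setting described in the first item above can be applied in a shell session roughly as follows. This is a sketch that assumes a POSIX-style shell and reuses the 7.5 value from the example in the text:

```shell
# Ask DDT to use the CUDA 7.5 debugger (the value from the example above).
export ALLINEA_FORCE_CUDA_VERSION=7.5

# Confirm the value that a DDT process started from this shell would inherit.
echo "ALLINEA_FORCE_CUDA_VERSION=${ALLINEA_FORCE_CUDA_VERSION}"
```

Start DDT from the same shell so that it inherits the variable. Because forcing a version may cause CUDA incompatibility issues, unset the variable once it is no longer needed.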
Pressing "F1" sometimes fails to display the user guide correctly. Stale cached files appear to be able to corrupt the document browser. If "F1" leads to invisible documents, remove these cached files by typing:
See also additional known issues here: