G Platform notes and known issues

This chapter describes known issues that affect specific platforms. If a supported platform is not listed here, there are no known issues for it.

G.1 Cray

There are a number of issues you should be aware of:

  • MAP users on Cray need to read 16.2.1 Debugging symbols and 16.2.5 Static linking on Cray X-Series systems. Arm supplies module files in FORGE_INSTALLATION_PATH/share/modules/cray.

    See 16.2.6 Dynamic and static linking on Cray X-Series systems using the modules environment.

  • Note that the default mode for compilers on this platform is to link statically. Section F.9 Portland Group compilers describes how to ensure that DDT's memory debugging capabilities will work with the PGI compilers in this mode.
  • Message queue debugging is not provided by the XT/XE/XK environment.
  • Cray GPU debugging requires a working temporary directory. If /tmp is not available, set TMPDIR to a directory that is. This directory must not be on a shared filesystem such as NFS or Lustre. To set the temporary directory for the compute nodes only, use the DDT_BACKEND_TMPDIR environment variable instead. DDT automatically propagates this environment variable to the compute nodes.
  • Running single-process scalar codes (that is, applications that do not use MPI, SHMEM, or UPC) on the compute nodes requires an extra step. These codes must be launched by aprun, but aprun does not launch them through the ordinary debug-supporting protocols.

    The preferred and simplest workaround is to use the supplied .qtf templates, for example cray-slurm.qtf or cray-pbs.qtf, which handle this automatically by following an alternative protocol for non-MPI codes. To use these templates, select File → Options (Arm Forge → Preferences on Mac OS X), go to the Job Submission page, enable submission via the queue, and ensure that the Also submit scalar jobs via the queue setting is enabled. The templates explicitly use aprun for non-MPI processes, as can be seen in the provided queue template files:

    
       if [ "MPI_TAG" == "none" ]; then
           aprun -n 1 env AUTO_LAUNCH_TAG
       else
           AUTO_LAUNCH_TAG
       fi
  • A dynamically linked, single-process non-MPI program that runs on a compute node (for example, non-MPI CUDA or OpenACC code) requires an additional compiler flag: -target=native. This prevents the compiler from linking in the MPI job launch routines, which otherwise interfere with debuggers on this platform. Alternatively, convert the program to an MPI program by adding MPI_Init and MPI_Finalize calls, and run it as a one-process MPI job.
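
As a minimal sketch of the temporary directory advice above, the following shell fragment checks that a candidate directory is not on NFS or Lustre before exporting DDT_BACKEND_TMPDIR. The helper name is_shared_fs, the CANDIDATE_TMPDIR variable, and the candidate path /dev/shm are assumptions for illustration only; substitute a node-local directory that exists on your compute nodes. The check runs on the node where the script executes, which may not match the compute nodes.

```shell
# Hypothetical helper (not part of Arm Forge): succeeds if the given
# filesystem type name is one of the shared filesystems named above.
is_shared_fs() {
    case "$1" in
        nfs*|lustre) return 0 ;;
        *)           return 1 ;;
    esac
}

# Assumed node-local candidate; override with CANDIDATE_TMPDIR if needed.
candidate="${CANDIDATE_TMPDIR:-/dev/shm}"

# GNU stat prints the filesystem type name with -f -c %T.
fstype=$(stat -f -c %T "$candidate" 2>/dev/null || echo unknown)

if [ -d "$candidate" ] && ! is_shared_fs "$fstype"; then
    # DDT propagates this variable to the compute nodes.
    export DDT_BACKEND_TMPDIR="$candidate"
else
    echo "Not using $candidate (type: $fstype) as DDT_BACKEND_TMPDIR" >&2
fi
```

Run this in the shell that starts DDT so that the exported variable is visible to it.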

G.2 GNU/Linux systems

G.2.1 General

There are a number of items you should be aware of:

  • On 64-bit Linux, you must use the 64-bit version of Arm Forge. This applies regardless of whether the debugged program is 32-bit or 64-bit.
  • POSIX thread cancellation does not work when running under a debugger. This is because the 'signal info' associated with a signal is lost when the signal is intercepted and sent again by the debugger, causing the cancellation request to be ignored by the receiving thread. More generally the 'signal info' associated with a signal is not available when running under a debugger.
  • Some 64-bit GNU/Linux systems have a bug in the GNU C library, specifically in libthread_db.so.1, that can crash the debugger when debugging multi-threaded programs. Check with your Linux distribution for a fix. As a workaround, you can try compiling your program as a statically linked executable using the -static compiler flag.
  • On the Arm architecture, breakpoints can be unreliable on some multicore processors (including the NVIDIA Tegra 2) and are randomly passed without stopping unless a kernel fix is built in. The required kernel option is:
    
       CONFIG_ARM_ERRATA_720789=y

    This option is not present by default in many kernel builds.
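
One way to check whether the running kernel was built with this option is sketched below. It assumes the kernel configuration is exposed at /proc/config.gz or /boot/config-<release>, which are common conventions but are not guaranteed on every system. The function names kernel_option_enabled and running_kernel_config are hypothetical helpers.

```shell
# Hypothetical helper: succeeds if option $1 is set to "y" in a kernel
# configuration read from standard input.
kernel_option_enabled() {
    grep "^$1=y\$" >/dev/null
}

# Print the running kernel's configuration, if it is exposed.
# Both locations are common conventions but may be absent.
running_kernel_config() {
    if [ -r /proc/config.gz ]; then
        zcat /proc/config.gz
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cat "/boot/config-$(uname -r)"
    fi
}

if running_kernel_config | kernel_option_enabled CONFIG_ARM_ERRATA_720789; then
    echo "CONFIG_ARM_ERRATA_720789 is built in"
else
    echo "CONFIG_ARM_ERRATA_720789 not found; breakpoints may be unreliable"
fi
```

If neither configuration file is readable, the check reports the option as not found, so treat that result as inconclusive.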

G.2.2 Attaching

To attach to a running job:

  1. Open the Attach window by clicking on the Attach button on the Welcome page.
  2. DDT needs to know which login / batch node runjob is running on. Click the Choose Hosts… button to add the necessary login / batch node if not already present. You must be able to SSH into the login / batch node without a password.
  3. Select the Automatically-detected jobs tab. Do not use the List of processes tab.
  4. Optionally specify a subset of ranks to attach to in the Attach to processes box.
  5. Click the Attach to… button.
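
Before opening the Attach window, you may want to confirm that passwordless SSH to the login / batch node works. The sketch below uses the OpenSSH BatchMode option, which makes ssh fail rather than prompt for a password. The function name can_ssh_without_password and the host name login1 are placeholders.

```shell
# Hypothetical helper: succeeds if host $1 accepts a non-interactive SSH login.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 stops the check from waiting long on an unreachable host.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example usage (login1 is a placeholder host name):
#   can_ssh_without_password login1 || echo "Set up key-based SSH to login1 first"
```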

The following caveats apply:

  • Reattaching to a job is not supported. You may only attach to a job once.
  • No other tool must be attached, or have been attached, to the job.
  • It is possible to attach to a subset of ranks. However, because reattaching is not supported, it is not possible to subsequently change the subset.
  • It may take a little time for a job to show up in the Attach window after you submit it. If a newly started job does not show up, wait a while and then click Rescan nodes.

G.3 Intel Xeon

Intel Xeon processors starting with Sandy Bridge include Running Average Power Limit (RAPL) counters. MAP can use the RAPL counters to provide energy and power consumption information for your programs.

G.3.1 Enabling RAPL energy and power counters when profiling

To enable the RAPL counters to be read by MAP you must load the intel_rapl kernel module.

The intel_rapl module is included in Linux kernel releases 3.13 and later. For testing purposes Arm have backported the powercap and intel_rapl modules for older kernel releases. You may download the backported modules from:

Download backported modules

Note

These backported modules are unsupported and should be used for testing purposes only. No support is provided by Arm, your system vendor or the Linux kernel team for the backported modules.
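
The following sketch shows one way to check whether the running kernel meets the 3.13 threshold mentioned above and whether a RAPL module is currently loaded. The helper name kernel_at_least is hypothetical, and the version comparison assumes GNU sort with the -V option. Loading the module requires root privileges, so the script only prints the command rather than running it.

```shell
# Hypothetical helper: succeeds if kernel release $2 is at least version $1.
# Relies on the -V (version sort) option of GNU sort.
kernel_at_least() {
    min=$1
    ver=${2%%-*}    # drop any distribution suffix such as "-24-generic"
    [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}

if kernel_at_least 3.13 "$(uname -r)"; then
    echo "Kernel should include intel_rapl; load it as root with: modprobe intel_rapl"
else
    echo "Kernel predates 3.13; intel_rapl is not included"
fi

# /proc/modules lists loaded modules, one per line, name first.
if grep '^intel_rapl' /proc/modules >/dev/null 2>&1; then
    echo "A RAPL module is loaded"
else
    echo "No RAPL module is loaded"
fi
```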

G.4 Intel Xeon Phi (Knights Landing)

The Intel Xeon Phi Knights Landing platform is supported only in self-hosted mode, where it is treated like an x86_64 platform.

You may experience higher than normal overhead when using MAP on this platform.

See section H.9.12 for more information.

G.5 NVIDIA CUDA

G.5.1 CUDA known issues

There are a number of issues you should be aware of:

  • DDT's memory leak reports do not track GPU memory leaks.
  • Debugging paired CPU/GPU core files is possible but is not yet fully supported.
  • CUDA metrics in MAP are not available for statically-linked programs.
  • CUDA metrics in MAP are measured at the node level, not the card level.

G.6 Arm

G.6.1 Armv8 (AArch64) known issues

There are a number of issues you should be aware of:

  • For best operation, DDT requires debug symbols for the runtime libraries to be installed in addition to debug symbols for the program itself. In particular, DDT may show incorrect values for local variables in program code if the program is currently stopped inside a runtime library. At a minimum, Arm recommends installing the glibc and, if applicable, OpenMP debug symbols.
  • For best operation, MAP requires debug symbols for the runtime libraries to be installed in addition to debug symbols for the program itself. In particular, without debug symbols MAP may report time in partial traces or unknown locations. At a minimum, Arm recommends installing the glibc and, if applicable, OpenMP debug symbols.
  • MAP may fail to finalize a profiling session if the cores are oversubscribed on AArch64 platforms. For example, this issue is likely to occur when attempting to profile a 64 process MPI program on a machine with only 8 cores. This issue will appear as a hang after finishing a profile or after pressing the 'Stop and analyze' button.
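
A simple pre-flight check for the oversubscription case described above might look like the following sketch. The helper name is_oversubscribed and the example value of 64 processes are illustrative. The check uses nproc from GNU coreutils and only considers the node it runs on, so for multi-node jobs compare the per-node process count instead.

```shell
# Hypothetical helper: succeeds if $1 processes would oversubscribe $2 cores.
is_oversubscribed() {
    [ "$1" -gt "$2" ]
}

nprocs=64                                # processes you plan to launch (example value)
ncores=$(nproc 2>/dev/null || echo 1)    # processing units available on this node

if is_oversubscribed "$nprocs" "$ncores"; then
    echo "Warning: $nprocs processes on $ncores cores; MAP may hang when finalizing" >&2
fi
```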

G.7 POWER8 and POWER9 (POWER 64-bit)

G.7.1 Supported features

Split DWARF (Fission) and compressed DWARF are supported by DDT and MAP. Benefits include smaller debug information size, and potentially less memory consumption in DDT due to the ability to load debug symbols on demand. For example, you can use the following flags with GCC (this requires the Binutils Gold linker):


gcc -gdwarf-4 -gsplit-dwarf -fdebug-types-section -Wl,-fuse-ld=gold,--gdb-index,--compress-debug-sections=zlib myprogram.c

For IBM XLC 13.1.7, use the flags -qdebug=NDWFSTR -gsplit-dwarf -Wl,--gdb-index,--compress-debug-sections=zlib and configure the compiler to use the gold linker.

G.7.2 Known issues

Please be aware of the following:

  • For best operation, DDT and MAP require debug symbols for the runtime libraries to be installed in addition to debug symbols for the program itself. Without debug symbols, DDT may show incorrect values for local variables in program code if the program is currently stopped inside a runtime library. Similarly, MAP may report time in partial traces or unknown locations. At a minimum, Arm recommends installing the glibc and, if applicable, OpenMP debug symbols. Refer to your operating system's documentation for instructions on how to install debug symbols.

G.8 Mac OS X

The following menu items are not supported:

  • Edit → Special Characters...
  • Edit → Start Dictation
  • View → Enter Full Screen
  • View → Show Tab Bar