
H General troubleshooting and known issues

If you have problems with Arm Forge products, the topics in this section may help you. Additionally, check the support pages on the Arm Developer website, and make sure you have the latest version of the product.

H.1 General troubleshooting

H.1.1 Problems starting the GUI

If the GUI is unable to start, this can be due to one of the following reasons:

  1. Cannot connect to an X server. If you are running on a remote machine, make sure that your DISPLAY variable is set appropriately and that you can run simple X applications, such as xterm, from the same command line.
  2. The license file is invalid. In this case the software issues an error message. Verify that you have a license file for the correct product in the license directory and check that the date inside it is still valid. If the program still refuses to start, please contact Arm support.
  3. You are using Licence Server, but the Arm Forge products cannot connect to it. See the Licence Server user guide for more information on troubleshooting these problems.

H.1.2 Problems reading this document

If a blank screen appears instead of this document when you press F1, there may be corrupt files preventing the documentation system (Qt Assistant) from starting. You can resolve this by removing the stale files, which are found in $HOME/.local/share/data/Allinea.
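For example, from a shell:

```shell
# Remove Qt Assistant's stale cache files; they are regenerated
# automatically the next time the documentation system starts.
rm -rf "$HOME/.local/share/data/Allinea"
```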

H.2 Starting a program

H.2.1 Starting scalar programs

Before attempting to start a program, check Appendix F Compiler notes and known issues and ensure it is compiled correctly.

There are a number of possible sources for problems. The most common, for users with a multi-process license, is that the Run Without MPI Support check box has not been checked. If the software reports a problem with MPI and you know your program is not using MPI, then this is usually the cause. If you have checked this box and the software still mentions MPI, please contact Arm support.

Other potential problems are:

  • A previous Arm session is still running, or has not released resources required for the new session. Usually this can be resolved by killing stale processes. The most obvious symptom of this is a delay of approximately 60 seconds and a message stating that not all processes connected. You may also see, in the terminal, a QServerSocket message.
  • The target program does not exist or is not executable.
  • Arm Forge products' backend daemon, ddt-debugger, is missing from the bin directory. In this case, check your installation and contact Arm support.

H.2.2 Starting scalar programs with aprun

For compilation, see F.4.1 Compile scalar programs on Cray. The following environment variable should be exported:

export ALLINEA_MPI_INIT=main 

Instead of setting a breakpoint in the default MPI_Init location, this environment variable sets a breakpoint in main and holds the program there.

If using compatibility launch with a scalar program, the run dialog automatically detects Cray MPI even though it is a non-MPI program. Keep MPI checked, set one process, and click Run.

If the above environment variable does not work, try an alternative solution by exporting:

   export ALLINEA_STOP_AT_MAIN=1

ALLINEA_STOP_AT_MAIN holds the program wherever it was when it attached, which can be before main. For Arm DDT, first set a breakpoint in the main of your program, then run to this breakpoint.

H.2.3 Starting scalar programs with srun

Export the following environment variable:

export ALLINEA_MPI_INIT=main 

Instead of setting a breakpoint in the default MPI_Init location, this environment variable sets a breakpoint in main and holds the program there.

If using compatibility launch with a scalar program, the run dialog automatically detects SLURM. Keep MPI checked, set one process, and click Run.

If the above environment variable does not work, try an alternative solution by exporting:

   export ALLINEA_STOP_AT_MAIN=1

ALLINEA_STOP_AT_MAIN holds the program wherever it was when it attached, which can be before main. For Arm DDT, first set a breakpoint in the main of your program, then run to this breakpoint.

H.2.4 Starting multi-process programs

If you encounter problems while starting an MPI program, the first step is to establish that it is possible to run a single-process (non-MPI) program such as a trivial "Hello, World!", and to resolve any issues that arise. After this, attempt to run a multi-process job; the symptoms will often allow a reasonable diagnosis to be made.

First, verify that MPI is working correctly by running a job without any Arm Forge product applied, such as the example in the examples directory:

   mpirun -np 8 ./a.out

Verify that mpirun is in the PATH, or the environment variable ALLINEA_MPIRUN is set to the full pathname of mpirun.
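For example (the mpirun path shown is a placeholder for your site's actual installation):

```shell
# Check whether mpirun is on the PATH.
command -v mpirun || echo "mpirun not found in PATH"
# If it is not, point Arm Forge at the correct launcher explicitly
# (placeholder path; substitute your site's real mpirun).
export ALLINEA_MPIRUN=/opt/mpi/bin/mpirun
```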

If the progress bar does not report that at least process 0 has connected, then the remote ddt-debugger daemons cannot be started or cannot connect to the GUI.

Sometimes problems are caused by environment variables not propagating to the remote nodes while starting a job. To a large extent, the solution to these problems depends on the MPI implementation that is being used.

In the simplest case, for rsh-based systems such as a default MPICH 1 installation, correct configuration can be verified by rsh-ing to a node and examining the environment. It is worthwhile running rsh with the env command, because this does not see any environment variables set inside the .profile script. For example, if your nodes use a .profile instead of a .bashrc for each user, you may see different output when running rsh node env than when you run rsh node and then run env inside the new shell.

If only one, or very few, processes connect, it may be because you have not chosen the correct MPI implementation. Please examine the list and look carefully at the options. Should no other suitable MPI be found, please contact Arm support for advice.

If a large number of processes are reported by the status bar to have connected, then it is possible that some have failed to start due to resource exhaustion, timing out, or, more unusually, an unexplained crash. You should verify again that MPI is still working, as some MPI distributions do not release all semaphore resources correctly, for example MPICH 1 on Red Hat with SMP support built in.

To check for time-out problems, set the ALLINEA_NO_TIMEOUT environment variable to 1 before launching the GUI and see if further progress is made. This is not a solution, but aids the diagnosis. If all processes now start, please contact Arm support.

H.2.5 No shared home directory

If your home directory is not accessible to all the nodes in your cluster then your jobs may fail to start.

To resolve the problem open the file ~/.allinea/system.config in a text editor. Change the shared directory option in the [startup] section so it points to a directory that is available and shared by all the nodes. If no such directory exists, change the use session cookies option to no instead.
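For example, the edited [startup] section of ~/.allinea/system.config might look like this (the path is a placeholder; use a directory that every node can see):

```
[startup]
shared directory = /shared/scratch/myuser
```

If no such directory exists at your site, set use session cookies = no in the same section instead.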

H.2.6 DDT or MAP cannot find your hosts or the executable

This can happen when attempting to attach to a process running on other machines. Ensure that the host name(s) that DDT complains about are reachable using ping.

If DDT fails to find the executable, ensure that it is available in the same directory on every machine.

See section A.4 Connecting to remote programs (remote-exec) for more information on configuring access to remote machines.

H.2.7 The progress bar does not move and Arm Forge times out

It is possible that the program ddt-debugger has not been started by mpirun or has aborted. You can log onto your nodes and confirm this by looking at the process list before clicking Ok when Arm Forge times out. Ensure ddt-debugger has all the libraries it needs and that it can run successfully on the nodes using mpirun.

Alternatively, there may be one or more processes (ddt-debugger, mpirun, rsh) which could not be terminated. This can happen if Arm Forge is killed during its startup or due to MPI implementation issues. You will have to kill the processes manually, using ps x to get the process ids and then kill or kill -9 to terminate them.
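For example, to list candidate stale processes on a node (the pattern matches the process names mentioned above; extend it for your launcher):

```shell
# Print the PID and command of any leftover session processes,
# so they can be terminated with kill (or kill -9).
ps x | awk '/ddt-debugger|mpirun|rsh/ && !/awk/ {print $1, $5}'
```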

This issue can also arise for mpich-p4mpd, and the solution is explained in Appendix E MPI distribution notes and known issues.

If your intended mpirun command is not in your PATH, you may either add it to your PATH or set the environment variable ALLINEA_MPIRUN to contain the full pathname of the correct mpirun.

If your home directory is not accessible by all the nodes in your cluster then your jobs may fail to start in this fashion.

See section H.2.5 No shared home directory.

H.3 Attaching

H.3.1 The system does not allow connecting debuggers to processes (Fedora, Ubuntu)

The Ubuntu ptrace scope control feature does not allow a process to attach to other processes it did not launch directly.

See the Ubuntu security documentation on the Yama ptrace scope protection for details.

To disable this feature until the next reboot run the following command:

   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

To disable it permanently, add this line to /etc/sysctl.d/10-ptrace.conf (or /etc/sysctl.conf):

   kernel.yama.ptrace_scope = 0

This will take effect after the next reboot.

On Fedora, ptrace may be blocked by SELinux in addition to Yama. See section H.3.2.

H.3.2 The system does not allow connecting debuggers to processes (Fedora, Red Hat)

The deny_ptrace boolean in SELinux (used by Fedora and Red Hat) does not allow a process to attach to other processes it did not launch directly.

See the SELinux documentation for details.

To disable this feature until the next reboot run the following command:

   setsebool deny_ptrace 0

To disable it permanently run this command:

   setsebool -P deny_ptrace 0

As of Fedora 22, ptrace may be blocked by Yama in addition to the SELinux boolean. See section H.3.1.

H.3.3 Running processes do not show up in the attach window

If running processes do not show up in the attach window, this is usually a problem with either your remote-exec script or your node list file.

First check that the entry in your node list file corresponds with either localhost (if you are running on your local machine) or with the output of hostname on the desired machine.

Secondly try running /path/to/arm/forge/libexec/remote-exec manually.

For example: /path/to/arm/forge/libexec/remote-exec <hostname> ls. Then check the output of this command.

If this fails then there is a problem with your remote-exec script. If rsh is still being used in your script check that you can rsh to the desired machine. Otherwise check that you can attach to your machine in the way specified in the remote-exec script.

See also A.4 Connecting to remote programs (remote-exec).

If you still experience problems with your script, contact Arm support for assistance.

H.4 Source Viewer

H.4.1 No variables or line number information

You should compile your programs with debug information included; this flag is usually -g.

H.4.2 Source code does not appear when you start Arm Forge

If you cannot see any text at all, perhaps the default selected font is not installed on your system. Go to File → Options (Arm Forge → Preferences on Mac OS X) and choose a fixed width font such as Courier and you should now be able to see the code.

If you see a screen of text telling you that Arm Forge could not find your source files, follow the instructions given. If you still cannot see your source code, check that the code is available on the same machine as you are running the software on, and that the correct file and directory permissions are set. If some files are found but others are missing, try adding additional source directories and rescanning.

If the problem persists, contact Arm support.

H.4.3 Code folding does not work for OpenACC/OpenMP pragmas

This is a known issue. If an OpenACC or OpenMP pragma is associated with a multi-line loop, then the loop block may be folded instead.

H.5 Input/Output

H.5.1 Output to stderr is not displayed

Arm Forge automatically captures anything written to stdout / stderr and displays it.

Some shells, such as csh, do not support this feature in which case you may see your stderr mixed with stdout, or you may not see it at all.

In any case Arm strongly recommends writing program output to files instead, since the MPI specification does not cover stdout / stderr behavior.

H.5.2 Unwind errors

When using MAP you may see errors reported in the output of the form:

Arm Sampler: 3 libunwind: Unspecified (general) error (4/172 samples) 
Arm Sampler: 3 Maximum backtrace size in sampler exceeded, stack too deep. (1/172 samples)

These indicate that MAP was only able to obtain a partial stack trace for the sample. If the proportion of samples that generate such errors is low, then they can safely be ignored.

If a large proportion of samples exhibit these errors, then consult the advice on partial traces in F.7 Intel compilers or F.9 Portland Group compilers if you are using these compilers.

If this does not help, then please contact Arm support.

H.6 Controlling a program

H.6.1 Program jumps forwards and backwards when stepping through it

If you have compiled with any sort of optimisations, the compiler will shuffle your program's instructions into a more efficient order. This is what you are seeing. Arm recommends compiling with -O0 when debugging, which disables this behavior and other optimisations.

If you are using the Intel OpenMP compiler, then the compiler will generate code that appears to jump in and out of the parallel blocks regardless of your -O0 setting. Stepping inside parallel blocks is therefore not recommended.

H.6.2 DDT may stop responding when using the Step Threads Together option

DDT may stop responding if a thread exits when the Step Threads Together option is enabled. This is most likely to occur on Linux platforms using NPTL threads. This might happen if you tried to Play to here to a line that was never reached, in which case your program ran all the way to the end and then exited.

A workaround is to set a breakpoint at the last statement executed by the thread and turn off Step Threads Together when the thread stops at the breakpoint.

If this problem affects you, please contact Arm support.

H.7 Evaluating variables

H.7.1 Some variables cannot be viewed when the program is at the start of a function

Some compilers produce faulty debug information that forces DDT to enter a function during its prologue, or the variable may not yet be in scope.

In this region, which appears to be the first line of the function, some variables have not been initialized yet. To view all the variables with their correct values, it may be necessary to play or step to the next line of the function.

H.7.2 Incorrect values printed for Fortran array

Pointers to non-contiguous array blocks (for example, allocatable arrays using strides) are not supported.

If this issue affects you, please contact Arm support for a workaround or fix.

There are also many compiler limitations that can cause this. See Appendix F for details.

H.7.3 Evaluating an array of derived types, containing multiple-dimension arrays

The Locals, Current Line and Evaluate views may not show the contents of these multi-dimensional arrays inside an array of derived types.

However, you can view the contents of the array by clicking on its name and dragging it into the evaluate window as an item on its own, or by using the MDA.

H.7.4 C++ STL types are not pretty printed

The pretty printers provided with DDT are compatible with GNU compilers version 4.7 and above, and Intel C++ version 12 and above.

H.8 Memory debugging

H.8.1 The View Pointer Details window says a pointer is valid but does not show you which line of code it was allocated on

The Pathscale compilers have known issues that can cause this.

Please see the compiler notes in section C of this appendix for more details.

The Intel compiler may need the -fp argument to allow you to see stack traces on some machines.

If this happens with another compiler, please contact Arm support at Arm support with the vendor and version number of your compiler.

H.8.2 mprotect fails error when using memory debugging with guard pages

This can happen if your program makes more than 32768 allocations; a limit in the kernel prevents DDT from allocating more protected regions than this. Your options are:

  • Running echo 123456 >/proc/sys/vm/max_map_count (requires root) will increase the limit to 61728 (123456 / 2, as some allocations use multiple maps).
  • Disable guard pages completely. This will hinder DDT's ability to detect heap over/underflows.
  • Disable guard pages temporarily. You can disable them at program start, add a breakpoint before the allocations you wish to add guard pages for, and then reenable the feature.

See 12.3 Configuration for information on how to disable guard pages.

H.8.3 Allocations made before or during MPI_Init show up in Current Memory Usage but have no associated stack back trace

Memory allocations that are made before or during MPI_Init appear in Current Memory Usage along with any allocations made afterwards.

However, the call stack at the time of the allocation is not recorded for these allocations and will not show up in the Current Memory Usage window.

H.8.4 Deadlock when calling printf or malloc from a signal handler

The memory allocation library calls (for example, malloc) provided by the memory debugging library are not async-signal-safe unlike the implementations in recent versions of the GNU C library.

POSIX does not require malloc to be async-signal-safe but some programs may expect this behavior.

For example, a program that calls printf from a signal handler may deadlock when memory debugging is enabled in DDT since the C library implementation of printf may call malloc.

A table of the functions that may be safely called from an asynchronous signal handler can be found in the POSIX documentation (for example, the signal-safety(7) man page on Linux).

H.8.5 Program runs more slowly with Memory Debugging enabled

The Memory Debugging library performs more checks than the normal runtime's memory allocation routines.

However, those checks also make the library slower.

If your program is running too slow when Memory Debugging is enabled there are a number of options you can change to speed it up.

Firstly try reducing the Heap Debugging option to a lower setting. For example, if it is currently on High, try changing it to Medium or Low.

You can increase the heap check interval from the default of 100 to a higher value. The heap check interval controls how many allocations may occur between full checks of the heap, which may take some time.

A higher setting (1000 or above) is recommended if your program allocates and deallocates memory very frequently, for example from inside a computation loop.

You can disable the Store backtraces for memory allocations option, at the expense of losing backtraces in the View Pointer Details and Current Memory Usage windows.

H.9 MAP specific issues

H.9.1 My compiler is inlining functions

While compilers may inline functions, their ability to include sufficient information to reconstruct the original call tree varies between vendors. Arm recommends using the following flags:

  • Intel: -g -O3 -fno-inline-functions
  • Intel 17+: -g -fno-inline -no-ip -no-ipo -fno-omit-frame-pointer -O3
  • PGI: -g -O3 -Meh_frame
  • GNU: -g -O3 -fno-inline
  • Cray: -G2 -O3 -h ipa0
  • IBM: -g -O3 -qnoinline

Be aware that some compilers may still inline functions even when explicitly asked not to.

There is typically some small performance penalty for disabling function inlining or enabling profiling information.

MAP will work fine, but you will often see time inside an inlined function being attributed to its parent in the Stacks view. The Source Code view should be largely unaffected.

H.9.2 Tail call optimization

A function may return the result of calling another function, for example:

int someFunction()
{
    return otherFunction();
}

In this case the compiler may change the call to otherFunction into a jump. This means that, when inside otherFunction, the calling function, someFunction, no longer appears on the stack.

This optimization is known as tail call optimization. It may be disabled for the GNU C compiler by passing the -fno-optimize-sibling-calls argument to gcc.

H.9.3 MPI wrapper libraries

Unlike DDT, MAP wraps MPI calls in a custom shared library. Either a precompiled wrapper compatible with your system is copied, or one is built for your system each time you run MAP.

See section C.2 MAP for the list of supported MPIs.

You can also try setting ALLINEA_WRAPPER_COMPILE=1 and MPICC directly:

   $ MPICC=my-mpicc-command bin/map -n 16 ./wave_c

If you have problems, please contact Arm support.

H.9.4 Thread support limitations

MAP provides limited support for programs when threading support is set to MPI_THREAD_SERIALIZED or MPI_THREAD_MULTIPLE in the call to MPI_Init_thread.

MPI activity on non-main threads will contribute towards the MPI-time of the program, but not the MPI metric graphs.

Additionally, MPI activity on a non-main thread may result in additional profiling overhead due to the mechanism employed by MAP for detecting MPI activity.

It is recommended that the pthread view mode is used for interpreting MPI activity instead of the OpenMP view mode, since OpenMP view mode will scale MPI-time depending on the resources requested. Hence, non-main thread MPI activity may provide nonintuitive results when detected outside of OpenMP regions.

Warnings are displayed when the user initiates and completes profiling a program which sets MPI_THREAD_SERIALIZED or MPI_THREAD_MULTIPLE as the required thread support.

MAP does fully support calling MPI_Init_thread with either MPI_THREAD_SINGLE or MPI_THREAD_FUNNELED specified as the required thread support.

Note that the requirements the MPI specification makes on programs using MPI_THREAD_FUNNELED are the same as those made by MAP: all MPI calls must be made on the thread that called MPI_Init_thread.

In many cases, multi-threaded MPI programs can be refactored such that they comply with this restriction.

H.9.5 No thread activity while blocking on an MPI call

Unfortunately MAP is currently unable to record thread activity on a process where a long-duration MPI call is in progress.

If you have an MPI call that takes a significant amount of time to complete, as indicated by a sawtooth on the MPI call duration metric graph, MAP will display no thread activity for the process executing that call for most of that MPI call's duration.

See also section 24.6.

H.9.6 I am not getting enough samples

By default, the sampling interval is 20ms, but if you get warnings about too few samples on a fast run, or want more detail in the results, you can change the sampling rate.

To decrease the interval to 10ms (sampling more frequently), set the environment variable ALLINEA_SAMPLER_INTERVAL=10.
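For example, in the shell or job script that launches MAP:

```shell
# Sample every 10ms instead of the default 20ms, for more detail
# in the results of a short run.
export ALLINEA_SAMPLER_INTERVAL=10
```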

Note: Sampling frequency is automatically decreased over time to ensure a manageable amount of data is collected whatever the length of the run.

Increasing the sampling frequency is not recommended if there are lots of threads and/or very deep stacks in the target program as this may not leave sufficient time to complete one sample before the next sample is started.

Note: Whether OpenMP is enabled or disabled in MAP, the final script or scheduler values set for OMP_NUM_THREADS will be used to calculate the sampling interval per thread (ALLINEA_SAMPLER_INTERVAL_PER_THREAD). When configuring your job for submission, check whether your final submission script, scheduler or the MAP GUI has a default value for OMP_NUM_THREADS.

Note: Custom values for ALLINEA_SAMPLER_INTERVAL will be overwritten by values set from the combination of ALLINEA_SAMPLER_INTERVAL_PER_THREAD and the expected number of threads (from OMP_NUM_THREADS).

H.9.7 I just see main (external code) and nothing else

This can happen if you compile without -g. It can also happen if you move the executable out of the directory it was compiled in.

Check your compile line includes -g and try right-clicking on the Project Files panel in MAP and choosing Add Source Directory….

Contact Arm support if you have any further issues.

H.9.8 MAP is reporting time spent in a function definition

Any overheads involved in setting up a function call (pushing arguments to the stack and so on) are usually assigned to the function definition. Some compilers may assign them to the opening brace '{' and closing brace '}' instead.

If this function has been inlined, the situation becomes further complicated and any setup time, such as for allocating space for arrays, is often assigned to the definition line of the enclosing function.

H.9.9 MAP is not correctly identifying vectorized instructions

The instructions identified as vectorized (packed) are listed here:

  • Packed floating-point instructions: addpd addps addsubpd addsubps andnpd andnps andpd andps divpd divps dppd dpps haddpd haddps hsubpd hsubps maxpd maxps minpd minps mulpd mulps rcpps rsqrtps sqrtpd sqrtps subpd subps
  • Packed integer instructions: mpsadbw pabsb pabsd pabsw paddb paddd paddq paddsb paddsw paddusb paddusw paddw palignr pavgb pavgw phaddd phaddsw phaddw phminposuw phsubd phsubsw phsubw pmaddubsw pmaddwd pmaxsb pmaxsd pmaxsw pmaxub pmaxud pmaxuw pminsb pminsd pminsw pminub pminud pminuw pmuldq pmulhrsw pmulhuw pmulhw pmulld pmullw pmuludq pshufb pshufw psignb psignd psignw pslld psllq psllw psrad psraw psrld psrlq psrlw psubb psubd psubq psubsb psubsw psubusb psubusw psubw

Arm also identifies the AVX-2 variants of these instructions, with a "v" prefix.

Contact Arm support if you believe your code contains vectorized instructions that have not been listed and are not being identified in the CPU floating-point/integer vector metrics.

H.9.10 Linking with the static MAP sampler library fails with an undefined reference to __real_dlopen

When linking with the static MAP sampler library you may get undefined reference errors similar to the following:

../lib/64/libmap-sampler.a(dl.o): In function '__wrap_dlopen': 
undefined reference to '__real_dlopen' 
../lib/64/libmap-sampler.a(dl.o): In function '__wrap_dlclose': 
undefined reference to '__real_dlclose' 
collect2: ld returned 1 exit status

To avoid these errors follow the instructions in section 16.2.4 Static linking.

Note the use of the -Wl,@/home/user/myprogram/allinea-profiler.ld syntax.

H.9.11 Linking with the static MAP sampler library fails with FDE overlap errors

When linking with the static MAP sampler library you may get FDE overlap errors similar to:

ld: .eh_frame_hdr table[791] FDE at 0000000000822830 overlaps table[792] FDE at 0000000000825788

This can occur when the version of binutils on a system has been upgraded to 2.25 or later, and is most commonly seen on Cray machines using CCE 8.5.0 or higher.

To fix this issue rerun make-profiler-libraries --lib-type=static and use the freshly generated static libraries and allinea-profiler.ld to link these with your program.

See section 16.2.4 Static linking for more details.

If you are not using a Cray or SUSE build of Arm Forge and you require a binutils 2.25 compatible static library, please contact Arm support.

The error message occurs because the version of libmap-sampler.a you attempted to link was not compatible with the version of ld in binutils versions ≥ 2.25.

For Cray machines there is a separate library libmap-sampler-binutils-2.25.a provided for use with this updated linker.

The make-profiler-libraries script will automatically select the appropriate library to use based on the version of ld found in your PATH.

If you erroneously attempt to link libmap-sampler-binutils-2.25.a with your program using a version of ld prior to 2.25 you will get errors such as:

/usr/bin/ld.x: libmap-sampler.a(dl.o): invalid relocation type 42

If this happens check that the correct version of ld is in your PATH and rerun make-profiler-libraries --lib-type=static.

H.9.12 MAP adds unexpected overhead to my program

MAP's sampler library will add a little overhead to the execution of your program. Usually this is less than 5% of the wall clock execution time.

Under some circumstances, however, the overhead may exceed this, especially for short runs. This is particularly likely if your program has high OpenMP overhead, for example, if it is greater than 40%.

In this case the measurements reported by MAP will be affected by this overhead and therefore be less reliable. Increasing the run time of your program, for example by changing the size of the input, decreases the overall overhead, although the initial few minutes still incur the higher overhead.

At high per-process thread counts, for example on the Intel Xeon Phi (Knight's Landing), MAP's sampler library may incur a more significant overhead.

By default, when MAP detects a large number of threads it will automatically reduce the sampling interval in order to limit the performance impact.

Sampling behavior can be modified by setting the ALLINEA_SAMPLER_INTERVAL and ALLINEA_SAMPLER_INTERVAL_PER_THREAD environment variables. For more information on the use of these environment variables, see 16.11.

H.9.13 MAP takes an extremely long time to gather and analyze my OpenBLAS-linked application

OpenBLAS versions 0.2.8 and earlier incorrectly stripped symbols from the .symtab section of the library, causing binary analysis tools such as MAP and objdump to see invalid function lengths and addresses.

This causes MAP to take an extremely long time disassembling and analyzing apparently overlapping functions containing millions of instructions.

A fix for this was accepted into the OpenBLAS codebase on October 8th 2013 and versions 0.2.9 and above should not be affected.

To work around this problem without updating OpenBLAS, simply run strip libopenblas*.so. This removes the incomplete .symtab section without affecting the operation or linkage of the library.

H.9.14 MAP over-reports MPI, Input/Output, accelerator or synchronization time

MAP employs a heuristic to determine which function calls should be considered as MPI operations.

If your code defines any function whose name starts with MPI_ (case insensitive), those functions will be treated as part of the MPI library, resulting in the time spent in MPI calls being over-reported by the activity graphs and the internals of those functions being omitted from the Parallel Stack View.

Starting your function names with the prefix MPI_ should be avoided, and is in fact explicitly forbidden by the MPI specification. This is described in sections 2.6.2 and 2.6.3 (page 19) of the MPI 3 specification document:

All MPI names have an MPI_ prefix, and all characters are capitals. Programs must not declare names, for example, for variables, subroutines, functions, parameters, derived types, abstract interfaces, or modules, beginning with the prefix MPI_.

Similarly MAP categorizes I/O functions and accelerator functions by name.

Other prefixes to avoid starting your function names with include PMPI_, _PMI_, OMPI_, omp_, GOMP_, shmem_, cuda_, __cuda, cu[A-Z][a-z] and allinea_.

All of these prefixes are case-insensitive.

Also avoid naming a function start_pes or any name also used by a standard I/O or synchronization function, write, open, pthread_join, sem_wait and so on.

H.9.15 MAP collects very deep stack traces with boost::coroutine

A known bug in Boost prevents MAP from unwinding the call stack correctly.

This can be worked around by applying the patch attached to the bug report to your boost installation, or by specifying a manual stack allocator that correctly initializes the stack frame.

First add the following custom stack allocator:

#include <boost/coroutine/coroutine.hpp>
#include <boost/coroutine/stack_context.hpp>
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memset
#include <new>       // std::bad_alloc

struct custom_stack_allocator {
    void allocate(
        boost::coroutines::stack_context & ctx,
        std::size_t size) {

        void * limit = std::malloc(size);
        if (!limit)
            throw std::bad_alloc();

        // Fix: RBP in the first frame of the stack will contain 0
        const int fill = 0;
        std::size_t stack_hdr_size = 0x100;
        if (size < stack_hdr_size)
            stack_hdr_size = size;
        std::memset(static_cast<char *>(limit) + size - stack_hdr_size,
                    fill, stack_hdr_size);

        ctx.size = size;
        ctx.sp = static_cast<char *>(limit) + ctx.size;
    }

    void deallocate(boost::coroutines::stack_context & ctx) {
        void * limit = static_cast<char *>(ctx.sp) - ctx.size;
        std::free(limit);
    }
};

Then modify your program to use the custom allocator whenever a coroutine is created:

boost::coroutines::coroutine<int()> my_coroutine(<func>,
    boost::coroutines::attributes(), custom_stack_allocator());

For more information, see the boost::coroutine documentation on stack allocators for your version of Boost.

H.10 Obtaining support

If this guide has not helped you, then the next step is to contact Arm support with a detailed report.

If possible, obtain a log file for the problem. To generate a log file, either check the Help → Logging → Automatic menu option or start Forge with the --debug and --log arguments:

$ ddt --debug --log=<logfilename> 
$ map --debug --log=<logfilename>

Where <logfilename> is the name of the log file to generate.

Next, reproduce the problem using as few processors and commands as possible. Once finished, close the program as usual.

On some systems this file may be quite large. If so, please compress it using a program such as gzip or bzip2 before sending it to support.

If your problem can only be replicated on large process counts, then please do not use the Help → Logging → Debug menu item or --debug argument as this will generate very large log files. Instead use the Help → Logging → Standard menu option or just the --log argument.

If you are connecting to a remote system, then the log file is generated on the remote host and copied back to the client when the connection is closed. The copy will not happen if the target application crashes or the network connection is lost.

In these cases, the remote copy of the log file can be found in the tmp subdirectory of the Arm configuration directory for the remote user account. The directory is ~/.allinea, unless overridden by the ALLINEA_CONFIG_DIR environment variable.

Sometimes it may be helpful to illustrate your problem with a screenshot of Arm Forge's main window. To take a screenshot, choose the Take Screenshot… option under the Window menu. You will be prompted for a file name to save the screenshot to.
