17 Getting started

Arm MAP is a source-level profiler and can show how much time was spent on each line of code. To see the source code in MAP, compile your program with the debug flag, which for most compilers is -g. Do not just use a debug build: always keep optimization flags turned on when profiling.

You can also use MAP on programs without debug information. In this case inlined functions and the source code cannot be shown, but other features should work as expected.

To start MAP simply type one of the following shell commands into a terminal window:


   map
   map program_name [arguments]
   map <profile-file>

Where <profile-file> is a profile file generated by a MAP profiling run. It contains the program name and has a '.map' extension.

Note

A valid license is not needed when starting MAP to examine an existing profile file.

Note

Unless you are using Express Launch mode (see 17.1 Express Launch), you should not attempt to pipe input directly to MAP. For information about how to achieve the effect of sending input to your program, please read section 9 Program input and output.

It is also recommended that you add the --profile argument to MAP when profiling jobs submitted to a queue: MAP runs without the interactive GUI and saves a .map file to the current directory.
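For example, a minimal batch script might simply prefix the usual launch line with map --profile. The scheduler directives below are illustrative placeholders (Slurm syntax is assumed) and the program name is hypothetical; adjust both for your site:

```shell
#!/bin/bash
#SBATCH --nodes=2            # illustrative directives; adjust for your site
#SBATCH --time=01:00:00

# Profile non-interactively; a .map file is written to the current
# directory and can be opened in the MAP GUI later.
map --profile mpirun -n 8 ./myprog input.dat
```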

Once started in interactive mode, MAP displays the Welcome Page:

Figure 92: MAP Welcome Page

Note

In Express Launch mode (see 17.1 Express Launch) the Welcome Page is not shown, and you are brought directly to the Run dialog instead. If no valid license is found, MAP exits and an appropriate message is shown in the console output.

The Welcome Page allows you to choose what kind of profiling you want to do. You can choose from the following:

  • Profile a program.
  • Load a Profile from a previous run.
  • Connect to a remote system and accept a Reverse Connect request.

17.1 Express Launch

Each of the Arm Forge products can be launched by typing its name in front of an existing mpiexec command:


   $ map mpiexec -n 256 examples/wave_c 30

This startup method is called Express Launch and is the simplest way to get started. If your MPI is not yet supported in this mode, you will see an error message similar to the following:


   'MPICH 1 standard' programs cannot be started using Express Launch syntax
   (launching with an mpirun command).

   Try this instead:
      map --np=256 ./wave_c 30

   Type map --help for more information.

This is referred to as Compatibility Mode, in which the mpiexec command is not included and the arguments to mpiexec are passed via a --mpiargs="args here" parameter.
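As a hedged illustration of the two modes side by side (the --bind-to core argument is just an example of an mpiexec option being forwarded, not something MAP requires):

```shell
# Express Launch (supported MPIs): prefix the existing launch line
map mpiexec -n 256 examples/wave_c 30

# Compatibility Mode: drop mpiexec and forward its arguments via --mpiargs
map --np=256 --mpiargs="--bind-to core" examples/wave_c 30
```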

One advantage of Express Launch mode is that it is easy to modify existing queue submission scripts to run your program under one of the Arm Forge products. This works best with MAP, which can gather data without an interactive GUI (map --profile), or with Reverse Connect (map --connect, see 3.3 Reverse Connect for more details) for interactive profiling.

If you cannot use Reverse Connect and wish to use interactive profiling from a queue, you may need to configure MAP to generate job submission scripts for you. More details on this can be found in 17.7 Starting a job in a queue and A.2 Integration with queuing systems.

The following lists the MPI implementations supported by Express Launch:

  • bullx MPI
  • Cray X-Series (MPI/SHMEM/CAF)
  • Intel MPI
  • MPICH 2
  • MPICH 3
  • Open MPI (MPI/SHMEM)
  • Oracle MPT
  • Open MPI (Cray XT/XE/XK)
  • Cray XT/XE/XK (UPC)

17.1.1 Run dialog box

In Express Launch mode, the Run dialog has a restricted number of options:

Figure 93: Express Launch MAP Run dialog box

17.2 Preparing a program for profiling

In most cases, if your program is already compiled with debugging symbols then you do not need to recompile your program to use it with MAP, although in some cases it may need to be relinked, as explained in section 17.2.2 Linking.

17.2.1 Debugging symbols

Arm MAP is a source-level profiler and can show how much time was spent on each line of code. To see the source code in MAP, compile your program with the debug flag, for example:


   mpicc hello.c -o hello -g -O3

Do not just use a debug build. You should always keep optimization flags turned on when profiling.

You can also use MAP on programs without debug information. In this case inlined functions and the source code cannot be shown, but other features will work as expected.

Cray compiler

For the Cray compiler Arm recommends using the -G2 option with MAP.

CUDA programs

When compiling CUDA kernels do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:


    nvcc device.cu -c -o device.o -g -lineinfo -O3

Armv8 (AArch64) machines

Unwind information is not always compiled in by default on this platform. This may result in partial trace nodes being displayed in the MAP parallel stack view. To avoid this, programs that are not compiled with debug information (-g) should at least be compiled with the -fasynchronous-unwind-tables flag or the -funwind-tables flag, preferably the former.

17.2.2 Linking

To collect data from your program, MAP uses two small profiler libraries, map-sampler and map-sampler-pmpi. These must be linked with your program. On most systems MAP can do this automatically, without any action by you, using the system's LD_PRELOAD mechanism, which inserts an extra library into your program when starting it.

Note

Although these libraries contain the word 'map' they are used for both Arm Performance Reports and Arm MAP.

This automatic linking when starting your program only works if your program is dynamically-linked. Programs may be dynamically-linked or statically-linked, and for MPI programs this is normally determined by your MPI library. Most MPI libraries are configured with --enable-dynamic by default, and mpicc/mpif90 produce dynamically-linked executables that MAP can automatically collect data from.

The map-sampler-pmpi library is a temporary file that is precompiled and copied or compiled at runtime in the directory ~/.allinea/wrapper.

If your home directory is not accessible by all the nodes in your cluster, you can change where the map-sampler-pmpi library is created by altering the shared directory as described in H.2.3 No shared home directory.

The temporary library will be created in the .allinea/wrapper subdirectory to this shared directory.

For Cray X-Series systems the shared directory is not applicable; instead, map-sampler-pmpi is copied into a hidden .allinea sub-directory of the current working directory.

If MAP warns you that it could not pre-load the sampler libraries, this often means that your MPI library was not configured with --enable-dynamic, or that the LD_PRELOAD mechanism is not supported on your platform. You now have three options:

  1. Try compiling and linking your code dynamically. On most platforms this allows MAP to use the LD_PRELOAD mechanism to automatically insert its libraries into your application at runtime.
  2. Link MAP's map-sampler and map-sampler-pmpi libraries with your program at link time manually.

    See 17.2.3 Dynamic linking on Cray X-Series systems, or 17.2.4 Static linking and 17.2.5 Static linking on Cray X-Series systems.

  3. Finally, it may be that your system supports dynamic linking but you have a statically-linked MPI. You can try to recompile the MPI implementation with --enable-dynamic, or find a dynamically-linked version on your system and recompile your program using that version. This will produce a dynamically-linked program that MAP can automatically collect data from.

17.2.3 Dynamic linking on Cray X-Series systems

If the LD_PRELOAD mechanism is not supported on your Cray X-Series system, you can try to dynamically link your program explicitly with the MAP sampling libraries.

Compiling the Arm MPI Wrapper Library

First you must compile the Arm MPI wrapper library for your system using the make-profiler-libraries --platform=cray --lib-type=shared command.

Note

Performance Reports also uses this library.


    user@login:~/myprogram$ make-profiler-libraries --platform=cray --lib-type=shared 

Created the libraries in /home/user/myprogram:
libmap-sampler.so (and .so.1, .so.1.0, .so.1.0.0)
libmap-sampler-pmpi.so (and .so.1, .so.1.0, .so.1.0.0)

To instrument a program, add these compiler options:
compilation for use with MAP - not required for Performance Reports:
-g (or '-G2' for native Cray Fortran) (and -O3 etc.)
linking (both MAP and Performance Reports):
-dynamic -L/home/user/myprogram -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your
program.

Before running your program (interactively or from a queue), set
LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/home/user/myprogram:$LD_LIBRARY_PATH
map ...
or add -Wl,-rpath=/home/user/myprogram when linking your program.

Linking with the Arm MPI Wrapper Library

    mpicc -G2 -o hello hello.c -dynamic -L/home/user/myprogram \ 
-lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

PGI Compiler

When linking OpenMP programs you must pass the -Bdynamic command line argument to the compiler when linking dynamically.

When linking C++ programs you must pass the -pgc++libs command line argument to the compiler when linking.

17.2.4 Static linking

If you compile your program statically, that is, your MPI uses a static library or you pass the -static option to the compiler, then you must explicitly link your program with the Arm sampler and MPI wrapper libraries.

Compiling the Arm MPI Wrapper Library

First you must compile the Arm MPI wrapper library for your system using the make-profiler-libraries --lib-type=static command.

Note

Performance Reports also uses this library.


   user@login:~/myprogram$ make-profiler-libraries --lib-type=static 

Created the libraries in /home/user/myprogram:
libmap-sampler.a
libmap-sampler-pmpi.a

To instrument a program, add these compiler options:
compilation for use with MAP - not required for Performance Reports:
-g (and -O3 etc.)
linking (both MAP and Performance Reports):
-Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
these must appear *after* the Arm sampler and MPI wrapper libraries in
the link line. There's a comprehensive description of the link ordering
requirements in the 'Preparing a Program for Profiling' section of either
userguide-forge.pdf or userguide-reports.pdf, located in
/opt/arm/forge/doc/.

Linking with the Arm MPI Wrapper Library

The -Wl,@/home/user/myprogram/allinea-profiler.ld syntax tells the compiler to look in /home/user/myprogram/allinea-profiler.ld for instructions on how to link with the Arm sampler. Usually this is sufficient, but not in all cases. The rest of this section explains how to manually add the Arm sampler to your link line.

PGI Compiler

When linking C++ programs you must pass the -pgc++libs command line argument to the compiler when linking.

The PGI compiler must be 14.9 or later. Using earlier versions of the PGI compiler will fail with an error such as "Error: symbol 'MPI_F_MPI_IN_PLACE' can not be both weak and common" due to a bug in the PGI compiler's weak object support.

If you do not have access to PGI compiler 14.9 or later, try compiling and linking the Arm MPI wrapper as a shared library, as described in 17.2.3 Dynamic linking on Cray X-Series systems. Omit the option --platform=cray if you are not on a Cray.

Cray

When linking C++ programs you may encounter a conflict between the Cray C++ runtime and the GNU C++ runtime used by the MAP libraries with an error similar to the one below:


   /opt/cray/cce/8.2.5/CC/x86-64/lib/x86-64/libcray-c++-rts.a(rtti.o): In function '__cxa_bad_typeid': 
/ptmp/ulib/buildslaves/cfe-82-edition-build/tbs/cfe/lib_src/rtti.c:1062: multiple definition of '__cxa_bad_typeid'
/opt/gcc/4.4.4/snos/lib64/libstdc++.a(eh_aux_runtime.o):/tmp/peint/gcc/repackage/4.4.4c/BUILD/snos_objdir/x86_64-suse-linux/libstdc++-v3/libsupc++/../../../../xt-gcc-4.4.4/libstdc++-v3/libsupc++/eh_aux_runtime.cc:46: first defined here

You can resolve this conflict by removing -lstdc++ and -lgcc_eh from allinea-profiler.ld.

-lpthread

When linking, -Wl,@allinea-profiler.ld must go before the -lpthread command line argument, if present.

Manual Linking

When linking your program you must add the path to the profiler libraries (-L/path/to/profiler-libraries), and the libraries themselves (-lmap-sampler-pmpi, -lmap-sampler).

The MPI wrapper library (-lmap-sampler-pmpi) must go:

  1. After your program's object (.o) files.
  2. After your program's own static libraries, for example -lmylibrary.
  3. After the path to the profiler libraries (-L/path/to/profiler-libraries).
  4. Before the MPI's Fortran wrapper library, if any. For example -lmpichf.
  5. Before the MPI's implementation library usually -lmpi.
  6. Before the Arm sampler library -lmap-sampler.

The sampler library -lmap-sampler must go:

  1. After the Arm MPI wrapper library.
  2. After your program's object (.o) files.
  3. After your program's own static libraries, for example -lmylibrary.
  4. After -Wl,--undefined,allinea_init_sampler_now.
  5. After the path to the profiler libraries -L/path/to/profiler-libraries.
  6. Before -lstdc++, -lgcc_eh, -lrt, -lpthread, -ldl, -lm and -lc.

For example:


   mpicc hello.c -o hello -g -L/users/ddt/allinea \ 
-lmap-sampler-pmpi \
-Wl,--undefined,allinea_init_sampler_now \
-lmap-sampler -lstdc++ -lgcc_eh -lrt \
-Wl,--whole-archive -lpthread \
-Wl,--no-whole-archive \
-Wl,--eh-frame-hdr \
-ldl \
-lm

mpif90 hello.f90 -o hello -g -L/users/ddt/allinea \
-lmap-sampler-pmpi \
-Wl,--undefined,allinea_init_sampler_now \
-lmap-sampler -lstdc++ -lgcc_eh -lrt \
-Wl,--whole-archive -lpthread \
-Wl,--no-whole-archive \
-Wl,--eh-frame-hdr \
-ldl \
-lm

17.2.5 Static linking on Cray X-Series systems

Compiling the MPI Wrapper Library

On Cray X-Series systems use make-profiler-libraries --platform=cray --lib-type=static instead:


   Created the libraries in /home/user/myprogram: 
libmap-sampler.a
libmap-sampler-pmpi.a

To instrument a program, add these compiler options:
compilation for use with MAP - not required for Performance Reports:
-g (or -G2 for native Cray Fortran) (and -O3 etc.)
linking (both MAP and Performance Reports):
-Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
these must appear *after* the Arm sampler and MPI wrapper libraries in
the link line. There's a comprehensive description of the link ordering
requirements in the 'Preparing a Program for Profiling' section of either
userguide-forge.pdf or userguide-reports.pdf, located in
/opt/arm/forge/doc/.

Linking with the MPI Wrapper Library


   cc hello.c -o hello -g -Wl,@allinea-profiler.ld 

ftn hello.f90 -o hello -g -Wl,@allinea-profiler.ld

17.2.6 Dynamic and static linking on Cray X-Series systems using the modules environment

If your system has the Arm module files installed (see section 17.2.7 for installation details), you can load them and build your application as usual:

  1. module load forge or ensure that make-profiler-libraries is in your PATH.
  2. module load map-link-static or module load map-link-dynamic.
  3. Recompile your program.

17.2.7 map-link modules installation on Cray X-Series

To facilitate dynamic and static linking of user programs with the Arm MPI Wrapper and Sampler libraries Cray X-Series System Administrators can integrate the map-link-dynamic and map-link-static modules into their module system. Templates for these modules are supplied as part of the Arm Forge package.

Copy files share/modules/cray/map-link-* into a dedicated directory on the system.

For each of the two module files copied:

  1. Find the line starting with conflict and correct the prefix to refer to the location where the module files were installed, for example, arm/map-link-static. The correct prefix depends on the subdirectory (if any) under the module search path in which the map-link-* modules were installed.
  2. Find the line starting with set MAP_LIBRARIES_DIRECTORY "NONE" and replace "NONE" with a user writable directory accessible from the login and compute nodes.

After installation, you can verify that the prefix has been set correctly with 'module avail': the prefix shown by this command for the map-link-* modules should match the prefix set in the 'conflict' line of the module sources.

17.3 Profiling a program

Figure 94: Run window

If you click the Profile button on the MAP Welcome Page you will see the window above. The settings are grouped into sections. Click the Details… button to expand a section. The settings in each section are described below.

17.3.1 Application

Application: The full path name to your application. If you specified one on the command line, this will already be filled in. You may browse for an application by clicking the Browse button.

Note

Many MPIs have problems working with directory and program names containing spaces. Arm recommends avoiding the use of spaces in directory and file names.

Arguments: (optional) The arguments passed to your application. These will be automatically filled if you entered some on the command line.

Note

Avoid using quote characters such as ' and ", as these may be interpreted differently by MAP and your command shell. If you must use them and cannot get them to work as expected, please contact Arm support.

stdin file: (optional) This allows you to choose a file to be used as the standard input (stdin) for your program. MAP will automatically add arguments to mpirun to ensure your input file is used.

Working Directory: (optional) The working directory to use when running your application. If this is blank then MAP's working directory will be used instead.

17.3.2 Duration

Start profiling after: (optional) This allows you to delay profiling by a number of seconds into the run of your program.

Stop profiling after: (optional) This allows you to specify a number of seconds after which the profiler will terminate your program.

17.3.3 Metrics

This section allows you to explicitly enable and disable metrics for which data is collected. Metrics are listed alphabetically with their display name and unique metric ID under their associated metric group. Select a metric to see a more detailed description, including the metric's default enabled/disabled state.

Only metrics that can be displayed in MAP's metrics view are listed. Metrics that are unlicensed, unsupported or always disabled are not listed. Additionally, you cannot disable metrics that are always enabled.

The initial enabled/disabled state of each metric combines the settings given by the metric XML definitions, the previous GUI session, and the --enabled-metrics and --disable-metrics command line options. The command line options take precedence over the previous GUI session settings, and both take precedence over the metric XML definition settings. Of course, metrics that are always enabled or always disabled cannot be toggled.

If PAPI is installed, all PAPI metrics are displayed and available for enabling/disabling. However, only metrics specified in the PAPI.config file will be affected.

17.3.4 MPI

Note

If you have a single-process license, or have selected none as your MPI implementation, the MPI options are not shown: MAP runs in single process mode. See section 17.5 Profiling a single-process program for more details.

Number of processes: The number of processes that you wish to profile. MAP supports hundreds of thousands of processes but this is limited by your license. This option may not be displayed if disabled on the Job Submission options page.

Number of nodes: This is the number of compute nodes that you wish to use to run your program. This option is only displayed for certain MPI implementations or if it is enabled on the Job Submission options page.

Processes per node: This is the number of MPI processes to run on each compute node. This option is only displayed for certain MPI implementations or if it is enabled on the Job Submission options page.

Implementation: The MPI implementation to use, for example, Open MPI, MPICH 2. Normally the Auto setting will detect the currently loaded MPI module correctly. If you are submitting a job to a queue the queue settings will also be summarized here. You may change the MPI implementation by clicking on the Change… button.

Note

The choice of MPI implementation is critical to correctly starting MAP. Your system will normally use one particular MPI implementation. If you are unsure as to which to pick, try generic, consult your system administrator or Arm support. A list of settings for common implementations is provided in E MPI distribution notes and known issues.

Note

If your desired MPI command is not in your PATH, or you wish to use an MPI run command that is not your default one, you can configure this using the Options window. See section A.5.1 System.

mpirun arguments: (optional) The arguments that are passed to mpirun or your equivalent, usually prior to your executable name in normal mpirun usage. You can place machine file arguments, if necessary, here. For most users this box can be left empty.

Note

You should not enter the -np argument as MAP will do this for you.

Profile selected ranks: (optional) If you do not want to profile all the ranks, you can specify a set of ranks to profile. The ranks should be separated by commas and intervals are accepted. Example: 5,6-10.

17.3.5 OpenMP

Number of OpenMP threads: The number of OpenMP threads to run your program with. This ensures the OMP_NUM_THREADS environment variable is set, but your program may override this by calling OpenMP-specific functions.

17.3.6 Environment variables

The optional Environment Variables section should contain additional environment variables that should be passed to mpirun or its equivalent. These environment variables may also be passed to your program, depending on which MPI implementation your system uses. Most users will not need to use this box.

17.3.7 Profiling

Click Run to start your program, or Submit if working through a queue (see section A.2 Integration with queuing systems). MAP will compile an MPI wrapper library on the fly that intercepts the MPI_Init call and gathers statistics about MPI use in your program. If this causes problems, see H.9.3 MPI wrapper libraries. MAP then brings up the Running window and starts to connect to your processes.

The program runs inside MAP, which collects statistics on your program through the MPI interface you selected, while allowing your MPI implementation to determine which processes start on which nodes.

MAP collects data for the entire program run by default. Arm's sampling algorithms ensure only a few tens of megabytes are collected even for very long-running jobs. You can stop your program at any time by using the Stop and Analyze button. MAP will then collect the data recorded so far, stop your program and end the MPI session before showing you the results. If any processes remain you may have to clean them up manually using the kill command, or a command provided with your MPI implementation, but this should not be necessary.

Figure 95: Running window

17.3.8 Profiling only part of a program

You may choose not to start the MAP sampler when the job starts, but instead start it programmatically at a later point. To do this you must set the ALLINEA_SAMPLER_DELAY_START=1 environment variable before starting your program. For MPI programs it is important that this variable is set in the environment of all the MPI processes. It is not necessarily sufficient to simply set the variable in the environment of the MPI command itself. You must arrange for the variable to be set or exported by your MPI command for all the MPI processes.
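For example, with Open MPI the variable can be forwarded to every rank using mpirun's -x flag; other implementations use different mechanisms (Intel MPI uses -genv, for instance). The program name here is a placeholder:

```shell
export ALLINEA_SAMPLER_DELAY_START=1
# Open MPI syntax: -x forwards the named variable to all MPI processes.
map --profile mpirun -x ALLINEA_SAMPLER_DELAY_START -n 4 ./myprog
```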

You may call allinea_start_sampling and allinea_stop_sampling once each; that is, there must be one and only one contiguous sampling region. It is not possible to start, stop, start, stop. The allinea_suspend_traces and allinea_resume_traces functions cannot be used to pause or resume sampling, and will not have the desired effect. You may only delay the start of sampling and stop sampling early.

C

To start sampling programmatically you should #include "mapsampler_api.h" and call the allinea_start_sampling function. You will need to point your C compiler at the MAP include directory, by passing the arguments -I <install root>/map/wrapper, and also link with the MAP sampler library, by passing the arguments -L <install root>/lib/64 -lmap-sampler. To stop sampling programmatically, call the allinea_stop_sampling function.
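A minimal sketch of this pattern; the work() function is a placeholder for the region you want profiled, and building it requires the MAP installation's header and library paths exactly as described above:

```c
#include "mapsampler_api.h"   /* shipped in <install root>/map/wrapper */

static void work(void)
{
    /* ... the code you want profiled ... */
}

int main(void)
{
    /* ... setup that should not be profiled; requires
       ALLINEA_SAMPLER_DELAY_START=1 in the environment ... */

    allinea_start_sampling();   /* begin the single sampling region */
    work();
    allinea_stop_sampling();    /* stop sampling early */

    return 0;
}
```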

Fortran

To start sampling programmatically you should call the ALLINEA_START_SAMPLING subroutine. You will also need to link with the MAP sampler library, for example by passing the arguments -L <install root>/lib/64 -lmap-sampler. To stop sampling programmatically call the ALLINEA_STOP_SAMPLING subroutine.

17.4 remote-exec required by some MPIs

When using SGI MPT, MPICH 1 Standard or the MPMD variants of MPICH 2, MPICH 3 or Intel MPI, MAP allows mpirun to start all the processes, then attaches to them while they are inside MPI_Init. This method is often faster than the generic method, but requires the remote-exec facility in MAP to be correctly configured if processes are being launched on a remote machine. For more information on remote-exec, see section A.4 Connecting to remote programs (remote-exec).

Note

If MAP is running in the background, for example using map &, this process may get stuck. Some SSH versions cause this behavior when asking for a password. If this happens to you, go to the terminal and use the fg or similar command to make MAP a foreground process, or run MAP again without using "&".

If MAP cannot find a password-free way to access the cluster nodes then you will not be able to use the specialized startup options. Instead, you can use generic, although startup may be slower for large numbers of processes.

17.5 Profiling a single-process program

Figure 96: Single-Process Run Window
  1. If you have a single-process license you will immediately see the Run Window that is appropriate for single-process applications. If your license supports multiple processes you can simply clear the MPI checkbox to run a single-process program.
  2. Select the application, either by typing the file name in, or by selecting it using the file browser displayed by clicking the Browse button.
  3. Arguments can be typed into the supplied box.
  4. If appropriate, tick the OpenMP box and select the Number of OpenMP threads to start your program with.
  5. Click Run to start your program.

17.6 Sending standard input

MAP provides a stdin file box in the Run window. This allows you to choose a file to be used as the standard input (stdin) for your program. MAP will automatically add arguments to mpirun to ensure your input file is used.

Alternatively, you may enter the arguments directly in the mpirun Arguments box. For example, if you would normally pass an option such as -stdin filename to mpirun when using MPI directly from the command line, you may add the same option to the mpirun Arguments box when starting your MAP session in the Run window.

It is also possible to enter input during a session. Start your program as normal, then switch to the Input/Output panel. Here you can see the output from your program and type input you wish to send. You may also use the More button to send input from a file, or send an EOF character.

Figure 97: MAP Sending Input

Note

If MAP is running on a fork-based system such as Scyld, or a -comm=shared compiled MPICH 1, your program may not receive an EOF correctly from the input file. If your program seems to hang while waiting for the last line or byte of input, this is likely to be the problem. See H General troubleshooting and known issues or contact Arm support at Arm support for a list of possible fixes.

17.7 Starting a job in a queue

If MAP has been configured to be integrated with a queue/batch environment, as described in section A.2 Integration with queuing systems, then you may use it to launch your job.

In this case, a Submit button is presented on the Run Window, instead of the ordinary Run button. Clicking Submit from the Run Window will display the queue status until your job starts. MAP will execute the display command every second and show you the standard output. If your queue display is graphical or interactive then you cannot use it here.

If your job does not start or you decide not to run it, click on Cancel Job. If the regular expression you entered for getting the job ID is invalid or if an error is reported then MAP will not be able to remove your job from the queue.

It is strongly recommended you check the job has been removed before submitting another as it is possible for a forgotten job to execute on the cluster and either waste resources or interfere with other profiling sessions.

After the sampling (program run) phase is complete, MAP will start the analysis phase, collecting and processing the distinct samples. This could be a lengthy process depending on the size of the program. For very large programs it could be as much as 10 or 20 minutes.

You should ensure that your job does not hit its queue limits during the analysis process, setting the job time large enough to cover both the sampling and the analysis phases.

MAP also requires extra memory, both in the sampling and in the analysis phases. If the analysis fails and your application alone approaches one of these limits, you may need to run with fewer processes per node or a smaller data set in order to generate a complete set of data.

Once your job is running, it will connect to MAP and you will be able to profile it.

17.8 Using custom MPI scripts

On some systems a custom mpirun replacement is used to start jobs, such as mpiexec. MAP will normally use whatever the default for your MPI implementation is, so for MPICH 1 it would look for mpirun and not mpiexec, for SLURM it would use srun and so on. This section explains how to configure MAP to use a custom mpirun command for job start up.

There are typically two ways you might want to start jobs using a custom script, and MAP supports them both.

The first way is to pass all the arguments on the command-line, as in the following example:


   mpiexec -n 4 /home/mark/program/chains.exe /tmp/mydata

There are several key variables in this line that MAP can fill in for you:

  1. The number of processes (4 in the above example).
  2. The name of your program (/home/mark/program/chains.exe).
  3. One or more arguments passed to your program (/tmp/mydata).

Everything else, like the name of the command and the format of its arguments remains constant.

To use a command like this in MAP, the queue submission system is adapted as described in the previous section. For this mpiexec example, the settings would be as shown here:

Figure 98: MAP Using Custom MPI Scripts

As you can see, most of the settings are left blank.

There are some differences between the Submit Command in MAP and what you would type at the command-line:

  1. The number of processes is replaced with NUM_PROCS_TAG.
  2. The name of the program is replaced by the full path to ddt-debugger, used by both DDT and MAP.
  3. The program arguments are replaced by PROGRAM_ARGUMENTS_TAG.

Note

It is not necessary to specify the program name here; MAP takes care of that during its own startup process. The important thing is to make sure your MPI implementation starts ddt-debugger instead of your program, but with the same options.

The second way you might start a job using a custom mpirun replacement is with a settings file:


   mpiexec -config /home/mark/myapp.nodespec

Where myapp.nodespec might contain something like the following:


   comp00 comp01 comp02 comp03 : /home/mark/program/chains.exe /tmp/mydata

MAP can automatically generate simple configuration files like this every time you run your program, if you specify a template file. For the above example, the template file myapp.template would contain the following:


   comp00 comp01 comp02 comp03 : DDTPATH_TAG/bin/ddt-debugger DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

This follows the same replacement rules described above and in detail in section A.2 Integration with queuing systems.

The options settings for this example might be:

PIC

Figure 99: MAP Using Substitute MPI Commands

Note the Submit Command and the Submission Template File in particular. MAP will create a new file and append it to the submit command before executing it. So, in this case what would actually be executed might be mpiexec -config /tmp/allinea-temp-0112 or similar. Therefore, any argument like -config must be last on the line, because MAP will add a file name to the end of the line. Other arguments, if there are any, can come first.

Arm recommends reading the section on queue submission, as there are many features described there that might be useful to you if your system uses a non-standard start up command.

If you do use a non-standard command, please contact Arm at Arm support.

17.9 Starting MAP from a job script

While it is common to submit debugging runs from inside a debugger, the usual approach for profiling is to run the program offline, producing a profile file that can be inspected later.

To do this, replace your usual program invocation, such as:


   mpirun -n 4 PROGRAM [ARGUMENTS]...

with one of the following commands:


   map --profile mpirun -n 4 PROGRAM [ARGUMENTS]...

   map --profile --np=4 PROGRAM [ARGUMENTS]...

MAP will run without a GUI, gathering data to a .map profile file. Its filename is based on a combination of program name, process count and timestamp, like program_2p_2012-12-19_10-51.map.

If using OpenMP, the value of OMP_NUM_THREADS is also included in the name after the process count, like program_2p_8t_2014-10-21_12-45.map.

This default name may be changed with the --output argument. To examine this file, either run MAP and select the Load Profile Data File option, or access it directly with the command:


   map program_2p_2012-12-19_10-51.map
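As noted above, the default filename can be overridden with the --output argument, for example (myrun.map is a hypothetical name):

```
map --profile --output=myrun.map mpirun -n 4 PROGRAM [ARGUMENTS]...
```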

Note

When starting MAP for examining an existing profile file, a valid license is not needed.

When running without a GUI, MAP prints a short header and footer to stderr with your program's output in between. The --silent argument suppresses this additional output so that your program's output is intact.
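For example, to leave only the program's own output in a batch job's log:

```
map --profile --silent mpirun -n 4 PROGRAM [ARGUMENTS]...
```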

As an alternative to --profile you can use Reverse Connect (see 3.3 Reverse Connect) to connect back to the GUI if you wish to use interactive profiling from inside the queue. So the above example becomes either:


   map --connect mpirun -n 4 PROGRAM [ARGUMENTS]...

Or:


   map --connect --np=4 PROGRAM [ARGUMENTS]...

17.10 Numactl

MAP supports launching MPI programs via numactl, with or without SLURM. The recommended way to launch via numactl is to use Express Launch mode.


   map mpiexec -n 4 numactl -m 1 ./myprogram.exe 
map srun -n 4 numactl -m 1 ./myprogram.exe

It is also possible to launch via numactl using compatibility mode. In compatibility mode, you must enter the full path to numactl in the Application box. If you do not know the full path to numactl, you can find it by running:


   which numactl

Enter the name of the required application in the Arguments field, after all arguments to be passed to numactl. It is not possible to pass any more arguments to the parallel job runner when using this mode for launching.
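For example, to reproduce the Express Launch command shown above in compatibility mode, the Application box would contain the full path reported by which numactl, and the Arguments field might contain (application name after the numactl arguments):

```
-m 1 ./myprogram.exe
```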

17.11 MAP environment variables

ALLINEA_SAMPLER_INTERVAL

MAP takes a sample every 20ms by default, giving an initial sampling rate of 50Hz. This rate is automatically decreased as the run proceeds to ensure a constant number of samples are taken. See ALLINEA_SAMPLER_NUM_SAMPLES.

If your program runs for a very short period of time, you may benefit from decreasing the initial sampling interval. For example, ALLINEA_SAMPLER_INTERVAL=1 sets an initial sampling rate of 1000Hz, that is, one sample per millisecond. Higher sampling rates are not supported.
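For example, a short-running job could be profiled at 500Hz by setting the variable on the launch line (the program name is hypothetical):

```
ALLINEA_SAMPLER_INTERVAL=2 map --profile mpirun -n 4 ./myprogram.exe
```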

Increasing the sampling frequency from the default is not recommended if there are lots of threads and/or very deep stacks in the target program as this may not leave sufficient time to complete one sample before the next sample is started.

Note: Custom values for ALLINEA_SAMPLER_INTERVAL may be overwritten by values set from the combination of ALLINEA_SAMPLER_INTERVAL_PER_THREAD and the expected number of threads (from OMP_NUM_THREADS). For more information, see ALLINEA_SAMPLER_INTERVAL_PER_THREAD.

ALLINEA_SAMPLER_INTERVAL_PER_THREAD

To keep overhead low, MAP imposes a minimum sampling interval based on the number of threads. By default this is 2ms per thread, thus for eleven or more threads MAP will increase the initial sampling interval to more than 20ms.
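The effect of this minimum can be sketched as shell arithmetic (an illustration of the rule described above, not MAP code): the initial interval is the larger of ALLINEA_SAMPLER_INTERVAL and the thread count multiplied by the per-thread minimum.

```shell
# Illustrative calculation of the initial sampling interval (all values in ms)
ALLINEA_SAMPLER_INTERVAL=20            # default initial interval
ALLINEA_SAMPLER_INTERVAL_PER_THREAD=2  # default minimum per-thread sample time
OMP_NUM_THREADS=16

per_thread_min=$((OMP_NUM_THREADS * ALLINEA_SAMPLER_INTERVAL_PER_THREAD))
if [ "$per_thread_min" -gt "$ALLINEA_SAMPLER_INTERVAL" ]; then
  interval=$per_thread_min
else
  interval=$ALLINEA_SAMPLER_INTERVAL
fi
echo "${interval}ms"   # 32ms for 16 threads at the default 2ms per thread
```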

You can adjust this behavior by setting ALLINEA_SAMPLER_INTERVAL_PER_THREAD to the minimum per-thread sample time in milliseconds.

Lowering this value from the default is not recommended if there are lots of threads as this may not leave sufficient time to complete one sample before the next sample is started.

Note: Whether or not OpenMP is enabled in MAP, the final value of OMP_NUM_THREADS set by your script or scheduler is used to calculate the per-thread sampling interval (ALLINEA_SAMPLER_INTERVAL_PER_THREAD). When configuring your job for submission, check whether your final submission script, your scheduler, or the MAP GUI sets a default value for OMP_NUM_THREADS.

Note: Custom values for ALLINEA_SAMPLER_INTERVAL will be overwritten by values set from the combination of ALLINEA_SAMPLER_INTERVAL_PER_THREAD and the expected number of threads (from OMP_NUM_THREADS).

ALLINEA_MPI_WRAPPER

To direct MAP to use a specific wrapper library set ALLINEA_MPI_WRAPPER=<pathofsharedobject>.

MAP ships with a number of precompiled wrappers; when your MPI implementation is supported, MAP automatically selects and uses the appropriate wrapper.

To manually compile a wrapper specifically for your system, set ALLINEA_WRAPPER_COMPILE=1 and MPICC, then run <path to MAP installation>/map/wrapper/build_wrapper.

This will generate the wrapper library ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so with symlinks to the following files:

  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1
  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1.0
  • ~/.allinea/wrapper/libmap-sampler-pmpi-<hostname>.so.1.0.0
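For example (keeping the installation path as a placeholder and assuming mpicc is the correct compiler for your system):

```
export ALLINEA_WRAPPER_COMPILE=1
export MPICC=mpicc
<path to MAP installation>/map/wrapper/build_wrapper
```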

ALLINEA_WRAPPER_COMPILE

To direct MAP to fall back to creating and compiling a just-in-time wrapper, set ALLINEA_WRAPPER_COMPILE=1.

In order to be able to generate a just-in-time wrapper an appropriate compiler must be available on the machine where MAP is running, or on the remote host when using remote connect.

MAP will attempt to auto-detect your MPI compiler; however, setting the MPICC environment variable to the path of the correct compiler is recommended.

ALLINEA_MPIRUN

The path of mpirun, mpiexec or equivalent.

If this is set it has higher priority than that set in the GUI and the mpirun found in PATH.

ALLINEA_SAMPLER_NUM_SAMPLES

MAP collects 1000 samples per process by default. To avoid generating too much data on long runs, the sampling rate will be automatically decreased as the run progresses to ensure only 1000 evenly spaced samples are stored.

You may adjust this by setting ALLINEA_SAMPLER_NUM_SAMPLES=<positiveinteger>.
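For example (a hypothetical value; the default of 1000 is recommended):

```
ALLINEA_SAMPLER_NUM_SAMPLES=2000 map --profile mpirun -n 4 ./myprogram.exe
```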

Note

It is strongly recommended that you leave this value at the default setting. Higher values are not generally beneficial and add extra memory overheads while running your code. Bear in mind that with 512 processes, the default setting already collects half a million samples over the job, so the effective sampling rate can be very high indeed.

ALLINEA_KEEP_OUTPUT_LINES

Specifies the number of lines of program output to record in .map files. Setting it to 0 removes the line limit, although this is not recommended, as it may result in very large .map files if the profiled program produces a lot of output.

See 18.3 Restricting output.

ALLINEA_KEEP_OUTPUT_LINE_LENGTH

The maximum line length for program output that will be recorded in .map files; lines containing more characters than this limit are truncated. Setting it to 0 removes the line length restriction, although this is not recommended, as it may result in very large .map files if the profiled program produces long lines of output.

See 18.3 Restricting output.
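For example, to record at most 500 lines of output, truncated to 200 characters each (hypothetical values):

```
export ALLINEA_KEEP_OUTPUT_LINES=500
export ALLINEA_KEEP_OUTPUT_LINE_LENGTH=200
```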

ALLINEA_PRESERVE_WRAPPER

To gather data from MPI calls MAP generates a wrapper to the chosen MPI implementation. See 17.2 Preparing a program for profiling.

By default the generated code and shared objects are deleted when MAP no longer needs them.

To prevent MAP from deleting these files set ALLINEA_PRESERVE_WRAPPER=1.

Please note that if you are using remote launch then this variable must be exported in the remote script. See 3.2.1 Remote script.

ALLINEA_SAMPLER_NO_TIME_MPI_CALLS

Set this to prevent MAP from timing the time spent in MPI calls.

ALLINEA_SAMPLER_TRY_USE_SMAPS

Set this to allow MAP to use /proc/[pid]/smaps to gather memory usage data. This is not recommended since it slows down sampling significantly.

MPICC

To create the MPI wrapper, MAP first tries to use MPICC, then, if that fails, searches for a suitable MPI compiler command in PATH. If the MPI compiler used to compile the target binary is not in PATH (or if there are multiple MPI compilers in PATH), then MPICC should be set.
