Arm Forge 19.0 adds Python profiling capabilities to help you find and resolve bottlenecks in your Python code.
For the latest information about Python profiling in MAP, see the Python Profiling web page.
This task describes how to profile a Python script. This feature is useful when profiling a mixed C, C++, Fortran, and Python program.
Python profiling replaces main thread stack frames originating from the Python interpreter with Python stack frames of the profiled Python script. To disable this feature, set ALLINEA_SAMPLER_DISABLE_PYTHON_PROFILING=1.
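As a minimal sketch of disabling the feature from a launcher script, the variable can be added to a copy of the environment that is later handed to the profiled process. The helper name env_without_python_profiling is hypothetical; only the variable name and its value come from the text above.

```python
import os

# Variable name and value as given in the documentation above.
DISABLE_VAR = "ALLINEA_SAMPLER_DISABLE_PYTHON_PROFILING"


def env_without_python_profiling(base_env=None):
    """Return a copy of the environment with Python profiling disabled.

    Hypothetical helper for illustration; the caller's mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[DISABLE_VAR] = "1"
    return env


# Example with a small, made-up base environment.
env = env_without_python_profiling({"PATH": "/usr/bin"})
print(env[DISABLE_VAR])
```

The returned mapping can be passed as the env argument of subprocess.run when MAP is started from Python.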
Python profiling in MAP is limited to the following:
- Profiling Python scripts running under the CPython interpreter (versions 2.7 and 3.3+).
- Profiling Python scripts running under the Intel Distribution for Python.
- Profiling Python scripts running under the Anaconda Python distribution (version 3.6 not supported).
- Profiling Python scripts running under virtual environments.
- Profiling Python scripts that import modules which perform MPI on the main thread, such as mpi4py.
- Profiling Python scripts that import modules which use OpenMP.
- Profiling Python scripts that make use of the threading module.
MAP outputs warnings if the MPI module initializes MPI with the MPI_THREAD_MULTIPLE threading model, as mpi4py does by default. To prevent these warnings, change the default settings in mpi4py by setting either mpi4py.rc.threaded = False or mpi4py.rc.thread_level = "funneled".
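A minimal sketch of applying the thread-level setting follows. It assumes the setting must be made before the first import of mpi4py.MPI, since mpi4py reads its rc options when that submodule is loaded. The helper name configure_mpi4py_for_map is hypothetical, and the sketch falls back gracefully when mpi4py is not installed.

```python
import importlib


def configure_mpi4py_for_map(thread_level="funneled"):
    """Request a thread level other than MPI_THREAD_MULTIPLE from mpi4py.

    Call this before the first `from mpi4py import MPI`. Returns the rc
    object, or None if mpi4py is not installed (hypothetical helper).
    """
    try:
        mpi4py = importlib.import_module("mpi4py")
    except ImportError:
        return None
    mpi4py.rc.thread_level = thread_level
    return mpi4py.rc


rc = configure_mpi4py_for_map()
if rc is None:
    print("mpi4py is not installed; nothing to configure")
else:
    print("requested thread level:", rc.thread_level)
    # Only after this point would the script run: from mpi4py import MPI
```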
If you are profiling on a system using ALPS or SLURM and the Python script does not use MPI, you must set the environment variables described in section H.2.
- Check that the Python script runs successfully:
- To profile the Python script with MAP, prepend the run command with map:
- Click Run and wait for MAP to finish profiling the Python script.
- View the profiling results in MAP.
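The step of prepending map to the run command can be sketched from Python as below. This is an illustration under stated assumptions, not the product's own tooling: my_script.py is a placeholder script name, the helper names are hypothetical, and run_under_map only starts MAP if a map executable is found on PATH.

```python
import shlex
import shutil
import subprocess


def build_map_command(run_command):
    """Prepend `map` to an existing run command given as a list of arguments."""
    return ["map"] + list(run_command)


def run_under_map(run_command):
    """Start the run command under MAP if `map` is on PATH (hypothetical helper)."""
    if shutil.which("map") is None:
        raise FileNotFoundError("map executable not found on PATH")
    return subprocess.run(build_map_command(run_command), check=True)


# Placeholder run command; replace with the command that already works.
run_command = ["python3", "my_script.py"]
print(shlex.join(build_map_command(run_command)))
```

Building the command as a list keeps the original run command unchanged, which matches the first step of confirming that the script runs successfully on its own.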
This section demonstrates how to profile the Python example script python-profiling.py located in the examples directory.
- Change into the examples directory and run the makefile to compile the example.
- Start MAP.
- Click Run.
- Wait for MAP to finish analyzing samples after the Python script has completed.
Note: The MAP GUI launches showing the Python script, with the line where the most time was spent selected.
- Locate the first fibonacci_c stack frame in the Main Thread Stacks view. The callout to the C function is appended under the main Python stack frame.
- Examine the Main Thread Activity graph (section 24) for an overview of time spent in Python code compared with non-Python code.
- View source code lines (section 18.1) on which time was spent executing Python code and non-Python code.
- Compare time spent on the selected line executing Python code with non-Python code in the Selected Lines View (section 19).
- View a breakdown of time spent in different code paths in the Main Thread Stacks view (section 20).
- For more information on using MAP, see section 16.
- For information on debugging Python scripts with DDT, see section 5.16.
- MAP requires a significant amount of time to analyze samples when profiling a Python script that imports modules which use OpenBLAS, such as NumPy. This is caused by the lack of unwind information in OpenBLAS, which also results in partial trace nodes being displayed in MAP.
- mpi4py uses some MPI functions that were introduced in MPI version 3, for example MPI_Mrecv. MAP does not collect metrics from these functions, so MPI metrics for mpi4py will be inaccurate. To work around this, use a custom Python MPI wrapper that only uses functions that were available before MPI version 3.
- When using reverse connect (--connect) and quick start (--start) in conjunction, the full path to the Python application must be provided.
- The Anaconda Python 3.6.x interpreter is aggressively optimized. This causes multiple startup issues when profiling with MAP, so this interpreter is not supported.