This chapter covers the steps required to profile a Python script. This feature is particularly useful when profiling a program that mixes C, C++, Fortran, and Python code.
With Python profiling, MAP replaces the main thread's stack frames that originate in the Python interpreter with the Python stack frames of the profiled script. To disable this feature, set ALLINEA_SAMPLER_DISABLE_PYTHON_PROFILING=1.
MAP provides limited support for Python profiling in the following cases:
- Profiling Python scripts running under the CPython interpreter (versions 2.7, 3.5, and 3.6 only).
- Profiling Python scripts running under the Intel Distribution for Python.
- Profiling Python scripts running under the Anaconda Python distribution.
- Profiling Python scripts running under virtual environments.
- Profiling Python scripts that import modules which perform MPI on the main thread, such as mpi4py.
- Profiling Python scripts that import modules which use OpenMP. Only Python scripts running on the main thread of the Python interpreter are sampled by MAP.
MAP outputs warnings if the threading model of the MPI module is MPI_THREAD_MULTIPLE, which is the default in mpi4py. To prevent these warnings, change the default mpi4py settings before mpi4py.MPI is imported, using either mpi4py.rc.threaded = False or mpi4py.rc.thread_level = "funneled".
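A minimal sketch of where these settings go, assuming mpi4py 3.x or later. mpi4py reads its rc options when mpi4py.MPI is first imported, so the assignment has to come first. The helper name configure_mpi4py is hypothetical, and the import is guarded only so that the sketch also runs on a machine without mpi4py.

```python
import importlib.util


def configure_mpi4py(thread_level="funneled"):
    """Hypothetical helper: request a lower MPI thread level from mpi4py.

    Returns the configured level, or None when mpi4py is not installed.
    """
    if importlib.util.find_spec("mpi4py") is None:
        return None  # nothing to configure on this machine
    import mpi4py

    # These rc options only take effect if set BEFORE "from mpi4py import MPI".
    mpi4py.rc.thread_level = thread_level  # alternatively: mpi4py.rc.threaded = False
    return mpi4py.rc.thread_level


configured = configure_mpi4py()

# Only after the rc options are set, import MPI in the real script:
#     from mpi4py import MPI
```

In a real script you would normally set the rc option inline at the top of the file, before any other module has a chance to import mpi4py.MPI.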
If you are profiling on a system that uses ALPS or SLURM and the Python script does not use MPI, you must set the environment variables described in section H.2.
- Check that the Python script runs successfully without MAP.
- To profile the Python script with MAP, prepend map to the command you use to run it.
- Click Run and wait for MAP to finish profiling the Python script.
- View the profiling results in MAP.
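If you want a small script of your own to try these steps on, the sketch below is a hypothetical stand-in (it is not the python-profiling.py example described next). It spends time both in pure-Python code (a recursive Fibonacci function) and in native code (SHA-256 hashing, which CPython implements in C), so you would expect MAP to report a mix of Python and non-Python time. The file name mixed_workload.py is an assumption.

```python
import hashlib


def fibonacci(n):
    """Naive recursive Fibonacci: time here is spent executing Python code."""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


def checksum(data, rounds):
    """Hash the same buffer repeatedly; the hashing itself runs in C code."""
    h = hashlib.sha256()
    for _ in range(rounds):
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    print(fibonacci(25))
    print(checksum(b"x" * 1_000_000, 50))
```

Run it once as python3 mixed_workload.py to check that it completes, then prepend map to that same command to profile it.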
This section demonstrates how to profile the Python example script python-profiling.py located in the examples directory.
- Change into the examples directory and run the makefile to compile the example.
- Start MAP by prepending map to the command that runs python-profiling.py.
- Click Run.
- Wait for MAP to finish analyzing samples after the Python script has completed.
Note: The MAP GUI opens and shows the Python script, with the line where the most time was spent selected.
- Locate the first fibonacci_c stack frame in the Main Thread Stacks view. The call to the C function appears under the main Python stack frame.
- Examine the Main Thread Activity graph (section 24) for an overview of time spent in Python code compared with non-Python code.
- View the source code lines (section 18.1) on which time was spent executing Python code and non-Python code.
- Compare the time spent on the selected line executing Python code with non-Python code in the Selected Lines View (section 19).
- View a breakdown of the time spent in different code paths in the Main Thread Stacks view.
Note: Python stack frames are only displayed for the main thread of the Python interpreter. Normal stack frames are displayed for non-main threads of the Python interpreter.
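A minimal sketch of which code this note applies to, using only the standard threading module. The function names busy and worker are hypothetical. Based on this note, you would expect the call to busy made from the main thread to appear as Python stack frames in MAP, while the same work done on the worker thread appears as normal stack frames.

```python
import threading


def busy(n):
    """CPU-bound pure-Python loop: sum of the squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


results = {}


def worker(n):
    # Runs on a non-main thread of the interpreter:
    # normal stack frames are expected here, not Python stack frames.
    results["worker"] = busy(n)


N = 200_000
thread = threading.Thread(target=worker, args=(N,), name="worker")
thread.start()

# Runs on the interpreter's main thread: Python stack frames are expected here.
results["main"] = busy(N)
thread.join()

print(results["main"] == results["worker"])
```

If you need Python-level detail for a particular computation, it therefore helps to run that computation on the main thread while profiling.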
- For more information on using MAP, see section 16.
- For information on debugging Python scripts with DDT, see section 5.16.
- MAP can take a significant amount of time to analyze samples when profiling a Python script that imports modules that use OpenBLAS, such as NumPy. This is because OpenBLAS lacks unwind information, which also causes MAP to display partial trace nodes.
- mpi4py uses some MPI functions that were introduced in MPI version 3, for example MPI_Mrecv. MAP does not collect metrics from these functions, so the MPI metrics for mpi4py will be inaccurate. To work around this, use a custom Python MPI wrapper that only uses functions that were available before MPI version 3.
- When you use reverse connect (--connect) together with quick start (--start), you must provide the full path to the Python application.