The Stacks view offers a good top-down view of your program. It is easy to follow down from the main function to see which code paths took the most time. Each line of the Stacks view shows the performance of one line of your source code, including all the functions called by that line.
The sparkline graphs are described in detail in section 18.
You can read the above figure as follows:
- The first line, program slow, represents the entire program run.
- Beneath it, you see a call to the stride function. Almost all of the time in this call was spent in single-threaded compute (dark green); 1.4% of the time was spent in MPI.
- The stride function itself spent most of that time on the line a(i,j)=x*j at slow.f90 line 107. In fact, 43.2% of the entire run was spent executing this line of code. A more detailed breakdown is described in section 24.
- The 1% MPI time inside stride comes from an MPI_Barrier on line 124.
- The next major function called from program slow is the overlap function, seen at the bottom of this figure. This function ran for 24.8% of the total time, almost all of which was runtime. This line of code was executed at the start of the overlap function, and other calls, which are not visible in the figure, accounted for the rest.
Clicking on any line of the Stacks view jumps the Source Code view to show that line of code. This makes the Stacks view an easy way to navigate and understand the performance of even complex codes.
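The percentages in the walkthrough above are inclusive: each line's figure covers that line plus everything it calls. The following Python sketch illustrates one way such top-down figures could be derived from sampled call stacks. It is not taken from the tool itself. The sample data, the frame labels other than slow.f90 lines 107 and 124, and the `inclusive_percentages` helper are all hypothetical, and the resulting numbers are not those of the figure.

```python
from collections import defaultdict

# Hypothetical call-stack samples, outermost frame first.
# The counts are invented for illustration only.
samples = (
    [("program slow", "stride", "slow.f90:107")] * 4
    + [("program slow", "stride", "slow.f90:110")] * 3  # hypothetical line
    + [("program slow", "stride", "slow.f90:124 MPI_Barrier")] * 1
    + [("program slow", "overlap", "MPI_Send")] * 2
)


def inclusive_percentages(samples):
    """Map each call path to the percentage of samples that pass through it."""
    counts = defaultdict(int)
    for stack in samples:
        # Every prefix of a stack is a node in the top-down tree, so a
        # sample counts towards its leaf and towards all of its callers.
        for depth in range(1, len(stack) + 1):
            counts[stack[:depth]] += 1
    total = len(samples)
    return {path: 100.0 * n / total for path, n in counts.items()}


percentages = inclusive_percentages(samples)

# Sorting the paths puts each parent directly before its children.
for path in sorted(percentages):
    indent = "  " * (len(path) - 1)
    print(f"{indent}{path[-1]}: {percentages[path]:.1f}%")
```

In this toy data set the root always reads 100%, and a parent's percentage is the sum of its children's because every sample ends in a leaf frame.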
The percentage MPI time gives an idea as to how well your program is scaling and shows the location of any communication bottlenecks. As discussed in section 18, any sloping blue edges represent imbalance between processes or cores.
In the above example you can see that the MPI_Send call inside the overlap function has a sloping trailing edge. This means that some processes took significantly longer to finish the call than others, perhaps because they were waiting longer for their receiver to become ready.
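To make the idea of a sloping trailing edge concrete, the sketch below measures how far apart processes finish the same call. It is only an illustration of the concept, not a description of how the sparklines are drawn. The per-rank finish times and the `edge_spread` helper are hypothetical.

```python
def edge_spread(finish_ms):
    """Return (spread, lags) for the per-rank finish times of one call.

    spread is the gap between the first and last rank to finish, and
    lags gives each rank's delay behind the earliest finisher.
    """
    earliest = min(finish_ms)
    lags = [t - earliest for t in finish_ms]
    return max(lags), lags


# Hypothetical times, in milliseconds, at which four ranks left one MPI_Send.
finish_ms = [2000, 2050, 2400, 2900]

spread, lags = edge_spread(finish_ms)
print("spread between first and last rank (ms):", spread)
print("lag of each rank behind the earliest (ms):", lags)
```

A spread near zero corresponds to a vertical edge, where all processes finish together. A large spread corresponds to a sloping edge, where some processes wait noticeably longer than others.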
The Stacks view shows which lines of code spend the most time running, computing or waiting. As with most places in the GUI, you can hover over a line or chart for a more detailed breakdown.