1 Introduction
1.1 Online resources
2 Installation
2.1 Linux installation
2.1.1 Graphical install
2.1.2 Text-mode install
2.2 License files
2.3 Workstation and evaluation licenses
2.4 Supercomputing and other floating licenses
2.5 Architecture licensing
2.5.1 Using multiple architecture licenses
2.6 Environment variables
2.6.1 Report customization
2.6.2 Warning suppression
2.6.3 I/O behavior
2.6.4 Licensing
2.6.5 Timeouts
2.6.6 Sampler
2.6.7 Simple troubleshooting
3 Running with an example program
3.1 Overview of the example source code
3.2 Compiling
3.2.1 Cray X-Series
3.3 Running
3.4 Generating a performance report
4 Running with real programs
4.1 Preparing a program for profiling
4.1.1 Linking
4.1.2 Dynamic linking on Cray X-Series systems
4.1.3 Static linking
4.1.4 Static linking on Cray X-Series systems
4.1.5 Dynamic and static linking on Cray X-Series systems using the modules environment
4.1.6 map-link modules installation on Cray X-Series
4.2 Express Launch mode
4.2.1 Compatible MPIs
4.3 Compatibility Launch mode
4.4 Generating a performance report
4.5 Specifying output locations
4.6 Support for DCIM systems
4.6.1 Customizing your DCIM script
4.6.2 Customizing the gmetric location
4.7 Enable and disable metrics
5 Summarizing an existing MAP file
6 Interpreting performance reports
6.1 HTML performance reports
6.2 Report summary
6.2.1 Compute
6.2.2 MPI
6.2.3 Input/Output
6.3 CPU breakdown
6.3.1 Single core code
6.3.2 OpenMP code
6.3.3 Scalar numeric ops
6.3.4 Vector numeric ops
6.3.5 Memory accesses
6.3.6 Waiting for accelerators
6.4 CPU metrics breakdown
6.4.1 Cycles per instruction
6.4.2 Stalled cycles
6.4.3 L2 cache misses
6.4.4 Mispredicted branch instructions
6.4.5 FLOPS scalar lower bound
6.4.6 FLOPS vector lower bound
6.4.7 Memory accesses
6.5 OpenMP breakdown
6.5.1 Computation
6.5.2 Synchronization
6.5.3 Physical core utilization
6.5.4 System load
6.6 Threads breakdown
6.6.1 Computation
6.6.2 Synchronization
6.6.3 Physical core utilization
6.6.4 System load
6.7 MPI breakdown
6.7.1 Time in collective calls
6.7.2 Time in point-to-point calls
6.7.3 Effective process collective rate
6.7.4 Effective process point-to-point rate
6.8 I/O breakdown
6.8.1 Time in reads
6.8.2 Time in writes
6.8.3 Effective process read rate
6.8.4 Effective process write rate
6.8.5 Lustre metrics
6.9 Memory breakdown
6.9.1 Mean process memory usage
6.9.2 Peak process memory usage
6.9.3 Peak node memory usage
6.10 Accelerator breakdown
6.10.1 GPU utilization
6.10.2 Global memory accesses
6.10.3 Mean GPU memory usage
6.10.4 Peak GPU memory usage
6.11 Energy breakdown
6.11.1 CPU
6.11.2 Accelerator
6.11.3 System
6.11.4 Mean node power
6.11.5 Peak node power
6.11.6 Requirements
6.12 Textual performance reports
6.13 CSV performance reports
6.14 Worked examples
6.14.1 Code characterization and run size comparison
6.14.2 Deeper CPU metric analysis
6.14.3 I/O performance bottlenecks
7 Configuration
7.1 Compute node access
A Getting support
B Supported platforms
B.1 Performance Reports
C Known issues
D MPI distribution notes
D.1 Bull MPI
D.2 Cray MPT
D.3 Intel MPI
D.6 Open MPI
D.7 Platform MPI
D.8 SGI MPT / SGI Altix
E Compiler notes
E.1 AMD OpenCL compiler
E.2 Berkeley UPC compiler
E.3 Cray compiler environment
E.5 Intel compilers
E.6 Portland Group compilers
F Platform notes
F.1 Intel Xeon
F.1.1 Enabling RAPL energy and power counters when profiling
F.3 Arm
F.3.1 Arm®v8 (AArch64) known issues
F.4 POWER8 and POWER9 (POWER 64-bit) known issues
G General troubleshooting
G.1 Starting a program
G.1.1 Problems starting scalar programs
G.1.2 Problems starting multi-process programs
G.1.3 No shared home directory
G.2 Performance Reports specific issues
G.2.1 MPI wrapper libraries
G.2.2 Thread support limitations
G.2.3 No thread activity while blocking on an MPI call
G.2.4 I'm not getting enough samples
G.2.5 Performance Reports is reporting time spent in a function definition
G.2.6 Performance Reports is not correctly identifying vectorized instructions
G.2.7 Performance Reports takes a long time to gather and analyze my OpenBLAS-linked application
G.2.8 Performance Reports over-reports MPI, I/O, accelerator or synchronization time
G.3 Obtaining support
G.4 Arm IPMI Energy Agent
G.4.1 Requirements