Index

Arm IPMI Energy Agent, 1
Requirements, 2
MAP file, 3
Performance Reports
Specific issues, 4

, 5

Accelerator breakdown, 6
Global memory accesses, 7
GPU utilization, 8
Mean GPU memory usage, 9
Peak GPU memory usage, 10
AMD
OpenCL, 11
Arm, 12
Known issues, 13

Bull MPI, 14

Compatibility Launch, 15
Compute node access, 16
Configuration, 17
CPU breakdown, 18
Memory accesses, 19
OpenMP code, 20
Scalar numeric ops, 21
Single core code, 22
Vector numeric ops, 23
Waiting for accelerators, 24
CPU metrics breakdown, 25
Cycles per instruction, 26
L2 cache misses, 27
Mispredicted branch instructions, 28
Stalled cycles, 29
Cray Compiler Environment, 30
Cray MPT, 31
Cray Native SLURM, 32
Cray X, 33
Cray X-Series, 34, 35, 36, 37, 38, 39
CSV performance reports, 40
Custom DCIM, 41
Custom gmetric, 42

DCIM output, 43
Dynamic linking
Cray X-Series, 44

Enable and disable metrics, 45
Energy breakdown, 46
CPU, 47
Mean node power, 48
Peak node power, 49
System, 50
Energy metrics
Requirements, 51
Example, 52
Compiling, 53
Cray, 54
Generating a performance report, 55
Overview, 56
Running, 57
Express Launch, 58
Compatible MPIs, 59

General Troubleshooting, 60
Generating a report, 61
Getting Support, 62
GNU Compiler, 63

HTML reports, 64

I/O breakdown, 65
Effective process read rate, 66
Effective process write rate, 67
Time in reads, 68
Time in writes, 69
Installation, 70
Linux, 71
Graphical install, 72
Text-mode install, 73
Intel Compiler, 74
Intel MPI, 75
Intel Xeon, 76
RAPL, 77
Interpreting, 78
Introduction, 79
IPMI, 80

Known issues
Performance Reports, 81
Compiler inlining functions, 82
Incorrect MPI time, 83
Insufficient samples, 84
MPI wrapper libraries, 85
No thread activity while blocking on an MPI call, 86
Not correctly identifying vectorized instructions, 87
OpenBLAS application, 88
Reporting time spent in a function definition, 89
Tail Call, 90
Thread support limitations, 91
Compiler, 92
General, 93
No shared home directory, 94
Problems starting multi-process programs, 95
Starting a program, 96
Starting scalar programs, 97

Licensing
Architecture licensing, 98
License files, 99
Supercomputing and other floating licenses, 100
Using multiple architecture licenses, 101
Workstation and evaluation licenses, 102
Linking, 103
Dynamic
On Cray X-Series using modules environment, 104
Static, 105
On Cray X-Series using modules environment, 106
Log file, 107

Map-link modules, 108
Installation
Cray X-Series, 109
Memory breakdown, 110
Mean process memory usage, 111
Peak node memory usage, 112
Peak process memory usage, 113
Metrics
Accelerator breakdown, 114
Computation, 115, 116
Compute, 117
CPU breakdown, 118
CPU metrics breakdown, 119
Cycles per instruction, 120
Effective process collective rate, 121
Effective process point-to-point rate, 122
Effective process read rate, 123
Effective process write rate, 124
Energy
Accelerator, 125
CPU, 126
Mean node power, 127
Peak node power, 128
System, 129
Energy breakdown, 130
Global memory accesses, 131
GPU utilization, 132
I/O breakdown, 133
Input/Output, 134
L2 cache misses, 135
Mean GPU memory usage, 136
Mean process memory usage, 137
Memory accesses, 138
Memory breakdown, 139
Mispredicted branch instructions, 140
MPI, 141
MPI breakdown, 142
OpenMP breakdown, 143
OpenMP code, 144
Peak GPU memory usage, 145
Peak node memory usage, 146
Peak process memory usage, 147
Physical core utilization, 148, 149
Scalar numeric ops, 150
Single core code, 151
Stalled cycles, 152
Synchronization, 153, 154
System load, 155, 156
Threads breakdown, 157
Time in collective calls, 158
Time in point-to-point calls, 159
Time in reads, 160
Time in writes, 161
Vector numeric ops, 162
Waiting for accelerators, 163
MPI
Troubleshooting, 164
MPI breakdown, 165
Effective process collective rate, 166
Effective process point-to-point rate, 167
Time in collective calls, 168
Time in point-to-point calls, 169
MPI wrapper libraries, 170
MPICH 2, 171
MPICH 3, 172

NVIDIA CUDA, 173

Obtaining support, 174
Online resources, 175
Open MPI, 176
OpenMP breakdown, 177
Computation, 178
Physical core utilization, 179
Synchronization, 180
System load, 181
Output locations, 182

Performance reports
Energy breakdown
Accelerator, 183
Threads breakdown
Synchronization, 184
Platform MPI, 185
Portland Group Compiler, 186
Profiling
Preparing a program, 187

Report summary, 188
Compute, 189
Input/Output, 190
MPI, 191
Requirements
Energy metrics, 192
Running, 193

SGI, 194
SLURM, 195
Spectrum MPI, 196
Static linking, 197
On Cray X-Series, 198
Supported Platforms, 199

Textual performance reports, 200
Thread support limitations, 201
Threads breakdown, 202
Computation, 203
Physical core utilization, 204
System load, 205

Unified Parallel C, 206, 207
UPC
Berkeley, 208
GNU, 209

Worked examples, 210
Code characterization and run size comparison, 211
Deeper CPU metric analysis, 212
I/O performance bottlenecks, 213