Index

Arm IPMI Energy Agent
Requirements
MAP file
Performance Reports
Specific issues

Accelerator breakdown
Global memory accesses
GPU utilization
Mean GPU memory usage
Peak GPU memory usage
AMD
OpenCL
Arm
Known issues

Bull MPI

Compatibility Launch
Compute node access
Configuration
CPU breakdown
Memory accesses
OpenMP code
Scalar numeric ops
Single core code
Vector numeric ops
Waiting for accelerators
CPU metrics breakdown
Cycles per instruction
FLOPS scalar lower bound
FLOPS vector lower bound
L2 cache misses
Memory accesses
Mispredicted branch instructions
Stalled cycles
Cray Compiler Environment
Cray MPT
Cray Native SLURM
Cray X
Cray X-Series
CSV performance reports
Custom DCIM
Custom gmetric

DCIM output
Dynamic linking
Cray X-Series

Enable and disable metrics
Energy breakdown
CPU
Mean node power
Peak node power
System
Energy metrics
Requirements
Environment variables
Example
Compiling
Cray
Generating a performance report
Overview
Running
Express Launch
Compatible MPIs

General Troubleshooting
Generating a report
Getting Support
GNU Compiler

HTML reports

I/O breakdown
Effective process read rate
Effective process write rate
Lustre metrics
Time in reads
Time in writes
Installation
Linux
Graphical install
Text-mode install
Intel Compiler
Intel MPI
Intel Xeon
RAPL
Interpreting
Introduction
IPMI

Known issues
Performance Reports
Incorrect MPI time
Insufficient samples
MPI wrapper libraries
No thread activity while blocking on an MPI call
Not correctly identifying vectorized instructions
OpenBLAS application
Reporting time spent in a function definition
Thread support limitations
Compiler
General
No shared home directory
Problems starting multi-process programs
Starting a program
Starting scalar programs

Licensing
Architecture licensing
License files
Supercomputing and other floating licenses
Using multiple architecture licenses
Workstation and evaluation licenses
Linking
Dynamic
On Cray X-Series using modules environment
Static
On Cray X-Series using modules environment
Log file

map-link modules
Installation
Cray X-Series
Memory breakdown
Mean process memory usage
Peak node memory usage
Peak process memory usage
Metrics
Accelerator breakdown
Computation
Compute
CPU breakdown
CPU metrics breakdown
Cycles per instruction
Effective process collective rate
Effective process point-to-point rate
Effective process read rate
Effective process write rate
Energy
Accelerator
CPU
Mean node power
Peak node power
System
Energy breakdown
FLOPS scalar lower bound
FLOPS vector lower bound
Global memory accesses
GPU Utilization
I/O breakdown
Input/Output
L2 cache misses
Lustre metrics
Mean GPU memory usage
Mean process memory usage
Memory accesses
Memory breakdown
Mispredicted branch instructions
MPI
MPI breakdown
OpenMP breakdown
OpenMP code
Peak GPU memory usage
Peak node memory usage
Peak process memory usage
Physical core utilization
Scalar numeric ops
Single core code
Stalled cycles
Synchronization
System load
Threads breakdown
Time in collective calls
Time in point-to-point calls
Time in reads
Time in writes
Vector numeric ops
Waiting for accelerators
MPI
Troubleshooting
MPI breakdown
Effective process collective rate
Effective process point-to-point rate
Time in collective calls
Time in point-to-point calls
MPI wrapper libraries
MPICH 2
MPICH 3

NVIDIA CUDA

Obtaining support
Online resources
Open MPI
OpenMP breakdown
Computation
Physical core utilization
Synchronization
System load
Output locations

Performance reports
Energy breakdown
Accelerator
Threads breakdown
Synchronization
Platform MPI
Portland Group Compiler
POWER8 and POWER9
Known issues
Profiling
Preparing a program

Report summary
Compute
Input/Output
MPI
Requirements
Energy metrics
Running

SGI
SLURM
Static linking
On Cray X-Series
Supported Platforms

Textual performance reports
Thread support limitations
Threads breakdown
Computation
Physical core utilization
System load

Unified Parallel C
UPC
Berkeley
GNU

Worked examples
Code characterization and run size comparison
Deeper CPU metric analysis
I/O performance bottlenecks