Index

Arm IPMI Energy Agent
Requirements
MAP file
Performance Reports
Specific issues

Accelerator breakdown
Global memory accesses
GPU utilization
Mean GPU memory usage
Peak GPU memory usage
AMD
OpenCL
Arm
Known issues

Bull MPI

Compatibility Launch
Compute node access
Configuration
CPU breakdown
Memory accesses
OpenMP code
Scalar numeric ops
Single core code
Vector numeric ops
Waiting for accelerators
CPU metrics breakdown
Cycles per instruction
L2 cache misses
Mispredicted branch instructions
Stalled cycles
Cray Compiler Environment
Cray MPT
Cray Native SLURM
Cray X
Cray X-Series
CSV performance reports
Custom DCIM
Custom gmetric

DCIM output
Dynamic linking
Cray X-Series

Enable and disable metrics
Energy breakdown
CPU
Mean node power
Peak node power
System
Energy metrics
Requirements
Environment variables
Example
Compiling
Cray
Generating a performance report
Overview
Running
Express Launch
Compatible MPIs

General Troubleshooting
Generating a report
Getting Support
GNU Compiler

HTML reports

I/O breakdown
Effective process read rate
Effective process write rate
Lustre metrics
Time in reads
Time in writes
Installation
Linux
Graphical install
Text-mode install
Intel Compiler
Intel MPI
Intel Xeon
RAPL
Interpreting
Introduction
IPMI

Known issues
Performance Reports
Compiler inlining functions
Incorrect MPI time
Insufficient samples
MPI wrapper libraries
No thread activity while blocking on an MPI call
Not correctly identifying vectorized instructions
OpenBLAS application
Reporting time spent in a function definition
Tail Call
Thread support limitations
Compiler
General
No shared home directory
Problems starting multi-process programs
Starting a program
Starting scalar programs

Licensing
Architecture licensing
License files
Supercomputing and other floating licenses
Using multiple architecture licenses
Workstation and evaluation licenses
Linking
Dynamic
On Cray X-Series using modules environment
Static
On Cray X-Series using modules environment
Log file

map-link modules
Installation
Cray X-Series
Memory breakdown
Mean process memory usage
Peak node memory usage
Peak process memory usage
Metrics
Accelerator breakdown
Computation
Compute
CPU breakdown
CPU metrics breakdown
Cycles per instruction
Effective process collective rate
Effective process point-to-point rate
Effective process read rate
Effective process write rate
Energy
Accelerator
CPU
Mean node power
Peak node power
System
Energy breakdown
Global memory accesses
GPU Utilization
I/O breakdown
Input/Output
L2 cache misses
Lustre metrics
Mean GPU memory usage
Mean process memory usage
Memory accesses
Memory breakdown
Mispredicted branch instructions
MPI
MPI breakdown
OpenMP breakdown
OpenMP code
Peak GPU memory usage
Peak node memory usage
Peak process memory usage
Physical core utilization
Scalar numeric ops
Single core code
Stalled cycles
Synchronization
System load
Threads breakdown
Time in collective calls
Time in point-to-point calls
Time in reads
Time in writes
Vector numeric ops
Waiting for accelerators
MPI
Troubleshooting
MPI breakdown
Effective process collective rate
Effective process point-to-point rate
Time in collective calls
Time in point-to-point calls
MPI wrapper libraries
MPICH 2
MPICH 3

NVIDIA CUDA

Obtaining support
Online resources
Open MPI
OpenMP breakdown
Computation
Physical core utilization
Synchronization
System load
Output locations

Performance reports
Energy breakdown
Accelerator
Threads breakdown
Synchronization
Platform MPI
Portland Group Compiler
Profiling
Preparing a program

Report summary
Compute
Input/Output
MPI
Requirements
Energy metrics
Running

SGI
SLURM
Static linking
On Cray X-Series
Supported Platforms

Textual performance reports
Thread support limitations
Threads breakdown
Computation
Physical core utilization
System load

Unified Parallel C
UPC
Berkeley
GNU

Worked examples
Code characterization and run size comparison
Deeper CPU metric analysis
I/O performance bottlenecks