
29 Exporting profiler data in JSON format

MAP provides an option to export the profiler data in a machine-readable JSON format.

To export as JSON, first open a .map file in MAP, then click File and select the Export Profile Data as JSON option.

For a command-line option, see 28 Running MAP from the command line.
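
For example, exporting from the command line typically looks like the following (the export option is described in Section 28; verify the exact flags supported by your version):

map --export=profile.json profile.map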

29.1 JSON format

The JSON document contains a single JSON object with two members: info, containing general information about the profiled program, and samples, containing the sampled information. An example of profile data exported to a JSON file is given in Section 29.4, and a short sketch for reading an exported file follows the list below.

  • info (Object): General information about the profiled program. If some information is not available, the value is null instead.
    • command_line (String): Command line call used to run the profiled application (for example aprun -N 24 -n 256 -d 1 ./my_exe).
    • machine (String): Hostname of the node on which the executable was launched.
    • notes (String): A short description of the run or other notes on configuration and compilation settings. This is specified by setting the environment variable ALLINEA_NOTES before running MAP.
    • number_of_nodes (Number): Number of nodes run on.
    • number_of_processes (Number): Number of processes run.
    • runtime (Number): Runtime in milliseconds.
    • start_time (String): Date and time of run in ISO 8601 format.
    • create_version (String): Version of MAP used to create the map file.
    • metrics (Object): Attributes of the overall run, reported once per process, each represented by an object with max, min, mean, var and sums fields, or null when the metric is not available. The sums series contains the sum of the metric across all processes/nodes for each sample. In many cases the values over all nodes are the same, that is, the max, min and mean values are equal and the variance is zero. For example, in homogeneous systems num_cores_per_node is the same over all nodes.
      • wchar_total (Object): The number of bytes written in total by I/O operation system calls (see wchar in the Linux Programmer's Manual page 'proc': man 5 proc).
      • rchar_total (Object): The number of bytes read in total by I/O operation system calls (see rchar in the Linux Programmer's Manual page 'proc': man 5 proc).
      • num_cores_per_node (Object): Number of cores available per node.
      • memory_per_node (Object): RAM installed per node.
      • nvidia_gpus_count (Object): Number of GPUs per node.
      • nvidia_total_memory (Object): GPU frame buffer size per node.
      • num_omp_threads_per_process (Object): Number of OpenMP worker threads used per process.
  • samples (Object)
    • count (Number): Number of samples recorded.
    • window_start_offsets (Array of Numbers): Offset of the beginning of each sampling window, starting from zero. The actual sample might have been taken anywhere between this offset and the start of the next window, that is, the window offsets w_i and w_(i+1) define a semi-open interval (w_i, w_(i+1)] in which the sample was taken.
    • activity (Object): Contains information about the proportion of different types of activity performed during execution, according to different view modes. The types of view modes possibly shown are OpenMP, PThreads and Main Thread, described in Section 26. Only available view modes are exported, for example, a program without OpenMP sections will not have an OpenMP activity entry.

      Note

      The sum of the proportions in an activity might not add up to 1; this can happen when fewer threads were running than MAP expected. Occasionally, the sum of the proportions shown for a sample in PThreads or OpenMP view mode might exceed 1. When this happens, the profiled application used more cores than MAP assumed to be the maximum number of cores per process. This can be due to middleware services launching helper threads which, unexpectedly to MAP, contribute to the activity of the profiled program. In this case, the proportions for that sample should not be compared with the rest of the proportions for that activity in the sample set.

    • metrics (Object): Contains an object for each metric that was recorded. Each of these objects contains parallel lists with the minimum, maximum, mean, variance and sum of that metric in each sample. The format of a metrics entry is given in Section 29.3. All metrics recorded in a run are present in the JSON, including custom metrics. The names and descriptions of all core MAP metrics are given in Section 29.3. It is assumed that a user including a custom metrics library is aware of what the custom metric is reporting. See the Arm Metric Plugin Interface documentation.
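
The following minimal Python sketch shows how these fields can be read from an exported file. The file name profile.json is a placeholder for whatever name was chosen at export time.

import json

# Load an exported profile. "profile.json" is a placeholder name.
with open("profile.json") as f:
    doc = json.load(f)

info = doc["info"]
samples = doc["samples"]

print("command line:", info["command_line"])
print("processes:", info["number_of_processes"], "on", info["number_of_nodes"], "nodes")
print("runtime (ms):", info["runtime"])
print("samples recorded:", samples["count"])
print("first window offsets:", samples["window_start_offsets"][:4])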

29.2 Activities

Each exported object in an activity is presented as a list of fractional percentages (0.0 to 1.0) of the sample time recorded for a particular activity during each sample window. Therefore, each list has as many entries as there are samples.
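
As an illustration, this sketch (again assuming an export named profile.json) sums the per-category proportions of the main_thread activity for each sample window. Per the Note in Section 29.1, the totals are usually close to 1 but this is not guaranteed.

import json

with open("profile.json") as f:   # placeholder file name
    doc = json.load(f)

activity = doc["samples"]["activity"]["main_thread"]
count = doc["samples"]["count"]

# Each category maps to a list with one proportion per sample window.
for i in range(count):
    total = sum(series[i] for series in activity.values())
    print("sample", i, "total main_thread activity:", round(total, 3))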

29.2.1 Description of categories

The following is the list of all categories. Only available categories are exported; see Sections 29.2.2 and 29.2.3.

  • normal_compute: Proportion of time spent on the CPU which is not categorized as any of the following activities. The computation can be, for example, floating point scalar (vector) addition, multiplication or division.
  • point_to_point_mpi: Proportion of time spent in point-to-point MPI calls on the main thread and not inside an OpenMP region.
  • collective_mpi: Proportion of time spent in collective MPI calls on the main thread and not inside an OpenMP region.
  • point_to_point_mpi_openmp: Proportion of time spent in point-to-point MPI calls made from any thread within an OpenMP region.
  • collective_mpi_openmp: Proportion of time spent in collective MPI calls made from any thread within an OpenMP region.
  • point_to_point_mpi_non_main_thread: Proportion of time spent in point-to-point MPI calls on a pthread, but not on the main thread nor within an OpenMP region.
  • collective_mpi_non_main_thread: Proportion of time spent in collective MPI calls on a pthread, but not on the main thread nor within an OpenMP region.
  • openmp: Proportion of time spent in an OpenMP region, that is, in compiler-inserted calls used to implement the contents of an OpenMP loop.
  • accelerator: Proportion of time spent in calls to accelerators, that is, blocking calls waiting for a CUDA kernel to return.
  • pthreads: Proportion of compute time on a non-main (worker) pthread.
  • openmp_overhead_in_region: Proportion of time spent setting up OpenMP structures, waiting for threads to finish and so on.
  • openmp_overhead_no_region: Proportion of time spent in calls to the OpenMP runtime outside an OpenMP region.
  • synchronisation: Proportion of time spent in thread synchronization calls, such as pthread_mutex_lock.
  • io_reads: Proportion of time spent in I/O read operations, such as 'read'.
  • io_writes: Proportion of time spent in I/O write operations. Also includes file open and close time as these are typically only significant when writing.
  • io_reads_openmp: Proportion of time spent in I/O read operations from within an OpenMP region.
  • io_writes_openmp: Proportion of time spent in I/O write operations from within an OpenMP region.
  • mpi_worker: Proportion of time spent in the MPI implementation on a worker thread.
  • mpi_monitor: Proportion of time spent in the MPI monitor thread.
  • openmp_monitor: Proportion of time spent in the OpenMP monitor thread.
  • sleep: Proportion of time spent in sleeping threads and processes.

29.2.2 Categories available in main_thread activity

  • normal_compute
  • point_to_point_mpi
  • collective_mpi
  • point_to_point_mpi_openmp
  • collective_mpi_openmp
  • openmp
  • accelerator
  • openmp_overhead_in_region
  • openmp_overhead_no_region
  • synchronisation
  • io_reads
  • io_writes
  • io_reads_openmp
  • io_writes_openmp
  • sleep

29.2.3 Categories available in openmp and pthreads activities

  • normal_compute
  • point_to_point_mpi
  • collective_mpi
  • point_to_point_mpi_openmp
  • collective_mpi_openmp
  • point_to_point_mpi_non_main_thread
  • collective_mpi_non_main_thread
  • openmp
  • accelerator
  • pthreads
  • openmp_overhead_in_region
  • openmp_overhead_no_region
  • synchronisation
  • io_reads
  • io_writes
  • io_reads_openmp
  • io_writes_openmp
  • mpi_worker
  • mpi_monitor
  • openmp_monitor
  • sleep

29.3 Metrics

The following list contains the core metrics reported by MAP.

Only available metrics are exported to JSON. For example, if there is no Lustre filesystem then the Lustre metrics will not be included. If any custom metrics are loaded, they will be included in the JSON, but are not documented here.

For more information on the metrics see 24 Metrics View.
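
As an illustration, the per-sample series of a recorded metric can be read as follows. This sketch assumes an export named profile.json in which the wchar_total metric was recorded; only metrics that were actually recorded appear in the file.

import json

with open("profile.json") as f:   # placeholder file name
    doc = json.load(f)

offsets = doc["samples"]["window_start_offsets"]
wchar = doc["samples"]["metrics"]["wchar_total"]   # present only if recorded

# mins, maxs, means, vars and sums are parallel per-sample lists.
for t, lo, hi, mean in zip(offsets, wchar["mins"], wchar["maxs"], wchar["means"]):
    print("window at", t, ": min", lo, "max", hi, "mean", round(mean, 1))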

29.4 Example JSON output

This section gives an example of the JSON generated from a MAP file, illustrating the description given in the previous sections. It is not a full file, but it indicates how the information looks after export.


{
  "info": {
    "command_line": "mpirun -np 4 ./exec",
    "machine": "hal9000",
    "number_of_nodes": 30,
    "number_of_processes": 240,
    "runtime": 8300,
    "start_time": "2016-05-13T11:36:31",
    "create_version": "6.0.4",
    "metrics": {
      "wchar_total": {"max": 384605588, "min": 132, "mean": 24075798, "var": 546823},
      "rchar_total": {"max": 6123987, "min": 63, "mean": 9873, "var": 19287},
      "num_cores_per_node": {"max": 4, "min": 4, "mean": 4, "var": 0},
      "memory_per_node": {"max": 4096, "min": 4096, "mean": 4096, "var": 0},
      "nvidia_gpus_count": {"max": 0, "min": 0, "mean": 0, "var": 0},
      "nvidia_total_memory": {"max": 0, "min": 0, "mean": 0, "var": 0},
      "num_omp_threads_per_process": {"max": 6, "min": 6, "mean": 6, "var": 0}
    }
  },
  "samples": {
    "count": 4,
    "window_start_offsets": [0, 0.2, 0.4, 0.6],
    "activity": {
      "main_thread": {
        "normal_compute": [0.762, 0.996, 1, 0.971],
        "io_reads": [0.00416, 0.00416, 0, 0.00416],
        "io_writes": [0.233, 0, 0, 0],
        "openmp": [0, 0, 0, 0.01667],
        "openmp_overhead_in_region": [0, 0, 0, 0.1],
        "openmp_overhead_no_region": [0, 0, 0, 0.00417],
        "sleep": [0, 0, 0, 0]
      },
      "openmp": {
        "normal_compute": [0.762, 0.996, 1, 0.971],
        "io_reads": [0.00416, 0.00416, 0, 0.00416],
        "io_writes": [0.233, 0, 0, 0],
        "openmp": [0, 0, 0, 0.01319],
        "openmp_overhead_in_region": [0, 0, 0, 0],
        "openmp_overhead_no_region": [0, 0, 0, 0],
        "sleep": [0, 0, 0, 0]
      },
      "pthreads": {
        "io_reads": [0.00069, 0.00069, 0, 0.00069],
        "io_writes": [0.0389, 0, 0, 0],
        "normal_compute": [0.1270, 0.1659, 0.1666, 0.1652],
        "openmp": [0, 0, 0, 0.01319],
        "openmp_overhead_in_region": [0, 0, 0, 0.02153],
        "openmp_overhead_no_region": [0, 0, 0, 0.00069],
        "sleep": [0, 0, 0, 0]
      }
    },
    "metrics": {
      "wchar_total": {
        "mins": [3957, 3957, 3958, 4959],
        "maxs": [4504, 4959, 5788, 10059],
        "means": [3965.375, 4112.112, 4579.149, 6503.496],
        "vars": [2159.809, 49522.783, 169602.769, 2314522.699],
        "sums": [15860, 16448, 18316, 26012]
      },
      "bytes_read": {
        "mins": [0, 0, 0, 0],
        "maxs": [34647.255020415301, 0, 0, 0],
        "means": [645.12988722358205, 0, 0, 0],
        "vars": [9014087.0327749606, 0, 0, 0],
        "sums": [2580, 0, 0, 0]
      },
      "bytes_written": {
        "mins": [0, 0, 0, 0],
        "maxs": [123, 0, 0, 0],
        "means": [32, 0, 0, 0],
        "vars": [12, 0, 0, 0],
        "sums": [128, 0, 0, 0]
      }
    }
  }
}
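
Because only available view modes and metrics are exported, scripts should not assume fixed contents. A minimal sketch for discovering what a particular export contains (the file name is again an assumption):

import json

with open("profile.json") as f:   # placeholder file name
    doc = json.load(f)

# Only available view modes and metrics appear in an export.
print("view modes:", sorted(doc["samples"]["activity"]))
print("metrics:", sorted(doc["samples"]["metrics"]))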