
Statistical Profiling Extension

The Statistical Profiling Extension (SPE) is an optional extension to the Arm®v8 architecture for profiling software and hardware using randomized sampling.

SPE introduces a set of registers that are specific to the SPE architecture and adds fields to several existing Armv8-A System registers in AArch64 state.

Note

SPE is disabled in AArch32 state.

Statistical profiling is a non-invasive debug operation that works as follows:

  1. An operation is chosen from a sample population at a programmable interval. Operations are architectural instructions or microarchitectural operations. Each operation appears in the sample population once for every time it is executed.
  2. A trace of the sampled operation is taken.
  3. You can filter out potential sample records based on the type of operation, event, or latency. See SPE counter configuration.
  4. A sample record is created that contains the traced information, and it is written to a memory buffer. When the memory buffer is full, software can process the sample records.
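The four steps above can be modelled in software to make the flow concrete. The following is a minimal sketch, not the hardware implementation: it assumes a fixed sampling interval (omitting the randomization that real SPE applies), a single latency filter standing in for the operation, event, and latency filters, and a tiny record buffer. Every type and function name here (struct op, struct profiler, profiler_observe, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation types for this illustration only. */
enum op_type { OP_OTHER, OP_BRANCH, OP_LOAD, OP_STORE };

struct op {
    uint64_t pc;          /* virtual address of the operation */
    enum op_type type;
    uint32_t latency;     /* total latency in cycles */
};

/* Hypothetical record contents; not the architectural packet format. */
struct sample_record {
    uint64_t timestamp;
    uint64_t pc;
    enum op_type type;
    uint32_t latency;
};

#define BUF_CAPACITY 4

struct profiler {
    uint32_t interval;     /* select one operation every `interval` operations */
    uint32_t countdown;    /* operations left until the next selection */
    uint32_t min_latency;  /* filter: discard samples below this latency */
    struct sample_record buf[BUF_CAPACITY];
    size_t used;
};

void profiler_init(struct profiler *p, uint32_t interval, uint32_t min_latency) {
    assert(interval > 0);
    p->interval = interval;
    p->countdown = interval;
    p->min_latency = min_latency;
    p->used = 0;
}

/* Feed one executed operation. Returns false once the buffer is full,
   which is the point where software would process the records. */
bool profiler_observe(struct profiler *p, const struct op *o, uint64_t now) {
    if (p->used == BUF_CAPACITY)
        return false;
    if (--p->countdown != 0)
        return true;                  /* step 1: operation not selected */
    p->countdown = p->interval;       /* reload the interval counter */
    if (o->latency < p->min_latency)
        return true;                  /* step 3: sample filtered out */
    struct sample_record *r = &p->buf[p->used++];  /* steps 2 and 4 */
    r->timestamp = now;
    r->pc = o->pc;
    r->type = o->type;
    r->latency = o->latency;
    return p->used < BUF_CAPACITY;
}
```

With an interval of 3 and no latency filter, every third operation is recorded, and the fourth record fills the buffer, at which point profiler_observe returns false.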

All sample records contain:

  • A timestamp.
  • The context.
  • Whether the sampled operation generated an exception.
  • Whether the sampled operation completed execution.

If the sampled operation completes execution and does not generate an exception, the sample record also contains:

  • The PC virtual address for the sampled operation.
  • Whether the sampled operation is a branch, a load, a store, or other.
  • Whether the sampled operation is conditional, conditional select, or not.
  • The total latency.
  • The issue latency.

The architecture defines an extra set of data that is collected in the sample record for each sampled operation:

  • Events
  • Cycle counters
  • Addresses

Further information is recorded for specific types of operations.
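The split between fields that every record carries and fields that are only meaningful for operations that completed without an exception can be expressed as a data structure. This is an illustrative sketch only: the struct below is a hypothetical model, not the architectural SPE record or packet encoding, and the names, widths, and the events bitmask are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; NOT the architectural SPE packet format. */
enum op_class { CLASS_OTHER, CLASS_BRANCH, CLASS_LOAD, CLASS_STORE };
enum cond_kind { COND_NONE, COND_CONDITIONAL, COND_SELECT };

struct spe_record_model {
    /* Present in every record. */
    uint64_t timestamp;
    uint64_t context;          /* execution context; encoding not modelled */
    bool exception;            /* the operation generated an exception */
    bool completed;            /* the operation completed execution */

    /* Meaningful only when completed && !exception. */
    uint64_t pc;               /* virtual address of the operation */
    enum op_class cls;         /* branch, load, store, or other */
    enum cond_kind cond;       /* conditional, conditional select, or not */
    uint32_t total_latency;    /* cycles */
    uint32_t issue_latency;    /* cycles */

    /* Extra per-operation data; representation assumed for illustration. */
    uint64_t events;           /* hypothetical bitmask of events */
    uint64_t data_address;     /* e.g. the address accessed by a load or store */
};

/* True when the PC, class, and latency fields carry valid data. */
bool record_has_execution_fields(const struct spe_record_model *r) {
    return r->completed && !r->exception;
}
```

Software decoding such a record would check record_has_execution_fields before trusting the PC, class, or latency fields.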

You can disable profiling at individual Exception levels, and it is always disabled at EL3.
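The per-Exception-level rule reduces to a simple predicate. In hardware, these controls live in SPE System registers such as PMSCR_EL1; their actual field layout is not modelled here. The sketch below assumes a hypothetical enable mask in which bit n means "profile at ELn", and it hard-wires the rule that EL3 is never profiled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-EL enable mask: bit n set means "profile at ELn".
   This is not the layout of any real SPE control register. */
bool profiling_enabled_at(unsigned el, unsigned enable_mask) {
    assert(el <= 3);
    if (el == 3)
        return false;              /* profiling is always disabled at EL3 */
    return (enable_mask >> el) & 1u;
}
```

For example, a mask of 0xF enables EL0, EL1, and EL2, but the function still returns false for EL3.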

In a multithreaded implementation, Statistical Profiling is implemented per thread. The sample interval counter counts only the operations of the thread that is being profiled. Latency and other cycle counters count every processing element cycle in which the thread was active and able to issue an operation.
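The per-thread counting rules can be illustrated with a small model of a two-way multithreaded core. This is a sketch under stated assumptions: the thread count, struct thread_prof, and core_cycle are hypothetical, and active_cycles stands in for how a latency or cycle counter advances for a thread. The key point is that each thread's interval counter decrements only on that thread's own operations, while its cycle count advances on every cycle in which it was active and able to issue.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS 2

/* Hypothetical per-thread profiling state. */
struct thread_prof {
    uint32_t interval;
    uint32_t countdown;      /* counts only this thread's operations */
    uint32_t samples;
    uint64_t active_cycles;  /* cycles in which this thread could issue */
};

void thread_prof_init(struct thread_prof *t, uint32_t interval) {
    assert(interval > 0);
    t->interval = interval;
    t->countdown = interval;
    t->samples = 0;
    t->active_cycles = 0;
}

/* Simulate one core cycle. `issuing` is the thread that issues an
   operation this cycle (-1 for none); can_issue[i] is nonzero when
   thread i was active and able to issue. */
void core_cycle(struct thread_prof t[NUM_THREADS], int issuing,
                const int can_issue[NUM_THREADS]) {
    for (int i = 0; i < NUM_THREADS; i++) {
        if (can_issue[i])
            t[i].active_cycles++;    /* cycle counters advance per thread */
    }
    if (issuing >= 0) {
        struct thread_prof *p = &t[issuing];
        if (--p->countdown == 0) {   /* only the issuing thread's counter */
            p->samples++;
            p->countdown = p->interval;
        }
    }
}
```

If thread 0 issues for four cycles while thread 1 is inactive, and then thread 1 issues for two cycles while both are active, thread 0 takes two samples and thread 1 takes one, even though the core executed six operations in total.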
