
11.4.1. Cortex-A9 specific events

Table 11.6 shows the Cortex-A9 specific events. In the Value column of Table 11.6, Precise means that the event is counted precisely. Events related to stalls and speculative instructions appear as Approximate entries in this column.
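As a minimal sketch of how the Value column might be used in analysis tooling, the snippet below records the accuracy of a handful of events from Table 11.6 and derives a STREX failure ratio from the two Precise STREX counters. The names `EVENT_ACCURACY`, `all_precise`, and `strex_failure_ratio`, the counter readings, and the ratio itself are illustrative assumptions and are not defined by the Cortex-A9 documentation.

```python
# Accuracy of a few events, transcribed from the Value column of Table 11.6.
# The dictionary and helper names are hypothetical, for illustration only.
EVENT_ACCURACY = {
    0x50: "Precise",      # Coherent linefill miss
    0x51: "Precise",      # Coherent linefill hit
    0x60: "Approximate",  # Instruction cache dependent stall cycles
    0x63: "Precise",      # STREX passed
    0x64: "Precise",      # STREX failed
    0x68: "Approximate",  # Instructions coming out of the core renaming stage
}


def all_precise(*events):
    """Return True if every listed event is marked Precise in the table."""
    return all(EVENT_ACCURACY[e] == "Precise" for e in events)


def strex_failure_ratio(passed, failed):
    """Fraction of STREX instructions that failed (0.0 if none executed)."""
    total = passed + failed
    return failed / total if total else 0.0


# Hypothetical counter readings for events 0x63 and 0x64.
ratio = strex_failure_ratio(passed=900, failed=100)
print(all_precise(0x63, 0x64), ratio)
```

Checking that all inputs are Precise before reporting a derived ratio is one possible convention. A ratio built from Approximate events, such as the stall counters, may still be useful but should be read as an estimate.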

Table 11.6. Cortex-A9 specific events
Event    Description    Value
0x40

Java bytecode executed.[a]

Counts the number of Java bytecodes being decoded, including speculative ones.

Approximate
0x41

Software Java bytecode executed.[a]

Counts the number of software Java bytecodes being decoded, including speculative ones.

Approximate
0x42

Jazelle backward branches executed.[a]

Counts the number of Jazelle taken branches being executed. This includes the branches that are flushed because of a previous load/store that aborts late.

Approximate
0x50

Coherent linefill miss.[b]

Counts the number of coherent linefill requests performed by the Cortex-A9 processor that also miss in all the other Cortex-A9 processors. This means that the request is sent to the external memory.

Precise
0x51

Coherent linefill hit.[b]

Counts the number of coherent linefill requests performed by the Cortex-A9 processor that hit in another Cortex-A9 processor. This means that the linefill data is fetched directly from the relevant Cortex-A9 cache.

Precise
0x60

Instruction cache dependent stall cycles.

Counts the number of cycles where the processor:

  • is ready to accept new instructions,

  • does not receive a new instruction, because:

    • the instruction side is unable to provide one

    • the instruction cache is performing at least one linefill.

Approximate
0x61

Data cache dependent stall cycles.

Counts the number of cycles where the processor has some instructions that it cannot issue to any pipeline, and the Load Store unit has at least one pending linefill request, and no pending TLB requests.

Approximate
0x62

Main TLB miss stall cycles.

Counts the number of cycles where the processor is stalled waiting for the completion of translation table walks from the main TLB. The processor stalls because the instruction side is not able to provide the instructions, or the data side is not able to provide the necessary data.

Approximate
0x63

STREX passed.

Counts the number of STREX instructions architecturally executed and passed.

Precise
0x64

STREX failed.

Counts the number of STREX instructions architecturally executed and failed.

Precise
0x65

Data eviction.

Counts the number of eviction requests because of a linefill in the data cache.

Precise
0x66

Issue does not dispatch any instruction.

Counts the number of cycles where the issue stage does not dispatch any instruction because it is empty or cannot dispatch any instructions.

Precise
0x67

Issue is empty.

Counts the number of cycles where the issue stage is empty.

Precise
0x68

Instructions coming out of the core renaming stage.

Counts the number of instructions going through the Register Renaming stage. This count approximates the total number of instructions speculatively executed, and approximates even more loosely the total number of instructions architecturally executed. The accuracy of the approximation depends mainly on the branch misprediction rate.

The renaming stage can handle two instructions in the same cycle, so the event is two bits long:

b00

No instructions coming out of the core renaming stage.

b01

One instruction coming out of the core renaming stage.

b10

Two instructions coming out of the core renaming stage.

Approximate
0x69

Number of data linefills.[c]

Counts the number of linefills performed on the external AXI bus. This event counts all data linefill requests, caused by:

  • loads, including speculative ones

  • stores

  • PLD

  • prefetch

  • page table walk.

Precise
0x6A

Number of prefetcher linefills.[c]

Counts the number of data linefills caused by prefetcher requests.

Precise
0x6B

Number of hits in prefetched cache lines.[c]

Counts the number of cache hits in a line that belongs to a stream followed by the prefetcher. This includes:

  • lines that have been prefetched by the automatic data prefetcher

  • lines already present in the cache, before the prefetcher action.

Precise
0x6E

Predictable function returns.

Counts the number of procedure returns whose condition codes do not fail, excluding all returns from exception. This count includes procedure returns that are flushed because of a previous load/store that aborts late.

Only the following instructions are reported:

  • BX R14

  • MOV PC, LR

  • POP {..,pc}

  • LDR pc,[sp],#offset.

The following instructions are not reported:

  • LDMIA R9!,{..,PC} (ThumbEE state only)

  • LDR PC,[R9],#offset (ThumbEE state only)

  • BX R0 (Rm != R14)

  • MOV PC,R0 (Rm != R14)

  • LDM SP,{...,PC} (writeback not specified)

  • LDR PC,[SP,#offset] (wrong addressing mode).

Approximate
0x70

Main execution unit instructions.

Counts the number of instructions being executed in the main execution pipeline of the processor, the multiply pipeline and arithmetic logic unit pipeline. The counted instructions are still speculative.

Approximate
0x71

Second execution unit instructions.

Counts the number of instructions being executed in the second execution pipeline (ALU) of the processor. The counted instructions are still speculative.

Approximate
0x72

Load/Store instructions.

Counts the number of instructions being executed in the Load/Store unit. The counted instructions are still speculative.

Approximate
0x73

Floating-point instructions.

Counts the number of floating-point instructions going through the Register Rename stage. Instructions are still speculative in this stage.

Two floating-point instructions can be renamed in the same cycle, so the event is two bits long:

b00

No floating-point instruction renamed.

b01

One floating-point instruction renamed.

b10

Two floating-point instructions renamed.

Approximate
0x74

NEON instructions.

Counts the number of NEON instructions going through the Register Rename stage. Instructions are still speculative in this stage.

Two NEON instructions can be renamed in the same cycle, so the event is two bits long:

b00

No NEON instruction renamed.

b01

One NEON instruction renamed.

b10

Two NEON instructions renamed.

Approximate
0x80

Processor stalls because of PLDs.

Counts the number of cycles where the processor is stalled because PLD slots are all full.

Approximate
0x81

Processor stalled because of a write to memory.

Counts the number of cycles when the processor is stalled and the data side is also stalled, because it is full and is executing writes to the external memory.

Approximate
0x82

Processor stalled because of instruction side main TLB miss.

Counts the number of stall cycles because of main TLB misses on requests issued by the instruction side.

Approximate
0x83

Processor stalled because of data side main TLB miss.

Counts the number of stall cycles because of main TLB misses on requests issued by the data side.

Approximate
0x84

Processor stalled because of instruction micro TLB miss.

Counts the number of stall cycles because of micro TLB misses on the instruction side. This event does not include main TLB miss stall cycles that are already counted in the corresponding main TLB event.

Approximate
0x85

Processor stalled because of data micro TLB miss.

Counts the number of stall cycles because of micro TLB misses on the data side. This event does not include main TLB miss stall cycles that are already counted in the corresponding main TLB event.

Approximate
0x86

Processor stalled because of DMB.

Counts the number of stall cycles because of the execution of a DMB. This includes all DMB instructions being executed, even speculatively.

Approximate
0x8A

Integer clock enabled.

Counts the number of cycles when the integer core clock is enabled.

Approximate
0x8B

Data engine clock enabled.

Counts the number of cycles when the data engine clock is enabled.

Approximate
0x8C

NEON SIMD clock enabled.[c]

Counts the number of cycles when the NEON SIMD clock is enabled.

Approximate
0x8D

Instruction TLB allocation.[c]

Counts the number of TLB allocations because of instruction requests.

Approximate
0x8E

Data TLB allocation.[c]

Counts the number of TLB allocations because of data requests.

Approximate
0x90

ISB instructions.

Counts the number of ISB instructions architecturally executed.

Precise
0x91

DSB instructions.

Counts the number of DSB instructions architecturally executed.

Precise
0x92

DMB instructions.

Counts the number of DMB instructions speculatively executed.

Approximate
0x93

External interrupts.

Counts the number of external interrupts executed by the processor.

Approximate
0xA0

PLE cache line request completed.[d]

Precise
0xA1

PLE cache line request skipped.[d]

Precise
0xA2

PLE FIFO flush.[d]

Precise
0xA3

PLE request completed.[d]

Precise
0xA4

PLE FIFO overflow.[d]

Precise
0xA5

PLE request programmed.[d]

Precise

[a] Only when the design implements the Jazelle Extension. Otherwise reads as 0.

[b] For use with Cortex-A9 multiprocessor variants.

[c] This event has no corresponding mapping on PMUEVENT. It can be counted only in the Cortex-A9 internal PMU event counters.

[d] Active only when the PLE is present. Otherwise reads as 0.
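Events 0x68, 0x73, and 0x74 in the table are two bits long because up to two instructions can be renamed per cycle. The following sketch illustrates how such per-cycle values could be decoded and summed. It assumes that a counter adds the encoded number of instructions each cycle. The function names and the sample trace are hypothetical, and treating the unlisted encoding b11 as an error is a choice made for this example only.

```python
def renamed_per_cycle(event_bits):
    """Decode a two-bit event value into an instruction count for that cycle."""
    # Encodings from the table: b00 = none, b01 = one, b10 = two.
    counts = {0b00: 0, 0b01: 1, 0b10: 2}
    if event_bits not in counts:
        # b11 is not listed in the table; rejecting it is an assumption of this sketch.
        raise ValueError(f"unexpected event encoding: {event_bits:#04b}")
    return counts[event_bits]


def total_renamed(samples):
    """Sum the decoded counts over a sequence of per-cycle event values."""
    return sum(renamed_per_cycle(s) for s in samples)


# Hypothetical trace of five cycles: 2, 0, 1, 2, 1 instructions renamed.
trace = [0b10, 0b00, 0b01, 0b10, 0b01]
print(total_renamed(trace))
```

Because these events are marked Approximate and count speculative instructions, a total obtained this way is best treated as an upper-bound style estimate of architecturally executed instructions rather than an exact figure.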

