You copied the Doc URL to your clipboard.

Arm Cortex-A55 Core Technical Reference Manual : Error reporting

Error reporting

Detected errors are reported in the Error Record Primary Syndrome Register, ERXSTATUS/ERXSTATUS_EL1, and the Error Record Miscellaneous Register, ERXMISC0/ERXMISC0_EL1.

This includes errors that are successfully corrected, and errors that cannot be corrected. If multiple errors occur on the same clock cycle, then only one error is reported but the OF (overflow) bit is set.

There are two error records provided, which can be selected with the ERRSELR/ERRSELR_EL1 register. Record 0 is private to the core, and is updated on any error in the core RAMs including L1 caches, TLB, and L2 cache. Record 1 records any error in the L3 and snoop filter RAMs and is shared between all cores in the cluster.

If enabled in the ERXCTLR/ERXCTLR_EL1 register, all errors that are detected cause a fault handling interrupt. The fault handling interrupt is generated on the nFAULTIRQ[0] pin for L3 and snoop filter errors, or on the nFAULTIRQ[n+1] pin for core n L1 and L2 errors.

Errors that cannot be corrected, and therefore might result in data corruption, also cause an abort or an interrupt signal to be asserted, alerting software to the error. The software can either attempt to recover or can restart the system. Some errors are deferred by poisoning the data. This does not cause an abort at the time of the error, but only when the error is consumed.

  • Uncorrectable errors in the L1, L2, or L3 data RAMs when read by an instruction fetch, a load instruction or a TLB pagewalk, might result in a precise data abort or prefetch abort.
  • Uncorrectable errors in the L1, L2, or L3 data RAMs when the line is being evicted from a cache causes the data to be poisoned. This might be because of a natural eviction, a linefill from a higher level of cache, a cache maintenance operation, or a snoop. If the poisoned line is evicted from the cluster for any reason, and the interconnect does not support data poisoning, then the nERRIRQ[0] pin is asserted, if enabled.
  • Uncorrectable errors in the L1 tag or dirty RAMs, or in the L2 tag RAMs, causes the nERRIRQ[n+1] pin to be asserted for core n, if enabled.
  • Uncorrectable errors in the L3 tag RAMs or SCU snoop filter RAMs causes the nERRIRQ[0] pin to be asserted, if enabled.

Note

  • When nERRIRQ is asserted it remains asserted until the error is cleared by a write of 0 to the UE bit in the ERXSTATUS/ERXSTATUS_EL1 register.
  • Arm® recommends that the nERRIRQ pin is connected to the interrupt controller so that an interrupt or system error is generated when the pin is asserted.

The fault and error interrupt pins are cleared by writing to the ERXSTATUS/ERXSTATUS_EL1 registers.

When a snoop hits on a line with an uncorrectable data error, the data is returned if required by the snoop, but the snoop response indicates that the data is poisoned. If a snoop hits on a tag that has an uncorrectable error, then it is treated as a snoop miss, because the error means that it is unknown if the cache line is valid or not.

The following accesses update the Error Record Primary Syndrome Register:

  • ECC error detected in any of the RAM protected by ECC.
  • Poisoned data received from the DSU when the CPU does not support ECC protection.
  • Dirty data received from the DSU and the data is flagged with a data error.

Note

It is possible for an error to be counted more than once. For example, multiple accesses can read the location with the error before the line is evicted.
Observations and constraints

The following observations should be made about ERRXSTATUS and ERRXMISCn registers:

  • If two or more memory errors occur in the same cycle, only one error is reported and the other error count is incremented. If more than two errors occur on the same cycle then the additional errors will not be counted.
  • If two or more first memory error events from different RAMs occur in the same cycle, one of the errors is selected arbitrarily.
  • If a new error arrives while the ERRXSTATUS.V bit is set, the way, index, and level information is not updated, but the other error field or the repeat error field is updated.
  • If two or more memory errors from different RAMs that do not match the level, way and index information in this register when the ERRXSTATUS.V bit is set, occur in the same cycle, the Other error count field is only incremented once.
  • This register is not reset on a warm reset.