Cache ECC and parity
The Cortex®-A55 core implements the RAS extension to the Arm architecture, which provides mechanisms for standardized reporting of the errors generated by cache protection mechanisms.
When configured with core cache protection, the Cortex-A55 core can detect and correct a 1-bit error in any RAM and detect 2-bit errors in some RAMs.
Note: For information about SCU-L3 cache protection, see the Arm® DynamIQ™ Shared Unit Technical Reference Manual.
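The single-bit-correct, double-bit-detect behavior described above can be illustrated with a small software model. The following is a minimal sketch only: it uses an extended Hamming(8,4) code chosen for illustration, and it does not represent the ECC scheme, codeword width, or RAM organization of the Cortex-A55 core, none of which are specified in this section. The names secded_encode and secded_decode are hypothetical.

```python
# Illustrative SECDED model (assumption: extended Hamming(8,4), NOT the
# actual Cortex-A55 ECC scheme). Bit 0 is an overall parity bit; bits 1..7
# form a Hamming(7,4) codeword with check bits at positions 1, 2, and 4.

def secded_encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit SECDED codeword."""
    bits = [0] * 8
    # Data bits occupy the non-power-of-two positions 3, 5, 6, 7.
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # makes total parity even
    return sum(b << i for i, b in enumerate(bits))

def secded_decode(word: int):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) & 1                 # 0 for an even number of bit flips
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single-bit error at position `syndrome`
        # (a syndrome of 0 means the overall parity bit itself flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status
```

In this model, flipping any one bit of a codeword returns the original data with status 'corrected', and flipping any two bits returns 'uncorrectable'. A plain parity bit, by contrast, can only detect an odd number of bit flips and cannot correct them.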
The RAS extension improves the system by reducing unplanned outages:
- Transient errors can be detected and corrected before they cause application or system failure.
- Failing components can be identified and replaced.
- Failure can be predicted ahead of time to allow replacement during planned maintenance.
Errors that are present but not detected are known as latent or undetected errors. A transaction carrying a latent error is corrupted. In a system with no error detection, all errors are latent errors and are silently propagated by components until either:
- They are masked and do not affect the outcome of the system. These are benign or false errors.
- They affect the service interface of the system and cause failure. These are silent data corruptions.
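The distinction between masked errors and silent data corruption can be sketched with a toy model. This is an illustration under stated assumptions, not a description of any Cortex-A55 mechanism: the stored byte, the flipped bit, and the helper names parity and load_checked are all hypothetical.

```python
# Toy model of a latent error (assumption: one stored byte, one bit flip).

def parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1

def load_checked(data: int, stored_parity: int) -> int:
    """Return `data`, or raise if it no longer matches its stored parity bit."""
    if parity(data) != stored_parity:
        raise ValueError("parity error detected")
    return data

stored = 0b10110010               # value originally written
stored_parity = parity(stored)    # parity bit saved alongside the data
corrupted = stored ^ (1 << 6)     # a fault flips bit 6 while the data is held

# Without detection, the error is latent and propagates to every consumer.
# A consumer that uses only the low four bits never sees the flipped bit,
# so the error is masked (benign).
masked = (corrupted & 0x0F) == (stored & 0x0F)

# A consumer that uses the whole byte produces a wrong result with no
# indication of failure: silent data corruption.
silently_corrupted = corrupted != stored

# With a parity check on the load path, the same fault is detected.
try:
    load_checked(corrupted, stored_parity)
    detected = False
except ValueError:
    detected = True
```

Here the parity check converts a latent error into a detected one, at the cost of one extra stored bit per protected unit of data.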
The severity of a failure can range from minor to catastrophic. In many systems, data or service loss is regarded as a less severe failure than data corruption, as long as backup data is available.
The RAS extension focuses on errors produced by hardware faults, which fall into two main categories:
- Transient faults.
- Persistent faults.
The RAS extension describes data corruption faults, which mostly occur in memories and on data links. RAS concepts can also be used for the management of other types of physical faults found in systems, such as lock-step errors, thermal trip, and mechanical failure. The RAS extension provides a common programmers model and mechanisms for fault handling and error recovery.