ECC (either "error correction [or correcting] code" or "error checking and correcting") allows data that is being read or transmitted to be checked for errors and, when necessary, corrected on the fly. It differs from parity-checking in that errors are not only detected but also corrected. ECC is increasingly being designed into data storage and transmission hardware as data rates (and therefore error rates) increase.
Here's how it works for data storage:
- When a unit of data (or "word") is stored in RAM or peripheral storage, a code that describes the bit sequence in the word is calculated and stored along with the unit of data. For each 64-bit word, 7 extra bits are enough for a code that can correct any single-bit error; ECC memory typically stores 8 so that double-bit errors can also be detected.
- When the unit of data is requested for reading, a code is calculated again from the stored word using the original algorithm. The newly generated code is compared with the code that was generated when the word was stored.
- If the codes match, the data is free of errors and is sent.
- If the codes don't match, the comparison identifies which bit or bits are missing or erroneous, and those bits are supplied or corrected before the data is sent.
- No attempt is made to correct the data that is still in storage. Eventually, it will be overlaid by new data and, assuming the errors were transient, the incorrect bits will "go away."
- Any error that recurs at the same place in storage after the system has been turned off and on again indicates a permanent hardware error, and a message is sent to a log or to a system administrator identifying the location of the recurrent errors.
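The store, recompute, compare, and correct steps above can be sketched with a single-error-correcting Hamming code. This is only an illustration under stated assumptions: the function names (compute_code, store, read) are hypothetical, the word and its 7-bit code are held as plain Python integers, and real ECC hardware usually adds an eighth bit for double-error detection, which this sketch omits.

```python
# Illustrative sketch of a single-error-correcting Hamming code for a
# 64-bit word. All names here are hypothetical, not from any library.

WORD_BITS = 64

# In a 71-bit Hamming codeword, positions that are powers of two
# (1, 2, 4, ..., 64) hold check bits; the other 64 positions hold data.
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1) != 0]
POSITION_TO_INDEX = {pos: i for i, pos in enumerate(DATA_POSITIONS)}


def compute_code(word: int) -> int:
    """Return the 7-bit code: XOR of the positions of all set data bits."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> i) & 1:
            code ^= pos
    return code


def store(word: int) -> tuple[int, int]:
    """Store the word together with its freshly calculated code."""
    return word, compute_code(word)


def read(stored_word: int, stored_code: int) -> tuple[int, str]:
    """Recalculate the code, compare it with the stored one, and correct."""
    syndrome = compute_code(stored_word) ^ stored_code
    if syndrome == 0:
        return stored_word, "ok"               # codes match
    if syndrome & (syndrome - 1) == 0:
        # Exactly one check bit differs: the stored code itself was hit,
        # so the data word is returned unchanged.
        return stored_word, "check-bit error"
    if syndrome in POSITION_TO_INDEX:
        # The syndrome names the position of the flipped data bit.
        fixed = stored_word ^ (1 << POSITION_TO_INDEX[syndrome])
        return fixed, "corrected"
    return stored_word, "uncorrectable"        # not a valid position


# Usage: flip one bit "in storage" and read the word back.
word, code = store(0x0123456789ABCDEF)
damaged = word ^ (1 << 17)
assert read(damaged, code) == (word, "corrected")
```

As in the steps above, read returns the corrected word but leaves the stored copy untouched. With only 7 check bits, a two-bit error can be mistaken for a single-bit error and miscorrected, which is one reason an extra overall parity bit is commonly added.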
At the 64-bit word level, parity-checking and ECC require about the same number of extra bits: one parity bit per byte comes to 8 bits for a 64-bit word, against 7 or 8 for ECC. In general, ECC increases the reliability of any computing or telecommunications system (or part of a system) without adding much cost. Reed-Solomon codes are commonly implemented; they're able to detect and restore "erased" bits as well as incorrect bits.
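The extra-bit counts for a 64-bit word can be checked with a short calculation. This is a sketch that assumes a Hamming-style single-error-correcting code and byte-wise parity; the symbols m (data bits) and r (check bits) are introduced here only for illustration.

```latex
% A single-error-correcting code must distinguish "no error" from an error
% in any one of the m + r stored bits, so r check bits must satisfy:
\[
  2^{r} \ge m + r + 1
\]
% For m = 64:
\[
  r = 6:\; 2^{6} = 64 < 64 + 6 + 1 = 71
  \qquad
  r = 7:\; 2^{7} = 128 \ge 64 + 7 + 1 = 72
\]
% So 7 check bits suffice for single-error correction. Adding one overall
% parity bit for double-error detection gives 8, which equals the cost of
% byte-wise parity:
\[
  \frac{64\ \text{bits}}{8\ \text{bits per byte}} \times 1\ \text{parity bit per byte} = 8\ \text{bits}
\]
```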