The noisy channel model is a framework used in natural language processing (NLP) to infer the word a user most likely intended when the input is ambiguous or corrupted. The framework helps identify intended words in spell checkers, virtual assistants, translation programs, question-answering systems and speech-to-text software.
Noise, in this context, is anything that obscures signals and data. The noisy channel model is so named because the original signal (the intended word) is obscured in transmission when disruptions or errors introduce noise into the channel. In written language, noise could consist of a misspelling, for example; in spoken language, it could be ambient sound, mispronunciation or slurred speech.
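In the standard probabilistic formulation of this idea, the system observes a possibly garbled input x and looks for the intended word w, drawn from a vocabulary V, that most likely produced it. The symbols x, w and V are notation chosen here for illustration. By Bayes' rule, P(x) is the same for every candidate, so it can be dropped:

```latex
\hat{w} = \arg\max_{w \in V} P(w \mid x)
        = \arg\max_{w \in V} \frac{P(x \mid w)\,P(w)}{P(x)}
        = \arg\max_{w \in V} P(x \mid w)\,P(w)
```

Here P(x | w), often called the channel or error model, captures how likely the noise is to turn w into x, and P(w), the prior or language model, captures how likely w is to be intended in the first place.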
Here’s a basic example of how the noisy channel model might work with a spell check program:
When a word is not found in the spell checker's dictionary, it is flagged as a misspelling, and candidate words are suggested based on their probability of being the intended word. That probability typically combines two factors: how likely the misspelling is as a distortion of the candidate, which is often approximated by how close the two strings are, and how common the candidate word is. As a rule, the most likely candidates differ from the misspelling by a single change, and there are four types of single-change errors: deletion, insertion, substitution and reversal (two adjacent letters swapped). Take the misspelling "acress" as an example. The user could have intended "acres" and added an extra s (insertion), intended "actress" and missed the t (deletion), intended "across" and typed e instead of o (substitution), or intended "caress" and typed the first two letters in the wrong order (reversal). Given "acress" as input, then, the spell checker might suggest all four words.
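The candidate-generation step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular spell checker works: the dictionary and the toy corpus are hypothetical, and the ranking assumes every single edit is equally likely, so P(typo | w) is a constant and candidates are ordered only by an add-one-smoothed word frequency P(w). A real system would weight each edit type and letter pair differently.

```python
import string
from collections import Counter

# Hypothetical toy dictionary for illustration; a real spell checker uses a full lexicon.
DICTIONARY = {"acres", "actress", "across", "caress", "address", "cross"}

# Hypothetical toy corpus, used only to estimate how common each word is, P(w).
CORPUS = "the actress walked across the road and across the acres"


def single_edits(word):
    """Return all strings one change away from `word`: a deletion, an insertion,
    a substitution, or a reversal (swap) of two adjacent letters."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    reversals = {left + right[1] + right[0] + right[2:]
                 for left, right in splits if len(right) > 1}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in letters}
    insertions = {left + c + right for left, right in splits for c in letters}
    return deletions | reversals | substitutions | insertions


def candidates(typo, dictionary=DICTIONARY):
    """Dictionary words exactly one change away from the typo, sorted alphabetically."""
    return sorted(w for w in single_edits(typo) if w in dictionary and w != typo)


def rank(typo, dictionary=DICTIONARY, corpus=CORPUS):
    """Order candidates by P(typo | w) * P(w).

    Simplifying assumption: every single edit is equally likely, so P(typo | w)
    is the same for all candidates and drops out of the comparison. P(w) is an
    add-one-smoothed relative frequency from the toy corpus."""
    counts = Counter(corpus.split())
    total = sum(counts.values()) + len(dictionary)
    prior = {w: (counts[w] + 1) / total for w in dictionary}
    # Highest prior first; ties broken alphabetically so the output is stable.
    return sorted(candidates(typo, dictionary), key=lambda w: (-prior[w], w))


print(candidates("acress"))  # ['acres', 'across', 'actress', 'caress']
print(rank("acress"))        # ['across', 'acres', 'actress', 'caress']
```

"address" and "cross" are included as distractors: each is two changes away from "acress," so neither is suggested. Swapping in a weighted error model (for example, making reversals of adjacent keys cheaper) would change only the scoring in `rank`, not the candidate generation.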
Because of the variability of human speech and the presence of real acoustic noise, speech recognition software faces challenges beyond those of text-based systems. However, the basic framework is similar: the system chooses the words most likely to have produced the audio it received.