Disambiguation (also called word sense disambiguation or text disambiguation) is the task of determining which meaning an author intended for a word that has multiple meanings or spellings.
Disambiguation can be difficult even for humans, so it is unsurprising that computers struggle with it as well. For programs such as medical transcription applications, which convert spoken language to written text, or assistive technologies that convert typed text into synthetic speech, words that have different meanings and spellings can be a challenge. There are two popular approaches to disambiguation: the shallow method and the deep method.
The shallow method, which uses nearby words to determine the intended meaning, is the more commonly used of the two. Although it is fairly accurate, it cannot always be relied on, especially when the same document contains several words that have different meanings. It is, however, the easiest method to implement.
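As a rough illustration of the shallow method, the sketch below scores each candidate sense by counting how many of its cue words appear within a few words of the ambiguous term. The cue lists, the window size and the function names are hypothetical, chosen only for this example; a real system would typically learn such cues from data.

```python
import re

# Hypothetical cue words for two senses of "bass" (illustration only).
CUES = {
    "fish": {"river", "lake", "caught", "fishing", "water"},
    "music": {"guitar", "band", "play", "played", "amplifier", "drum"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def shallow_disambiguate(text, target, cues=CUES, window=4):
    """Pick the sense whose cue words occur most often near the target word."""
    tokens = tokenize(text)
    if target not in tokens:
        return None
    i = tokens.index(target)  # only the first occurrence is considered
    nearby = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    scores = {sense: len(nearby & words) for sense, words in cues.items()}
    best = max(scores, key=scores.get)
    # With no cue word nearby, the method has nothing to go on.
    return best if scores[best] > 0 else None

print(shallow_disambiguate("He caught a huge bass in the lake", "bass"))
print(shallow_disambiguate("She played bass in a rock band", "bass"))
```

When no cue word falls inside the window, the function returns None, which reflects the limitation noted above: the shallow method is only as reliable as the context it happens to find.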
The deep method goes further into the meanings of the words, drawing on lexical resources such as dictionaries and thesauruses to determine all the possible meanings of a word. Although this is a more precise way to resolve ambiguity, it is very difficult in practice, primarily because a database comprehensive enough to perform the task with a high degree of accuracy is hard to create. When a smaller, less comprehensive database is used, the results are likely to be less accurate.
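One simple way to picture a dictionary-based approach is gloss overlap, in the spirit of the Lesk algorithm: each sense has a dictionary-style definition, and the sense whose definition shares the most words with the sentence is chosen. The sketch below is a simplified illustration under that assumption, not a full deep method; the miniature lexicon, the stopword list and the function names are all hypothetical.

```python
import re

# Hypothetical miniature lexicon: each sense maps to a dictionary-style gloss.
LEXICON = {
    "bank": {
        "financial institution": "an institution that accepts deposits of money and makes loans",
        "river edge": "the sloping land beside a river or lake",
    },
}

# Small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "at", "by"}

def content_words(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def gloss_overlap_disambiguate(sentence, target, lexicon=LEXICON):
    """Choose the sense whose gloss shares the most words with the sentence."""
    senses = lexicon.get(target)
    if not senses:
        return None  # the word is missing from the lexicon
    context = content_words(sentence) - {target}
    scores = {sense: len(context & content_words(gloss))
              for sense, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(gloss_overlap_disambiguate("She sat on the bank of the river", "bank"))
print(gloss_overlap_disambiguate("He went to the bank to deposit money", "bank"))
```

The example also shows why coverage matters: a word that is absent from the lexicon, or a sentence that shares no words with any gloss, yields no answer at all.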
More sophisticated algorithms can also help with text disambiguation. Artificial intelligence algorithms can be designed to search surrounding sentences, or even entire documents, for words that indicate the likely intended meaning of a particular word. Since most words tend to be given one meaning throughout a given document, this is usually a reasonably accurate method.
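The observation that a word usually keeps one meaning within a document can be sketched as a simple vote: guess a sense for each sentence that contains the word, then assign the most common guess to the whole document. This is a minimal sketch under that assumption, not a description of any particular system; the cue lists and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue words for two senses of "bass" (illustration only).
DOC_CUES = {
    "fish": {"river", "lake", "caught", "fishing"},
    "music": {"guitar", "band", "played", "amplifier"},
}

def guess_sense(sentence, cues=DOC_CUES):
    """Guess a sense for one sentence by counting matching cue words."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cue) for sense, cue in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def document_sense(document, target, cues=DOC_CUES):
    """Vote across every sentence that contains the target word."""
    votes = Counter()
    for sentence in re.split(r"[.!?]+", document):
        if target in re.findall(r"[a-z]+", sentence.lower()):
            sense = guess_sense(sentence, cues)
            if sense is not None:
                votes[sense] += 1
    # The winning sense is applied to all occurrences in the document.
    return votes.most_common(1)[0][0] if votes else None

doc = ("He played bass in a band. The bass was new. "
       "She tuned the bass next to the guitar.")
print(document_sense(doc, "bass"))
```

Here the second sentence gives no clue on its own, but the document-level vote still assigns it the sense supported by the other sentences.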
While it is not an easy task, disambiguation is essential to language processing. Any software that uses speech recognition or text-to-speech, for example, must employ some type of disambiguation strategy to produce accurate results. Disambiguation is also crucial in the analysis of unstructured data, such as that found in emails, documents, instant messages and tweets.
See also: text mining