What is a transcription error?
A transcription error is a type of data entry error commonly made by human operators or optical character recognition (OCR) programs. Human transcription errors are usually the result of typographical mistakes caused by striking the wrong key on a keyboard or by striking two or more wrong keys because of finger-keyboard misalignment. Electronic or non-human transcription errors generally occur because a program attempts to scan material that it is unfamiliar with or cannot read.
Transcription errors occur when data (words, letters, numbers, special characters) are incorrectly entered into an information system. The system is often a computer text file or some kind of electronic records system. These errors are usually accidental and can happen when a transcriber (human or machine) records source information incorrectly or enters the information incorrectly into the electronic system.
Transcription errors have been the bane of authors and editors for decades. Other users, such as medical and legal offices, also commonly experience such errors. This is because they transcribe large quantities of hand-written notes, audio tapes and other types of unstructured text documents into electronic formats, and errors occur during the transcription process. This may occur whether the transcriber is a human or a machine.
Here are some examples of transcription errors:
- ZIP code: 54829 (wrong) instead of 54729 (correct)
- Name: Stamley (wrong) instead of Stanley (correct)
- Date: Jun 14, 2003 (wrong) instead of Jun 24, 2003 (correct)
Human transcription errors vs. machine transcription errors
As more printed matter is transcribed into digital format and with the increasing workload on transcribers (both human and electronic), this problem is likely to get worse before it gets better.
In most transcription projects, errors have one of two sources: the human transcriber or the transcription software. On the human side, the usual cause is simple carelessness or lack of attention to detail. Misunderstandings can also result in errors. A common cause of misunderstandings is a difference in accents; another is a speaker who does not speak or enunciate clearly. Other common human causes include the following:
- Transcribers do not look at the computer screen when typing.
- Transcribers cannot accurately read (or hear) the source material.
- Transcribers are unfamiliar with the transcription equipment or the source material (or its subject matter).
- The source material has too much jargon (technical terms) or uses too many long, confusing sentences.
- Transcribers misplace their fingers on the keyboard.
The use of OCR software can also lead to transcription errors. This is because the software cannot comprehend language or understand context. Instead, it matches the received input against information in its database. If no match is found, it may misinterpret the new input, resulting in a transcription error. Such errors are common when software tries to transcribe the letters and words in a scanned image of a document to convert the document into digital form. The software may be unable to transcribe accurately if any of the following occurs:
- The source document contains illegible handwriting or blotches.
- The source document is wrinkled.
- There's dirt on the scanner.
- The lighting is poor.
Measuring transcription errors
Transcription errors can be measured with the word error rate (WER). WER refers to the number of errors in a transcript divided by the total number of words in the source text it was transcribed from.
WER = number of errors ÷ number of words
The number of errors is the sum of all the insertions, deletions and substitutions in the transcript (the sequence of recognized words). Dividing that sum by the total number of words in the source text and multiplying by 100 gives the WER as a percentage.
WER = ((insertions + deletions + substitutions) ÷ number of words) × 100%
The following applies to this formula:
- Substitution = a word in the source getting replaced by a different word in the transcript. Example: chamcoal (incorrect) instead of charcoal (correct). Because WER counts whole words, a misspelling such as mose instead of mouse also counts as one substitution.
- Deletion = a word in the source getting left out of the transcript. Example: we've got new car (incorrect) instead of we've got a new car (correct).
- Insertion = a word getting added to the transcript that is not in the source. Example: we've um got a new uh uh car (incorrect) instead of we've got a new car (correct).
Suppose an original audio file (to be transcribed) contains 85 words. The transcription contains a combined total of 17 substitutions, insertions and deletions.
WER = (17 ÷ 85) × 100% = 0.2 × 100% = 20%
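The calculation above can be sketched in Python. This is a minimal illustration, not a standard implementation: the function name word_error_rate is made up for this example, words are split on whitespace only, and the error count is taken as the smallest number of word-level substitutions, deletions and insertions needed to turn the source text into the transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (S + D + I) / number of reference words.

    `reference` is the correct source text; `hypothesis` is the transcript.
    Both names are illustrative choices for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = fewest edits turning the reference words seen so far
    # into the first j hypothesis words (word-level edit distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Three inserted filler words against a five-word source.
wer = word_error_rate("we've got a new car", "we've um got a new uh uh car")
print(f"{wer:.0%}")  # 60%
```

Note that the divisor is the word count of the source, not of the transcript, so a transcript with many inserted words can produce a WER above 100%.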
In many situations, an acceptable WER is set for data entry workers. This number can vary depending on the transcription use case. In critical use cases, the acceptable WER is set low. For example, in the medical field, even a small transcription error can be detrimental, so the threshold is kept very low.
Detecting and reducing transcription errors
Some transcription errors can be detected using spell-checking programs. However, many transcription errors, particularly those involving numeric data, are difficult or impossible to detect this way. That said, it is possible to reduce the likelihood of transcription errors with double data entry of the same source material. This refers to two or more people independently transcribing the same material and then comparing the transcriptions to confirm accuracy. However, this method increases transcription effort, time and costs because it requires more human resources.
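The comparison step of double data entry can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each transcriber's work is represented as a dictionary of field names and values, and the function name compare_entries and the sample records are hypothetical.

```python
def compare_entries(entry_a: dict, entry_b: dict) -> dict:
    """Return the fields whose values differ between two independent entries."""
    mismatches = {}
    for field in sorted(set(entry_a) | set(entry_b)):
        a = entry_a.get(field)
        b = entry_b.get(field)
        if a != b:
            # Keep both values so a reviewer can check them against the source.
            mismatches[field] = (a, b)
    return mismatches


# Two transcribers enter the same record; one mistypes the ZIP code.
first = {"zip": "54729", "name": "Stanley", "date": "Jun 24, 2003"}
second = {"zip": "54829", "name": "Stanley", "date": "Jun 24, 2003"}
print(compare_entries(first, second))  # {'zip': ('54729', '54829')}
```

A mismatch only shows that at least one entry is wrong. Someone still has to consult the source material to decide which value is correct, and an error that both transcribers make identically goes unnoticed.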
Another way to detect and reduce errors is to use automated quality control software that checks sentence syntax and context to find incorrect letters or words. Software with automatic transcription capabilities, including tools powered by artificial intelligence or machine learning, can generate more accurate transcriptions. In general, a strong quality control process can reduce transcription errors. Training transcribers to properly read or hear source material and follow transcription best practices can also reduce errors.
Transcription errors vs. transposition errors
Transcription errors are not the same as transposition errors, although both are common error types that occur during data entry and transcription. A transcription error occurs when incorrect values or letters are input by a human or computer program. In contrast, a transposition error occurs when certain characters or letters are interchanged (transposed).
These are examples of transposition errors:
- ZIP code: 57429 (wrong) instead of 54729 (correct)
- Name: Stnaley (wrong) instead of Stanley (correct)
- Date: Jnu 23, 2003 (wrong) instead of Jun 23, 2003 (correct)
Transposition errors are almost always human in origin, whereas transcription errors can be caused by humans and machines (e.g., OCR software).
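The distinction can be checked mechanically when the correct value is known. The following Python sketch treats an error as a transposition only when exactly two adjacent characters have swapped places. That narrow definition and the function name is_adjacent_transposition are assumptions made for this example.

```python
def is_adjacent_transposition(correct: str, entered: str) -> bool:
    """Return True if `entered` is `correct` with two adjacent characters swapped."""
    if len(correct) != len(entered):
        return False
    # Positions where the two strings disagree.
    diffs = [i for i, (c, e) in enumerate(zip(correct, entered)) if c != e]
    if len(diffs) != 2 or diffs[1] != diffs[0] + 1:
        return False
    i, j = diffs
    return correct[i] == entered[j] and correct[j] == entered[i]


print(is_adjacent_transposition("54729", "57429"))      # True
print(is_adjacent_transposition("Stanley", "Stnaley"))  # True
print(is_adjacent_transposition("54729", "54829"))      # False
print(is_adjacent_transposition("Stanley", "Stamley"))  # False
```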