voice recognition (speaker recognition)

Voice or speaker recognition is the ability of a machine or program to receive and interpret dictation or to understand and carry out spoken commands. Voice recognition has gained prominence and use with the rise of AI and intelligent assistants, such as Amazon's Alexa, Apple's Siri and Microsoft's Cortana.

Voice recognition systems let consumers interact with technology simply by speaking to it, enabling hands-free requests, reminders and other simple tasks.

How voice recognition works

Voice recognition software on computers requires that analog audio be converted into digital signals, a process known as analog-to-digital (A/D) conversion. For a computer to decipher a signal, it must have a digital database, or vocabulary, of words or syllables, as well as a speedy means of comparing this data to incoming signals. The speech patterns are stored on the hard drive and loaded into memory when the program runs. A comparator checks these stored patterns against the output of the A/D converter -- an action called pattern recognition.
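The two steps described above, A/D conversion followed by comparison against stored patterns, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the function names (sample_and_quantize, distance, recognize, tone), the 8-bit resolution, the 8,000 Hz sample rate and the two-entry vocabulary of pure tones are all hypothetical, and a sum of squared differences stands in for the comparator.

```python
import math

def sample_and_quantize(signal, num_samples, sample_rate_hz, bits=8):
    """Sample a continuous signal at fixed intervals and quantize each sample.

    `signal` is a function of time (in seconds) that returns an amplitude.
    Amplitudes are clipped to [-1, 1] and mapped to integer codes
    in the range 0 .. 2**bits - 1, as a simple A/D converter would do.
    """
    max_code = 2 ** bits - 1
    codes = []
    for i in range(num_samples):
        t = i / sample_rate_hz
        amplitude = max(-1.0, min(1.0, signal(t)))
        codes.append(round((amplitude + 1.0) / 2.0 * max_code))
    return codes

def distance(a, b):
    """Sum of squared differences between two equal-length patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(digitized, vocabulary):
    """Comparator: return the label of the closest stored pattern."""
    return min(vocabulary, key=lambda word: distance(digitized, vocabulary[word]))

def tone(freq_hz, gain=1.0):
    """Hypothetical stand-in for analog audio: a pure sine tone."""
    return lambda t: gain * math.sin(2 * math.pi * freq_hz * t)

RATE = 8000   # samples per second (assumed)
N = 80        # 10 ms of audio at that rate

# Stored "vocabulary" of digitized patterns (assumed for illustration).
vocabulary = {
    "low": sample_and_quantize(tone(200), N, RATE),
    "high": sample_and_quantize(tone(800), N, RATE),
}

# A slightly quieter 200 Hz tone plays the role of the incoming signal.
incoming = sample_and_quantize(tone(200, gain=0.9), N, RATE)
print(recognize(incoming, vocabulary))
```

Here the incoming signal is matched to the stored pattern labeled low, because its digitized samples differ least from that entry. Real systems compare extracted acoustic features rather than raw samples, so this shows only the overall shape of the process.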

In practice, the size of a voice recognition program's effective vocabulary is directly related to the random access memory (RAM) capacity of the computer on which it is installed. A voice recognition program runs many times faster if the entire vocabulary can be loaded into RAM than if it must search the hard drive for some of the matches. Processing speed is also critical because it affects how quickly the computer can search RAM for matches.
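The difference between searching storage and searching memory can be illustrated with a small sketch. Everything here is an assumption made for the example: the one-JSON-record-per-line file layout, the function names, and the use of exact pattern matching in place of real acoustic comparison. The first lookup rereads the file on every query, while the second loads the whole vocabulary into a dictionary once and answers later queries from RAM.

```python
import json
import os
import tempfile

def write_vocabulary(path, vocabulary):
    """Store each word and its pattern as one JSON record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for word, pattern in vocabulary.items():
            f.write(json.dumps({"word": word, "pattern": pattern}) + "\n")

def lookup_on_disk(path, pattern):
    """Scan the file from the start for every query (slow path)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["pattern"] == pattern:
                return entry["word"]
    return None

def load_into_memory(path):
    """Read the whole vocabulary once and index it by pattern (fast path)."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            index[tuple(entry["pattern"])] = entry["word"]
    return index

def lookup_in_memory(index, pattern):
    """Answer a query from the in-memory index without touching the disk."""
    return index.get(tuple(pattern))

# Hypothetical three-word vocabulary with made-up patterns.
sample_vocabulary = {"yes": [3, 1, 4], "no": [1, 5, 9], "stop": [2, 6, 5]}

with tempfile.TemporaryDirectory() as folder:
    vocab_path = os.path.join(folder, "vocabulary.jsonl")
    write_vocabulary(vocab_path, sample_vocabulary)
    index = load_into_memory(vocab_path)
    print(lookup_on_disk(vocab_path, [1, 5, 9]))
    print(lookup_in_memory(index, [1, 5, 9]))
```

Both lookups return the same word. The in-memory version pays the cost of reading the file once and needs enough RAM to hold the full index, which is the trade-off the paragraph above describes.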

Some of the most popular voice recognition systems function as virtual assistants to answer questions about weather or perform simple tasks, such as adding items to an online shopping cart.

While voice recognition technology originated on PCs, it has gained acceptance in both business and consumer spaces on mobile devices and in home assistant products. The popularity of smartphones put voice recognition technology into consumers' pockets, while home devices, like Google Home and Amazon Echo, brought it into living rooms and kitchens. Voice recognition, combined with the growing stable of internet of things sensors, has added a technological layer to many consumer products that previously lacked any smart capabilities.

As uses for voice recognition technology grow and more users interact with it, the companies implementing voice recognition software will have more data and information to feed into the neural networks that power voice recognition systems, thus improving the capabilities and accuracy of the voice recognition products.

Voice recognition uses

The uses for voice recognition have grown quickly as AI, machine learning and consumer acceptance have matured. In-home digital assistants from Google to Amazon to Apple have all implemented voice recognition software to interact with users. The way consumers use voice recognition technology varies depending on the product, but it can include transcribing voice to text, setting up reminders, searching the internet, and responding to simple questions and requests, such as playing music or sharing weather or traffic information.

The government is also looking for ways to use voice recognition technology for security purposes. The National Security Agency has used voice recognition systems since 2004.

Voice recognition advantages and disadvantages

Voice recognition enables consumers to multitask by speaking directly to their Google Home, Amazon Alexa or other voice recognition technology. By using machine learning and sophisticated algorithms, voice recognition technology can quickly turn spoken words into written text.

While accuracy rates are improving, all voice recognition systems and programs make errors. Background noise can produce false input, which can be avoided by using the system in a quiet room. Homophones pose another problem: words that sound alike but are spelled differently and have different meanings -- for example, hear and here. This problem might someday be largely overcome using stored contextual information. However, this will require more RAM and faster processors than are currently available in personal computers.
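One way to picture "stored contextual information" is a table of how often each word follows another, which can then be used to choose between sound-alike candidates. The sketch below assumes a tiny hand-written table; the names HOMOPHONES, CONTEXT_COUNTS and disambiguate, and all of the counts, are invented for illustration and do not come from any real system.

```python
# Sets of words that sound alike (assumed, deliberately tiny).
HOMOPHONES = {
    "hear": {"hear", "here"},
    "here": {"hear", "here"},
}

# Stored context: how often a word was seen after the previous word.
# The counts are made up for this example.
CONTEXT_COUNTS = {
    ("can", "hear"): 5,
    ("to", "hear"): 6,
    ("come", "here"): 4,
    ("over", "here"): 3,
}

def disambiguate(words):
    """Replace each word with the sound-alike that best fits the previous word.

    If the stored context gives no preference, the original word is kept.
    """
    result = []
    for word in words:
        candidates = HOMOPHONES.get(word, {word})
        previous = result[-1] if result else None
        best = max(
            candidates,
            key=lambda c: (CONTEXT_COUNTS.get((previous, c), 0), c == word),
        )
        result.append(best)
    return result

print(disambiguate(["i", "can", "here", "you"]))
print(disambiguate(["come", "hear"]))
```

With this table, "can here" is corrected to "can hear" and "come hear" to "come here". Looking only one word back is a simplification; the more context a system stores, the more memory and processing the lookup needs.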

History of voice recognition

Voice recognition technology has grown rapidly over the past five decades. In 1976, computers could understand only slightly more than 1,000 words. That total jumped to roughly 20,000 in the 1980s as IBM continued to develop voice recognition technology.

In 1990, Dragon launched DragonDictate, the first speaker recognition product for consumers. In 1996, IBM introduced the first voice recognition product that could recognize continuous speech.

After smartphones arrived in the second half of the 2000s, Google launched its Voice Search app for the iPhone. Three years later, Apple introduced Siri, which is now a prominent voice recognition assistant.

During the past decade, several other technology leaders have also developed more sophisticated voice recognition software, including Alexa, featured on Amazon's Echo, and Microsoft's Cortana -- both of which act as personal assistants that respond to voice commands.