A voice user interface (VUI) uses speech recognition technology to let people interact with a computer, smartphone or other device through voice commands. Apple's Siri, Amazon's Alexa, Google Assistant and Microsoft's Cortana are prime examples of VUIs. What makes a VUI unique is that it uses voice as the primary mode of interaction, in contrast with the traditional keyboard-mouse-monitor combination or a touch screen. This voice-first approach can allow users to initiate automated services and carry out their day-to-day tasks in a faster, more intuitive manner.
A VUI helps users with tasks such as:
- Performing a web search
- Playing music
- Setting alarms, timers and reminders
- Getting real-time weather and traffic updates
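As a rough illustration of how a spoken request might be mapped to one of the tasks listed above, the sketch below matches keywords in an utterance that is assumed to have already been transcribed to text. The intent names and keyword lists are hypothetical, and production assistants generally rely on trained language-understanding models rather than keyword matching.

```python
import re

# Hypothetical intent names paired with trigger keywords (assumptions for
# illustration only; not taken from any real assistant).
INTENT_KEYWORDS = {
    "web_search": ["search", "look up", "find"],
    "play_music": ["play", "music", "song"],
    "set_alarm": ["alarm", "timer", "remind"],
    "weather_traffic": ["weather", "traffic", "forecast"],
}


def classify_intent(utterance: str) -> str:
    """Return the intent with the most keyword hits, or "unknown" if none hit."""
    text = utterance.lower()
    best_intent, best_score = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        # \b anchors each keyword to the start of a word, so "remind" also
        # matches "reminder" but "play" does not match "display".
        score = sum(
            1 for kw in keywords if re.search(r"\b" + re.escape(kw), text)
        )
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

Calling classify_intent("Set an alarm for 7 am") selects the set_alarm intent, while an utterance with no matching keywords falls through to "unknown", at which point a real system would typically ask the user to rephrase.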
Designing VUIs
The first era of VUIs, according to the book Designing Voice User Interfaces, was dominated by the interactive voice response (IVR) systems developed in the 1980s. These systems were capable of understanding voice inputs over the telephone and executing a given task. By the early 2000s, IVRs had started to become commonplace in service industries such as insurance, banking, aviation, freight and transportation. IVRs would essentially process inbound calls, answer customer questions with recorded messages based on information retrieved from databases, and direct calls to in-house agents. IVRs were initially developed to handle automated tasks without customers needing to speak to a live person, but today they typically provide the first response before the customer gets to speak to a live agent.
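The IVR behavior described above (look up data, play a recorded message, or hand off to an agent) can be sketched as a single-turn routing function. This is a minimal sketch under stated assumptions: the menu choices, the account table and the "PLAY"/"TRANSFER" result strings are all hypothetical, and the caller's speech is assumed to have been recognized already.

```python
from typing import Dict

# Hypothetical account data standing in for a back-end database.
ACCOUNT_BALANCES: Dict[str, float] = {"1001": 250.75, "1002": 0.0}


def handle_call(menu_choice: str, account_id: str) -> str:
    """Return the message to play, or a transfer decision, for one call."""
    if menu_choice == "balance":
        balance = ACCOUNT_BALANCES.get(account_id)
        if balance is None:
            # Unknown account: hand the caller to a live agent.
            return "TRANSFER: agent"
        return f"PLAY: Your balance is {balance:.2f} dollars."
    if menu_choice == "hours":
        return "PLAY: We are open 9 am to 5 pm, Monday through Friday."
    # Anything the menu does not cover is routed to an in-house agent.
    return "TRANSFER: agent"
```

Each call is handled in isolation: the function keeps no memory of earlier requests, which is the one-turn pattern typical of these systems.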
The current state of the art is referred to as the second era of VUIs, in which human-computer interaction is made possible by natural language processing (NLP), automatic speech recognition (ASR) and artificial intelligence (AI) technology. Examples of this second era include mobile applications like Siri, Google Now and Cortana, which combine visual and voice information in what is known as a multimodal interface, and devices like Amazon Echo and Google Home, which are voice-only (sometimes called an auditory interface).
What makes VUI design complex compared with that of a graphical user interface (GUI) is that a voice-only interface has no screen to display information, options or commands, and users cannot go back and review spoken information later. The transient nature of auditory interfaces therefore requires the VUI to state the possible interaction options clearly and give only the necessary information without overloading users. It also requires coaching users on the voice commands they can use and the types of interactions they can perform.
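One way to picture the "state the options without overloading users" guideline is a prompt builder that caps how many choices are spoken at once. The sketch below is illustrative only: the function name, the wording of the prompts and the default cap of three options are assumptions, not values taken from any VUI guideline.

```python
from typing import Sequence


def build_prompt(options: Sequence[str], max_options: int = 3) -> str:
    """Build a short spoken prompt that lists at most max_options choices."""
    spoken = list(options[:max_options])
    if not spoken:
        return "Sorry, nothing is available right now."
    if len(spoken) == 1:
        listing = spoken[0]
    else:
        # Join as "a, b or c" so the final choice is clearly marked.
        listing = ", ".join(spoken[:-1]) + " or " + spoken[-1]
    prompt = f"You can say {listing}."
    if len(options) > max_options:
        # Tell the user how to reach the choices that were left out.
        prompt += " Say 'more options' to hear the rest."
    return prompt
```

For example, build_prompt(["play music", "set a timer"]) yields "You can say play music or set a timer." With more than three options, the prompt lists the first three and tells the user how to hear the others, which also serves as light coaching on an available command.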
What makes the second era of VUIs important is that the technology attempts to go beyond the one-turn conversation typical of IVRs (a turn being one interaction between the user and the system), to "learn" from users' inputs and to predict their future needs. VUI designers have yet to design a system that fully simulates a human conversation, but rapid development in AI and machine learning is helping improve the user experience and making VUIs "smarter."
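The difference between a one-turn exchange and a multi-turn conversation can be sketched with a small object that carries context from one turn to the next. This is a simplified illustration, not how any particular assistant works: the Conversation class, the "city" and "day" slots and the reply strings are hypothetical, and extracting slot values from speech is assumed to happen upstream.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Conversation:
    """Keeps slot values across turns so follow-up requests can omit them."""

    slots: Dict[str, str] = field(default_factory=dict)

    def turn(self, new_slots: Dict[str, str]) -> str:
        # Merge this turn's values over what earlier turns supplied.
        self.slots.update(new_slots)
        missing = [name for name in ("city", "day") if name not in self.slots]
        if missing:
            # Ask a follow-up question instead of failing the request.
            return f"Which {missing[0]}?"
        return f"Weather request: city={self.slots['city']}, day={self.slots['day']}"
```

If the first turn supplies only a city, the system replies "Which day?"; once the day arrives, the request is complete. A later turn that mentions only a new city reuses the remembered day, which a one-turn system could not do.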
Given the complexity, designing an effective VUI requires knowledge of a combination of fields, including computer science, human psychology, and linguistics, as well as careful study of human cognitive abilities, conversational language and speech technology.
VUIs for business
The benefits of VUIs are not limited to simplifying users' home lives; the technology also holds promise for business use. For example, Amazon's Alexa for Business can assist workers in performing tasks such as joining conference calls, booking conference rooms, finding important information and training employees. When powered by IoT and cloud technology, VUIs can be effectively integrated with third-party systems in smart homes and offices and serve a number of industries, from health care and manufacturing to retail.