A look at effective voice user interface design
As smart device technology matures, designers must ensure that voice user interfaces integrate new DSP and speaker technology for a better user experience.
Voice recognition technology has existed since the early 1950s, though many of today's consumers are likely more familiar with Siri or Cortana as their first brush with a voice user interface. Digital assistants and smart speakers, such as the Amazon Echo and Google Nest, have made a considerable impression on the consumer market: Forecasts suggest that there will be more smart speakers than smart homes by 2024. It may seem that platforms from Amazon, Google and Apple have fulfilled all the market's voice technology needs.
In truth, these advances are just the beginning of smart homes' shift to total voice control. As incredible as existing smart speakers' capabilities are, they are just version 1.0 in the great voice control revolution -- a revolution that is propelled by thoughtful design.
Designing to overcome limitations
Consumers are increasingly seeking offerings that make their home space fit overlapping needs more comfortably. The essential deliverables of a smart home include entertainment, security, comfort, energy management and senior safety. These deliverables are all made simpler and more accessible with voice control. Homeowners expect that their smart home devices will consistently perform these tasks without the issues of latency, security or reliability.
As technology advances, engineers design new hardware and software to overcome the limitations of today's smart speakers. If smart speakers are the gateway, the next upgrade can be expected to provide significantly more natural user experiences.
New digital signal processing (DSP) hardware, designed specifically for audio edge processing, can be discreetly embedded into products and respond immediately to user voice commands without cloud latency. It also eliminates processing constraints that original equipment manufacturers (OEMs) frequently face and that can hinder usability, such as power consumption, memory limits and integration compatibility. These offerings provide new levels of personalization and, combined with machine learning, allow edge devices to become more intelligent and useful over time.
Design for far-field audio capture
Future voice user interfaces (VUIs) will also have an "anywhere and everywhere" feel, where users can speak without needing a smart speaker nearby. Lights, thermostats and other appliances can turn the whole home into a listening zone that awaits the wake word or sound.
Far-field vocal capture, the general term for when a speaking voice isn't physically close to the microphone, requires specialized hardware and software, and most importantly, dedicated design consideration. Notably, this includes port orientation, microphone array and beamforming.
Port orientation, meaning the placement and direction of the physical opening through which audio reaches the microphone without obstruction, is a primary concern. The acoustic port should be far enough from speakers and noise sources, such as motors, to keep as much extraneous noise as possible out of the signal at the source. Poor port placement can force costly changes to printed circuit boards or plastics later in the product design cycle.
Microphone arrays and beamforming work together to mimic the human auditory system's ability to localize sound. Multiple microphones, or an array, allow devices to hear sounds from all directions simultaneously. Through beamforming, the array can be programmed to selectively capture or reject incoming sounds by recognizing where each one originates, using timing, frequency and amplitude cues. Different microphones work together to capture near and far sounds. They then send the information to the DSP, which helps the system distinguish which audio is important (speech) and which is unwanted (noise).
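To make the timing cue concrete, here is a minimal sketch of delay-and-sum beamforming, one of the simplest beamforming techniques and not necessarily what any particular product uses. It assumes a straight-line array with evenly spaced microphones, a distant source (so sound arrives as a plane wave) and delays rounded to whole samples. The function names, the 5 cm spacing and the 16 kHz sample rate are illustrative assumptions, not values from this article.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Arrival delay at each microphone, in whole samples, for a plane wave
    coming from angle_deg (0 = straight ahead of the array)."""
    per_mic_s = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    delays = [round(m * per_mic_s * sample_rate) for m in range(num_mics)]
    earliest = min(delays)
    return [d - earliest for d in delays]  # shift so every delay is >= 0


def delay_and_sum(channels, delays):
    """Undo each channel's delay, then average the aligned channels."""
    length = len(channels[0]) - max(delays)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def simulate_tone(num_mics, spacing_m, angle_deg, sample_rate, freq_hz, num_samples):
    """Hypothetical test signal: one tone arriving from angle_deg at every mic."""
    delays = steering_delays(num_mics, spacing_m, angle_deg, sample_rate)
    return [
        [math.sin(2 * math.pi * freq_hz * (n - d) / sample_rate) for n in range(num_samples)]
        for d in delays
    ]


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


if __name__ == "__main__":
    mics, spacing, rate = 4, 0.05, 16000
    channels = simulate_tone(mics, spacing, 60, rate, 1000, 1600)
    for look_angle in (60, 0, -60):
        beam = delay_and_sum(channels, steering_delays(mics, spacing, look_angle, rate))
        print(look_angle, round(power(beam), 3))
```

Steering the array toward the talker lines the channels up so they add constructively, while sound from other directions adds out of phase and is attenuated. Production systems typically go further, with fractional delays, adaptive weights and frequency-dependent processing.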
Designing for convenience and usability
Voice is the ultimate control for devices. VUI use must be as natural as any human conversation while still being incredibly responsive. Product developers face new challenges in creating a device that is always on. The expectation of immediate wake and always-listening behavior requires designs with extremely low energy draw. Conventional VUIs with a smart speaker interface require commands to go through several steps and depend on the speed of the server and home Wi-Fi connection.
For some devices, if latency exceeds a few seconds, the user command is canceled (much to the frustration of many consumers). Voice systems designed for the edge process commands locally, which considerably mitigates these latency issues and increases both convenience and usability.
Devices that operate via a battery, such as smart door locks, must conserve enough energy to be practical for the end consumer, without the need for frequent battery changes -- all while still listening 24/7 for a wake word. A feature like voice activity detection helps by keeping most of the system in a low-power state until speech is actually present. Speaker identification can go further and train a system to recognize wake commands from specific individuals only.
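As an illustration of the low-power listening idea, here is a minimal sketch of voice activity detection in its simplest form: an energy gate that flags audio frames loud enough to be worth waking the rest of the system for. It only decides whether sound is present and does not identify who is speaking. The function names, the 160-sample frame, the -40 dB threshold and the three-frame hangover are illustrative assumptions, and real detectors typically add spectral features or a small neural network.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame in decibels (relative to full scale 1.0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(mean_square + 1e-12)  # small floor avoids log(0)


def detect_voice_activity(samples, frame_len=160, threshold_db=-40.0, hangover_frames=3):
    """Return one boolean per full frame: True while energy is above the
    threshold, plus a short hangover so word endings are not clipped."""
    flags = []
    hangover = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy_db(frame) > threshold_db:
            hangover = hangover_frames
            flags.append(True)
        elif hangover > 0:
            hangover -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    frame = 160  # 10 ms at an assumed 16 kHz sample rate
    quiet = [0.001] * (5 * frame)
    loud = [0.5 * math.sin(2 * math.pi * 1000 * n / 16000) for n in range(5 * frame)]
    flags = detect_voice_activity(quiet + loud + quiet + quiet)
    print("".join("1" if f else "0" for f in flags))
```

In a battery-powered design, a gate like this would run continuously on the lowest-power part of the system, and only frames flagged as active would be passed on to the more expensive wake-word detector.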
Thoughtful design helps VUIs and the technology they support meet ever-expanding customer needs. The explosive growth of voice control, even in today's early stages, is ample proof of the market opportunities for voice control in the smart home. Voice control as an embedded feature in the smart home is no longer a surprise -- it has become table stakes to play at the top of the market. As the technology proliferates and improves, it will become the preferred method of interaction between consumers and their devices -- which means that electronics manufacturers must regard voice control as a standard feature of any smart device.
About the author
Mehul Kochar is the senior director of business development, audio solutions at Knowles, a market leader and global provider of advanced micro-acoustic microphones and speakers, audio solutions and high-performance capacitors and RF products. Mehul has nearly two decades of experience in establishing and leading customer strategy and execution. He excels at driving new technology into diverse user bases.