3 use cases highlight AI and speech recognition evolution

The combination of AI and speech recognition has driven innovation for more advanced capabilities. Explore three applications that highlight the future of speech technology.

Speech-related technologies have been mature for some time, but the advent of AI has ushered in dramatic innovation with entirely new applications and use cases. The forced shift to home-based work at the onset of the COVID-19 pandemic brought new challenges for both communicating and collaborating, further accelerating the development of speech technology.

Unified communications (UC) vendors have responded by adding AI-driven features, like real-time transcription and translation. These new features have been widely adopted, and workers have a greater understanding of how speech technology can enhance their productivity. While this is a welcome development, we're still in the early stages of what's possible with AI and speech recognition technology.

Because AI is an iterative technology, we should expect constant innovation and new capabilities -- not just steady improvements. Let's examine three examples of what's ahead for speech technology in the enterprise.

1. Virtual assistants

While the concept of virtual assistants isn't new, it's evolved to a level where every worker can now have the equivalent of a personal secretary at minimal cost to the organization. Until recently, the capabilities of virtual assistants were based on responding to simple commands or structured, closed-ended questions, much like with Amazon Alexa or Apple's Siri.

These voice assistants provided a speech-based alternative to using a keypad but lacked intelligence to do much beyond simple tasks, like making a call. The big change comes with conversational AI, where chatbots have enough intelligence to engage in two-way dialogue with workers and even initiate a conversation.

With today's advances in AI, virtual assistants can be fully integrated with workplace applications, such as calendars, email, telephony, conferencing, workflow tools, documents and spreadsheets. Interactions with voice assistants can now be more conversational for calendar planning and meeting scheduling. Because speech recognition is now highly accurate, digital assistants can be used to dictate memos, convert emails to speech or even convert voicemail messages to text. Taken together, these capabilities raise the value of digital assistants to a much higher level.

2. Intelligent meeting summaries

This is a great example of how AI adds new business value beyond the basic task of accurately transcribing what's been said. Speech recognition is rather straightforward when only tracking one person, but meetings require an added layer of speaker recognition to track what each person is saying.

AI plays a key role in providing accurate speech and speaker recognition, which creates new possibilities to make collaboration more effective. When all the conversations during a meeting can be captured and properly attributed, analytics can be applied to convert the conversation data streams into actionable knowledge.

For example, meeting transcriptions can be provided to team members who could not attend the meeting but need to know what transpired, especially for compliance purposes. While this does have inherent value, AI adds another layer by being able to extract only the conversations relevant to a specific need, such as a particular project, person or time period.

Because they are data streams, these conversations are also searchable. By using keywords and tags, workers can use AI to create customized meeting summaries. This not only saves workers the time spent listening to an entire replay, but also enables team members who didn't attend the meeting to stay fully engaged. As AI capabilities evolve, these meeting summaries will also include action items, both those explicitly stated and those implied by the conversation.
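
To make the idea of searchable, speaker-attributed conversations more concrete, here is a minimal Python sketch of filtering a transcript by person, keyword or time period. It is an illustration only, not how any particular UC product works: the Utterance record, the filter_transcript function and the sample meeting data are all hypothetical, and plain substring matching stands in for the AI-driven extraction a real product would apply.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str      # who spoke, as assigned by speaker recognition
    start_sec: float  # offset from the start of the meeting, in seconds
    text: str         # transcribed speech


def filter_transcript(utterances, speaker=None, keyword=None,
                      start_sec=None, end_sec=None):
    """Return the utterances that match every filter that was supplied."""
    matches = []
    for u in utterances:
        if speaker is not None and u.speaker != speaker:
            continue
        # Case-insensitive substring match stands in for smarter search.
        if keyword is not None and keyword.lower() not in u.text.lower():
            continue
        if start_sec is not None and u.start_sec < start_sec:
            continue
        if end_sec is not None and u.start_sec > end_sec:
            continue
        matches.append(u)
    return matches


# Hypothetical meeting data, invented for illustration only.
meeting = [
    Utterance("Dana", 12.0, "Let's review the Apollo project timeline."),
    Utterance("Lee", 47.5, "Budget approval is still pending."),
    Utterance("Dana", 95.0, "I'll send the Apollo status report by Friday."),
]

# Everything said about one project.
apollo_items = filter_transcript(meeting, keyword="apollo")

# Everything one person said after the first minute.
dana_late = filter_transcript(meeting, speaker="Dana", start_sec=60)
```

Each filter is optional, so the same function covers a request by project, by person or by time period, or any combination of the three.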

3. Biometrics

This is an emerging field of speech technology, where voice biometrics can be used for authentication, like a retinal scan or a fingerprint. AI can now create highly accurate voiceprints that serve as a touchless option to automate activities such as entering a room, starting a meeting or initiating a financial transaction. These may seem like small things, but they save time, streamline processes and add another layer of safety that makes workers comfortable coming back to the office.
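
As a rough illustration of how a voiceprint check might work, the Python sketch below compares an enrolled voiceprint with a new sample and accepts the speaker only if the two are similar enough. It rests on several assumptions that go beyond this article: that a voiceprint is represented as a list of numbers produced by a separate speech model, that cosine similarity is the comparison measure, and that 0.8 is a sensible cutoff. The function names and sample vectors are hypothetical, and a production system would tune the threshold and add safeguards against recorded or synthetic voices.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, sample, threshold=0.8):
    """Accept the sample only if it is close enough to the enrolled voiceprint.

    The 0.8 threshold is an assumed value for illustration.
    """
    if len(enrolled) != len(sample):
        raise ValueError("voiceprints must have the same length")
    return cosine_similarity(enrolled, sample) >= threshold


# Hypothetical three-number voiceprints, invented for illustration only.
enrolled_print = [0.9, 0.1, 0.4]
genuine_sample = [0.85, 0.15, 0.42]
impostor_sample = [0.1, 0.9, -0.3]

genuine_ok = is_same_speaker(enrolled_print, genuine_sample)
impostor_ok = is_same_speaker(enrolled_print, impostor_sample)
```

The threshold is the key trade-off in a check like this: a higher value rejects more impostors but also turns away more legitimate users.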

As voice biometrics evolve and become more widely used, they will also provide privacy and security benefits. Some contact centers, for example, are using voice biometrics to mitigate fraud from bad actors trying to impersonate real customers over the phone. Similar scenarios are possible in the workplace, where outsiders try to use voice to steal IDs, access company data, divert funds or disrupt operations. Phone systems are often a weak link in a company's security perimeter, and once they are breached, these malicious actions can quickly follow.
