AI speech technology offers enterprises benefits, risks

AI offers intriguing prospects to companies looking for ways to get more value from their speech recognition systems. But be wary of expecting too much, too fast.

Speech recognition technology has always been core to the enterprise communications landscape. Speech technology is fairly mature and provides a high level of utility, particularly for telephony and meeting rooms. But, with the advent of AI, new applications for speech are emerging that need to be considered in a different light.

The accuracy of speech to text and transcription is important. But the underlying legacy technology can only improve incrementally, so there is no transformative capability here. AI-driven speech recognition technology, on the other hand, offers innovation that drives new business value, largely because these capabilities can address different challenges.

Applying AI to voice recognition brings scale and speed that go well beyond what legacy speech technologies can manage. Rather than transcription, where the objective is accurate capture of speech to text, today's AI can infer understanding and intent from what is being said. That gives rise to new applications.

Let's examine some key benefits and challenges associated with AI speech technology.

AI speech technology benefits

1. New value from transcription

Conventional speech-to-text tools are labor-intensive and were never intended to capture every conversation. At face value, the benefit of AI-driven speech recognition is transcription that's better, faster and cheaper, making it more cost-effective for the enterprise. There is also a higher-level payoff to consider in the form of doing speech to text on a much greater scale.

AI's transcription accuracy is one part of the equation, but even greater value comes from new data streams when speech is converted to text. The more data fed into an AI engine, the more value it provides as it applies tools such as machine learning to further improve accuracy and identify patterns to drive better business decisions.

2. Workflow automation

As voice recognition accuracy keeps improving, workers are becoming more comfortable using speech as the interface for AI-based applications to automate workflows. Instead of manually going through multiple steps to schedule a meeting or share results of a report with the team, voice can be used to direct chatbots to automate these tasks and processes. These capabilities, known as digital personal assistants, are just emerging. They enable workers to have their own bots that can understand speech-based commands and queries.

Generative AI offers additional automation opportunities. Although the technology's capabilities aren't yet mature, generative AI will let workers use speech or text to ask a bot to compose an email or letter in their own voices, saving additional time and effort. Early results are promising as credible responses can be generated with only minimal input from humans.

3. Touchless interaction

This is another form of automation, and it stems from the COVID-19 pandemic era when physical distancing and touchless interaction were the norm. Those concerns have now abated, but there are many use cases where voice is a better medium than touch. These use cases have less to do with speech recognition and more to do with voice and speaker recognition. While most speech recognition technologies focus on communications, these touchless applications target authentication. Voice biometrics, for example, could be used to control who can gain access to restricted areas. Voice prompts could also be used to start and manage a meeting or to conduct financial transactions.
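As a rough illustration of how a voice biometric check might gate access, speaker verification systems commonly reduce a voice sample to a numeric vector (often called a voiceprint or embedding) and compare it with the one captured at enrollment. The Python sketch below assumes those vectors already exist. The function names, sample vectors and the 0.8 threshold are hypothetical; a real system would derive embeddings from a trained model and tune the threshold against its own false-accept and false-reject targets.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, sample, threshold=0.8):
    """Accept the sample only if it is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Hypothetical voiceprints: one enrolled employee, one new sample from the
# same person, and one sample from someone else.
enrolled_voiceprint = [0.90, 0.10, 0.40]
same_person_sample = [0.88, 0.12, 0.41]
other_person_sample = [0.10, 0.90, -0.30]

print(is_same_speaker(enrolled_voiceprint, same_person_sample))   # True
print(is_same_speaker(enrolled_voiceprint, other_person_sample))  # False
```

Raising the threshold makes it harder for an impostor to get through but also rejects more legitimate users, so the setting is a business decision as much as a technical one.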

AI speech technology risks

1. Issues with speech accuracy

Even as AI underpins the innovation reshaping speech recognition technology, it's important to remember it isn't 100% accurate. But, then, neither are humans. The risk here is that AI applications have fairly basic out-of-the-box capabilities. As a result, their initial performance for accuracy will likely be below expectations.

Keep in mind: AI is iterative. Performance improves the more you use it and the more data sets it has to work with. It is therefore unreasonable to expect AI to have near-perfect speech accuracy right from the start. Once mistakes are detected and corrected, however, the system is far less likely to repeat them, so the risks around speech accuracy should diminish over time.
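One common way to put a number on speech accuracy is word error rate (WER): the count of word substitutions, deletions and insertions needed to turn the AI transcript into a human-verified reference, divided by the number of words in the reference. The following is a minimal Python sketch rather than a production scoring tool. The function name and sample sentences are illustrative assumptions and do not come from any particular speech product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word inserted in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the AI transcript mishears one word out of seven.
human_reference = "schedule a meeting with the sales team"
ai_transcript = "schedule a meeting with the sails team"
print(round(word_error_rate(human_reference, ai_transcript), 3))  # 0.143
```

Tracking a figure like this on the same set of recordings over time is one way to check whether accuracy is in fact improving as the system is used.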

2. Issues with content accuracy

This is a different type of accuracy, and it speaks to a more challenging form of risk. When going beyond basic speech-to-text applications, AI tools are used to identify patterns that humans cannot see and to generate content and summaries. To be effective here, AI tools must understand the more complex nuances of language, such as context and intent.

AI can only work from inputs that humans provide, so it doesn't "know" how to parse out meaning, especially in ambiguous situations. One such result is hallucination, where the output has good grammar and syntax but is factually incorrect or nonsensical. To mitigate this risk, humans must be involved at various points in the process. This manual intervention could quickly defeat the purpose of using AI.

3. Trust and user adoption

There are many reasons why people don't yet trust AI, and those feelings will only intensify if the results are poor. AI is supposed to make things better, not worse. When it comes to speech, there is little margin for error. Even when using sophisticated tools, AI-based speech to text or chatbots can come across as stilted, impersonal and robotic. If AI tries too hard to emulate human emotions, it comes across as contrived.

Humans can quickly detect these eccentricities, at which point they will lose trust in AI. Without trust, they will not adopt these tools, potentially calling into question IT's decision to deploy AI. The risk is expecting too much from the intersection of AI with speech. Once trust is broken, it's hard to gain back. Instead, trust should be viewed as a core building block for AI, and building trust should be a leading success metric when deploying new AI speech technology in your organization.

Jon Arnold is principal of J Arnold & Associates, an independent analyst providing thought leadership and go-to-market counsel with a focus on the business-level effect of communications technology on digital transformation.
