Intelligent Voice CTO Nigel Cannings explains how AI-based speech-to-text technology can catch criminals, protect privacy and ensure regulatory compliance.
AI that understands speech has been a boon to consumers looking to make a quick purchase on Amazon or dictate a text message while driving. The technology can also be used to record and analyze phone calls and transmit incriminating excerpts to bosses, government regulators and nefarious actors. It sounds Orwellian -- the perfect tool for a surveillance state.
But what if AI speech-to-text technology could have caught Lehman Brothers before its risky mortgages almost brought down the financial system in 2008? Or spotted Enron's dishonest accounting and stopped the Libor scandal, a 2012 conspiracy by British banks to manipulate interest rates?
Such idealistic purposes, and more mundane but important ones like credit card privacy and compliance monitoring, are the purview of London-based Intelligent Voice, which makes an AI-based speech-to-text platform geared to privacy and security. Banks, insurers, governments, law firms and healthcare providers are the company's typical customers.
Intelligent Voice uses speech recognition and natural language processing to convert the spoken word -- usually from phone calls and video conferences -- into text, analyze it and send it by email and other channels to customers for review. It uses biometric identification to match speakers' voices to the metadata in the transcription.
In this podcast, CTO Nigel Cannings explains Intelligent Voice's technology, current and potential applications, the risks of using AI and ways to alleviate those risks.
Putting technology in service of the law
Cannings left a legal career for the software industry. "I have to airbrush my guilty past now and pretend I've been a technologist my whole life," he joked.
In fact, his love of technology dates to childhood. He credits his father, Bill, a pioneer of U.K. personal computer sales, now chairman of Intelligent Voice, with influencing his career decisions, sometimes with blunt talk.
Cannings got deeply involved in voice technology during the 2008 financial crisis. "I thought we could listen in to this stuff, and so I developed a product which could actually listen in to phone calls, read emails, look at chat messages and start to divine patterns of bad behavior," he said.
Big banks were among his first customers. "We have a lot of household names who don't like us talking about who they are because they didn't like their staff necessarily understanding the level of capability that we've got to monitor them," he explained. "If you're working for a large bank and you're on the phone trading, there's a reasonably good chance that our software is sitting there listening."
Another application tackles compliance with Payment Card Industry security standards. "The idea is to eliminate all traces of your credit card number from any recording," Cannings said. "People want to hang on to data but they don't want sensitive stuff kept in the recording." The software can also identify and remove personally identifiable information, such as names, addresses and Social Security numbers.
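Intelligent Voice has not published how its redaction pipeline works, but the general technique of scrubbing card numbers from transcripts can be sketched simply: find digit runs of plausible card length, then use a Luhn checksum to cut down on false positives before redacting. All names and patterns below are illustrative, not the company's implementation.

```python
import re

def luhn_valid(digits: str) -> bool:
    # Luhn checksum: double every second digit from the right,
    # subtract 9 from any result above 9, and sum everything.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# 13-19 digits, optionally separated by spaces or hyphens.
CARD_RUN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def redact_card_numbers(transcript: str) -> str:
    # Redact a digit run only if it passes the Luhn check, so
    # order numbers and phone numbers are mostly left alone.
    def repl(m):
        digits = re.sub(r"[ -]", "", m.group())
        return "[REDACTED]" if luhn_valid(digits) else m.group()
    return CARD_RUN.sub(repl, transcript)
```

A production system would also have to handle digits spoken as words ("four one one one...") once they appear in a transcript, which is where speech-aware tooling earns its keep.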
Quality assurance in contact centers is another common use case. Intelligent Voice's platform uses sentiment analysis, among other methods, to compare how a customer sounds at the start of a call with how they sound at the end -- detecting, for example, whether they have become angry.
Cannings described Intelligent Voice's research on detecting vulnerability in the voice of a caller to discern whether they understand what's being said. He believes it could help fight elder abuse and other types of fraud and perhaps provide remote family members a way to intervene before it's too late.
This capability could also prove useful to U.K. companies affected by a Financial Conduct Authority rule that puts the onus on them to prove a customer understood the products being sold to them.
How large language models change the AI equation
Cannings wasn't shy about discussing the challenges of using AI to process speech in useful and responsible ways. He said speech is a lot harder to deal with than text, partly because of the variations in how people speak.
The technology also carries risk. Speech technology and telephony are "inherently insecure," he said, adding that voice calls over the internet aren't encrypted, a fact most people aren't aware of.
Speech recognition also requires the expensive processing power typically available to the likes of Amazon and Google, which sometimes take shortcuts to train their products. Cannings said Amazon sent recordings of voices picked up by Alexa to human transcribers in Eastern Europe.
Generative AI and its large language models (LLMs) could help customers use Intelligent Voice's technology to extract structured data from unstructured data, Cannings noted, but it needs to be applied judiciously.
Concerning generative AI's habit of making things up and presenting inaccurate information as fact, he said Intelligent Voice uses predictive models to box LLMs in with rules about permissible results. "By the time the answer comes out, it's super structured," he said.
Bias, another risk of AI, has been harder to address because providers of relatively unbiased content, such as the BBC and The New York Times, refuse to make their content available for training LLMs, he said. Websites on the fringes of the internet are more open about access, which effectively encourages people to train their models on biased content.
Other topics discussed include the following:
- How future applications of generative AI could differ from those of predictive AI.
- The potential and limits of using generative AI to make ERP UIs more user-friendly.
- Intelligent Voice's research into using neural networks for LLMs and end-to-end encryption for data transmitted to the cloud.
To hear the podcast, click on the link above.