
Google Cloud speech-to-text service gets revamp

Google's Cloud speech-to-text service has been updated to improve accuracy through machine learning. The company is also releasing a tool that automatically adds punctuation to transcripts.

The Google Cloud speech-to-text service has been updated with modules designed specifically for transcribing the audio of phone calls and videos. Developers can embed those services into call center software or web conferencing platforms.

Google also revealed this month that it had used data voluntarily shared by customers to improve the accuracy of transcriptions significantly. The company reduced word errors by more than half using machine learning tools.

Customers can reap the rewards of the improved service by agreeing to share usage data with Google, while privacy-minded clients can opt out of the program.

In addition to phone and video transcription, the Google Cloud Speech API, released in 2016, includes a default program for transcribing long audio files and supports voice searches and commands.

Google is making available this month a beta version of the automatic punctuation tool that the company has been using internally for the past few years to improve voicemail transcripts. The platform automatically inserts periods, commas and question marks into transcribed speech.
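For developers, the phone-call and video models and the new punctuation feature are exposed as options on a recognition request. Below is a minimal sketch of what a request body enabling them might look like, using the field names from the public Speech-to-Text REST API; the storage URI is a hypothetical placeholder, not a real file.

```python
import json

# Sketch of a Google Cloud Speech-to-Text "recognize" request body.
# Field names (model, useEnhanced, enableAutomaticPunctuation) follow
# the public REST API; the gs:// URI is a hypothetical placeholder.
request_body = {
    "config": {
        "languageCode": "en-US",
        "model": "phone_call",               # or "video" for video audio
        "useEnhanced": True,                 # enhanced model trained with shared usage data
        "enableAutomaticPunctuation": True,  # beta: insert periods, commas, question marks
    },
    "audio": {"uri": "gs://example-bucket/support-call.wav"},
}

print(json.dumps(request_body, indent=2))
```

Opting in to data sharing is what unlocks the enhanced models, which is why `useEnhanced` is shown alongside the model selection here.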

Google Cloud's speech-to-text service revamp comes less than one week after Amazon Web Services announced it was making its transcription platform, Amazon Transcribe, generally available to developers.

Businesses can now customize the vocabulary of Amazon Transcribe to include business-specific acronyms and keywords. AWS also updated the service to be able to distinguish between multiple speakers in an audio file.
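The two Amazon Transcribe features described above map to parameters on its StartTranscriptionJob operation. The following is a sketch, not a definitive implementation: the job name, S3 bucket and vocabulary name are hypothetical placeholders, while the parameter names themselves come from the Transcribe API.

```python
# Sketch of the parameters for Amazon Transcribe's StartTranscriptionJob,
# combining a custom vocabulary with speaker identification.
# Job name, bucket and vocabulary name are hypothetical placeholders.
job_params = {
    "TranscriptionJobName": "earnings-call-demo",
    "LanguageCode": "en-US",
    "MediaFormat": "mp3",
    "Media": {"MediaFileUri": "s3://example-bucket/earnings-call.mp3"},
    "Settings": {
        "VocabularyName": "acme-product-terms",  # business-specific acronyms and keywords
        "ShowSpeakerLabels": True,               # distinguish multiple speakers
        "MaxSpeakerLabels": 2,                   # expected number of speakers in the file
    },
}

# With boto3 this dictionary would be passed as keyword arguments:
#   boto3.client("transcribe").start_transcription_job(**job_params)
print(sorted(job_params["Settings"]))
```

The custom vocabulary must be created separately before a job can reference it by name.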

The platforms from Amazon and Google compete with similar services from IBM Watson and Microsoft Azure.

"There is a massive race to make speech-to-text more widely used," said Steve Vonder Haar, senior analyst at Wainhouse Research, based in Duxbury, Mass. "It's a critical piece of what all these vendors are doing right now."

Use cases for speech-to-text services still evolving

In the short term, enterprises could use a speech-to-text service to convert meeting and webinar recordings into searchable text archives, Vonder Haar said. For example, a worker might ask a virtual assistant about something a colleague mentioned in a meeting three months ago.

Microsoft recently announced that it would soon add automatic meeting transcription to its team collaboration platform, Microsoft Teams. Web conferencing vendors Zoom and BlueJeans introduced similar features this year.

In the future, enterprises will be able to feed automatically generated transcripts of business conversations into virtual assistants like IBM Watson or Google Assistant, helping those machines learn how to assist workers or customers better.

"If you have your VP of marketing provide an overview of what a particular product does, that video is captured, that audio is converted into text, that text becomes searchable, and, ultimately, that text can be fed into machine intelligence systems," Vonder Haar said.

Vendors are continually improving their speech-to-text tools, but enterprises shouldn't wait until those platforms are perfect before experimenting with them, said Jon Arnold, principal of Toronto-based research and analysis firm J Arnold & Associates.

"To me, the big takeaway is these platforms definitely provide a lot of exciting possibilities," Arnold said. "Do some harmless in-house trials, get a feel for it, because the use cases will come out of the woodwork once you start getting comfortable with it."
