Google releases cloud text-to-speech service for developers

A new Google text-to-speech service generates natural-sounding speech in 32 voices and 12 languages. Developers can integrate the software with most enterprise apps and devices.

Google has released text-to-speech development tools in beta for building interactive voice response bots, adding voices to IoT devices and improving the workflows of people who prefer listening to reading.

The company made that software available to developers this week through the Google Cloud Platform. Google already embeds speech in its popular consumer apps, such as Google Assistant, Google Maps and Google Search.

"We tend to think about voice in really linear terms. We use our phone for voice, or we talk in person using our voice," said Jon Arnold, principal of Toronto-based research and analysis firm J Arnold & Associates. "This is opening up ways of using voice and speech, and the broader spectrum of audio, in really interesting ways."

The Google text-to-speech service lets users choose from 32 voices and 12 languages, with the ability to customize pitch, speaking rate and volume gain. Developers can embed the software in phones, personal computers, tablets and IoT devices such as televisions and speakers. Cisco is using the service to improve its collaboration platform, Spark.
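For developers curious what those controls might look like in practice, the following is a minimal sketch of a synthesis request body, not Google's own sample code. The field names (`input`, `voice`, `audioConfig`, `pitch`, `speakingRate`, `volumeGainDb`) follow the publicly documented Cloud Text-to-Speech REST format as best understood here, and the numeric ranges are assumptions that should be checked against the current API reference. The helper `build_synthesize_request` is a hypothetical name, and the sketch only builds and validates the payload; it does not call the service.

```python
# Assumed valid ranges for the tuning parameters; verify against the API reference.
PITCH_RANGE = (-20.0, 20.0)   # semitones
RATE_RANGE = (0.25, 4.0)      # 1.0 is normal speed
GAIN_RANGE = (-96.0, 16.0)    # decibels


def _check(name, value, bounds):
    """Return value unchanged, or raise if it falls outside the assumed bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             pitch=0.0, speaking_rate=1.0, volume_gain_db=0.0):
    """Build a JSON-serializable request body (hypothetical helper)."""
    voice = {"languageCode": language_code}
    if voice_name:
        # Optional: pin one specific voice instead of the language default.
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": "MP3",
            "pitch": _check("pitch", pitch, PITCH_RANGE),
            "speakingRate": _check("speaking_rate", speaking_rate, RATE_RANGE),
            "volumeGainDb": _check("volume_gain_db", volume_gain_db, GAIN_RANGE),
        },
    }
```

A call such as `build_synthesize_request("Your sales report is ready.", speaking_rate=1.1)` returns a dictionary that could be serialized with `json.dumps` and sent to the service's synthesis endpoint with appropriate credentials.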

Google's service relies on the machine learning tools of DeepMind, which Google acquired in 2014, to produce speech that sounds more natural than traditional automated voices. Instead of piecing together short speech fragments, DeepMind's WaveNet builds voices from scratch. Analyzing a trove of human speech data from Google Voice Search, the WaveNet neural network can discern, for example, the shape of natural speech sound waves and the proper order of voice tones.

The text-to-speech service is Google's latest foray into the enterprise market. Earlier this month, the consumer giant released Hangouts Chat, a team collaboration app that will compete with Slack and Microsoft Teams. In the enterprise market for text-to-speech APIs, Google will go head-to-head with Amazon Polly and IBM Watson.

"They want enterprises to see them as an important partner, not just to compete straight with Microsoft for office applications, but with other things too," Arnold said. "And [artificial intelligence] is where they have a really strong position."

Emerging use cases for Google text-to-speech service

Developers could use the Google service to improve interactions with voice-enabled platforms, such as virtual assistants and consumer-facing interactive voice response (IVR) systems, said Irwin Lazar, an analyst at Nemertes Research, based in Mokena, Ill. 

"For example, rather than just asking for the latest sales report, you could ask the voice assistant to read back to you the information, or you could engage in a natural language conversation with an AI assistant," Lazar said.

Contact centers could use the tool to help make interacting with IVRs more appealing. Rather than punching numbers on a touch-tone dial pad or issuing specific voice commands, customers could engage in natural-sounding conversations with a voice bot, Arnold said.

The text-to-speech service could use its artificial intelligence to create audio playlists of memos, presentations and meeting transcripts and automatically order the documents in a way that prioritizes those it deems to be most relevant, Arnold said.

"Leave it to the imagination. People will find applications and use cases. The bigger story is this is what digital disruption looks like," Arnold said. "When a capability like this falls in your lap, people will find use cases for it."
