Definition

Amazon Polly

Amazon Polly is a text-to-speech service within the Amazon Web Services cloud platform. It uses deep learning technology to allow applications to speak with a human-like voice.

Amazon Polly is primarily for software developers, who use the service to speech-enable their applications. To do this, a developer submits the text to convert to speech -- either as plain text or in Speech Synthesis Markup Language (SSML) -- to the Amazon Polly application programming interface (API). The service then returns an audio stream that the developer can either play immediately or store as an audio file. Supported audio formats include MP3, Ogg Vorbis and PCM.
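
The request described above can be sketched in Python with the AWS SDK (boto3). This is a minimal illustration, not a definitive implementation: it assumes boto3 is installed and AWS credentials are configured, and the helper names build_request and synthesize_to_file, the default voice "Joanna" and the output path are choices made for this example.

```python
ALLOWED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_request(text, voice_id="Joanna", output_format="mp3", is_ssml=False):
    """Build the keyword arguments for Polly's synthesize_speech call.

    The voice "Joanna" is only an example default; pick any voice
    available in your account and region.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format!r}")
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "OutputFormat": output_format,
        "VoiceId": voice_id,
    }


def synthesize_to_file(text, path, **options):
    """Send the text to Amazon Polly and store the audio stream on disk."""
    import boto3  # imported here so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    with open(path, "wb") as audio_file:
        # AudioStream is a streaming body; read() returns the audio bytes.
        audio_file.write(response["AudioStream"].read())
    return path
```

A call such as synthesize_to_file("Hello from Polly", "hello.mp3") would write an MP3 file, while build_request can be checked locally without contacting AWS.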

Amazon Polly supports male and female voices in different languages and dialects. A developer can use SSML to modify vocal pitch, word pronunciation, speed and volume. It is also possible to sync the speech with graphics or animation within an application.
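
As a rough illustration of the SSML adjustments mentioned above, the following Python sketch wraps text in a prosody element that sets speed, pitch and volume. The function name prosody_ssml and the default values are assumptions for this example, and not every voice necessarily honors every attribute, so check the SSML support for the voice in use.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in an SSML <prosody> element.

    The text is XML-escaped so characters such as & and < do not
    break the markup.
    """
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    print(prosody_ssml("Hello & welcome", rate="slow", pitch="low", volume="loud"))
```

The resulting string would be sent to the API with the text type set to SSML rather than plain text.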

The service does not, however, offer translation between languages.


A business might use Amazon Polly to enable speech in gaming applications, internet of things (IoT) devices and e-learning applications for the visually impaired.

Polly is part of the Amazon AI suite. Other Amazon AI services include Lex, a service that enables a developer to build conversational user interfaces, and Rekognition, an image analysis service.

Amazon uses a pay-per-use pricing model for Polly based on the number of characters converted from text to audio. A developer can cache the generated audio and replay it as many times as needed at no additional charge.
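
The effect of caching on a per-character bill can be sketched as follows. This is a hypothetical illustration: CachedSynthesizer, cache_key and estimated_cost are names invented for this example, the synthesize callable stands in for a real Polly request, the price per million characters is a parameter the reader must supply from current AWS pricing, and counting len(text) only approximates how billed characters are measured.

```python
import hashlib


def cache_key(text, voice_id, output_format):
    """Derive a stable key from everything that affects the audio."""
    raw = "\x1f".join([voice_id, output_format, text]).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class CachedSynthesizer:
    """Call the synthesize function only for text not seen before."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable(text, voice_id, output_format) -> bytes
        self._cache = {}
        self.billed_characters = 0

    def speak(self, text, voice_id="Joanna", output_format="mp3"):
        key = cache_key(text, voice_id, output_format)
        if key not in self._cache:
            # Only a cache miss triggers a (billable) conversion.
            self._cache[key] = self._synthesize(text, voice_id, output_format)
            self.billed_characters += len(text)
        return self._cache[key]


def estimated_cost(characters, price_per_million):
    """Estimate cost from a character count and an assumed price."""
    return characters / 1_000_000 * price_per_million
```

Replaying the same sentence many times through speak would count its characters once, which is the saving the pricing model allows.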

This was last updated in November 2017
