Amazon Polly

Converts text into “life-like” speech
The product takes text in a supported language and outputs speech in that same language. Polly does not do translation!
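Below is a minimal Python sketch of a single Polly request. It assumes boto3 (the AWS SDK for Python) with credentials and a default region already configured. `synthesize` is a hypothetical helper name. `synthesize_speech`, its `Text`, `VoiceId` and `OutputFormat` parameters, and the `AudioStream` response field come from the boto3 Polly client. The voice determines the output language, and the text is spoken as given, never translated. The client is passed in so the helper can be exercised with a stub.

```python
def synthesize(polly_client, text, voice_id, output_format="mp3"):
    """Send `text` to Polly and return the synthesized audio as bytes.

    `voice_id` selects the voice (and therefore the language) of the output.
    Polly reads the text as-is; it does not translate it.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat=output_format,
    )
    # AudioStream is a streaming body; read it fully into memory.
    return response["AudioStream"].read()
```

With a real client, `synthesize(boto3.client("polly"), "Hello from Polly.", "Joanna")` should return MP3 bytes that can be written straight to a `.mp3` file.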
Polly operates in 2 modes:
- Standard TTS:
  - Uses a concatenative architecture
  - Joins phonemes (the smallest units of sound) together to build patterns of speech
- Neural TTS:
  - Takes phonemes, generates spectrograms from them, then passes those spectrograms through a vocoder, which produces the output audio
  - A much more advanced way of generating human-like speech
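The engine is chosen per request with the `Engine` parameter of `synthesize_speech`, and not every voice supports the neural engine. The sketch below assumes a boto3 Polly client. `describe_voices` returns voices (paged via `NextToken`), each with a `SupportedEngines` list, which we filter on. `voices_for_engine` and `synthesize_with_engine` are hypothetical helper names.

```python
def voices_for_engine(polly_client, engine, language_code="en-US"):
    """Return the IDs of voices in `language_code` that list `engine` as supported."""
    voice_ids = []
    request = {"LanguageCode": language_code}
    while True:
        page = polly_client.describe_voices(**request)
        for voice in page["Voices"]:
            # SupportedEngines looks like ["neural", "standard"]
            if engine in voice.get("SupportedEngines", []):
                voice_ids.append(voice["Id"])
        token = page.get("NextToken")
        if not token:
            return voice_ids
        # Ask for the next page of results
        request["NextToken"] = token


def synthesize_with_engine(polly_client, text, voice_id, engine="neural"):
    """Synthesize `text` as MP3 using the given engine ("standard" or "neural")."""
    response = polly_client.synthesize_speech(
        Engine=engine,
        Text=text,
        VoiceId=voice_id,
        OutputFormat="mp3",
    )
    return response["AudioStream"].read()
```

Requesting `engine="neural"` with a voice that only supports the standard engine should be rejected by the service, so checking the voice list first avoids that failure.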
Output formats: MP3, Ogg Vorbis, PCM
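MP3 and Ogg Vorbis come back as playable files, but PCM is raw sample data with no header. The sketch below uses only the standard library and assumes the PCM layout the Polly docs describe (signed 16-bit, mono, little-endian) at the sample rate requested via `SampleRate` (for PCM, "8000" or "16000"). `OUTPUT_FORMATS` and `pcm_to_wav` are hypothetical names; the dict maps each format to its `OutputFormat` API value and a file extension.

```python
import io
import wave

# Format name -> (OutputFormat value for the API, file extension to save as)
OUTPUT_FORMATS = {
    "MP3": ("mp3", ".mp3"),
    "Ogg Vorbis": ("ogg_vorbis", ".ogg"),
    "PCM": ("pcm", ".wav"),  # saved as .wav after wrapping with pcm_to_wav
}


def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap raw 16-bit mono PCM in a WAV container so ordinary players can open it."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The sample rate passed to `pcm_to_wav` has to match the `SampleRate` used in the request; otherwise the audio plays back at the wrong speed and pitch.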
Polly supports the Speech Synthesis Markup Language (SSML), which gives us additional control over how Polly generates speech. For example, we can get Polly to emphasize certain parts of the text or change how they are delivered (whispering, newscaster speaking style).
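A small sketch of building SSML for Polly in Python. `build_ssml` is a hypothetical helper. It wraps text in `<speak>` and uses `<emphasis>`, `<amazon:effect name="whispered">` and `<amazon:domain name="news">` for the styles mentioned above. Plain text is XML-escaped so characters like `&` or `<` cannot break the markup. Tag support differs between engines (as far as I know, emphasis and whispering are standard-engine features, while the newscaster style needs a neural voice), so check the SSML tag table before combining them in one request.

```python
from xml.sax.saxutils import escape


def build_ssml(plain, emphasized=None, whispered=None, newscaster=False):
    """Assemble an SSML document for Polly from plain-text fragments."""
    parts = [escape(plain)]
    if emphasized:
        parts.append(f'<emphasis level="strong">{escape(emphasized)}</emphasis>')
    if whispered:
        parts.append(f'<amazon:effect name="whispered">{escape(whispered)}</amazon:effect>')
    body = " ".join(parts)
    if newscaster:
        # Newscaster style applies to the whole passage
        body = f'<amazon:domain name="news">{body}</amazon:domain>'
    return f"<speak>{body}</speak>"
```

The result is sent with `TextType="ssml"`, e.g. `polly.synthesize_speech(Text=ssml, TextType="ssml", VoiceId=..., OutputFormat="mp3")`; without that flag Polly reads the tags out as ordinary text.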