Voice Cloning comes to the Masses

Remember talking into the fan to hear your “robot voice”? Remember how pretty much all computer-generated voices tended to sound like that? I mean yeah, they got better, but they weren’t really all that good, y’know? (•) Text-to-speech has seen some amazing leaps over the last few years, and almost all of it can be attributed to Deep Learning. The folks at DeepMind have WaveNet, which changed the game by directly modeling the raw waveform of an original voice (or music!), and using that model to generate speech from text. You can see (ok, hear) it in action in Google Assistant, and pretty marvelous stuff it is indeed! The folks at Baidu Research have been doing their part too, with DeepVoice. Their approach has been to look at each of the stages in a TTS pipeline and swap in a trained model for each stage (as compared to the more “holistic”/end-to-end approach of DeepMind). Anyhow, this post isn’t about which is better, it’s...