Voice Cloning Comes to the Masses
Remember talking into the fan to hear your “robot voice”? Remember how pretty much all computer-generated voices tended to sound like that? I mean yeah, they got better, but they weren’t really all that good, y’know?

Text-to-speech has seen some amazing leaps over the last few years, and almost all of it can be attributed to Deep Learning. The folks at DeepMind have WaveNet, which changed the game by directly modeling the raw waveforms of an original voice (or music!), and using that to generate speech from text. You can see (ok, hear) it in action in Google Assistant, and pretty marvelous stuff it is indeed!

The folks at Baidu Research have been doing their part too, with DeepVoice. Their approach has been to focus on each of the stages in a TTS pipeline, and to swap in a trained model for each stage (as compared to the more “holistic”/end-to-end approach of DeepMind). Anyhow, this post isn’t about which is better, it’s...