EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
EfficientTTS
Unofficial PyTorch implementation of “EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture” (arXiv).
Disclaimer: Some people have mistakenly assumed that I am one of the authors. In fact, I am not on the author list of this paper; I am just a TTS enthusiast. Some important implementation details are not given in the paper, so some model parameters in the current version are based on my own understanding and experiments and may not match those used by the authors.
Updates
2020/12/23: Mandarin Chinese samples uploaded. The experimental setup is exactly the same as in the LJSpeech example. A complete description of the usage will be uploaded soon.
2020/12/20: Using a HifiGAN vocoder fine-tuned on Tacotron2 ground-truth-aligned (GTA) mel spectrograms can improve the quality of the generated samples. Please see the newly generated samples.
Current status
- Implementation of EFTS-CNN + HifiGAN
Setup