EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

EfficientTTS

Unofficial PyTorch implementation of “EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture” (arXiv).

Disclaimer: Some people mistakenly think I am one of the authors. I am not; my name is not even in the paper's author list. I am just a TTS enthusiast. The paper leaves out some information that is important for the implementation. Some model parameters in the current version are based on my own understanding and experiments, and may not be consistent with those used by the authors.

Updates

2020/12/23: Mandarin Chinese samples uploaded. The experiment settings are exactly the same as for the LJSpeech example. A complete description of the usage will be uploaded soon.

2020/12/20: Using a HifiGAN fine-tuned on Tacotron2 GTA mel spectrograms can improve the quality of the generated samples; please see the newly generated samples.

Current status

Implementation of EFTS-CNN + HifiGAN

Setup
