Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation
Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji
This is the repository for our paper, 🐤 Nix-TTS (submitted to INTERSPEECH 2022). We release the pretrained models, an interactive demo, and audio samples below.
[📄 Paper Link] [🤗 Interactive Demo] [📢 Audio Samples]
Abstract
We propose Nix-TTS, a lightweight neural TTS (Text-to-Speech) model obtained by applying knowledge distillation to a powerful but large generative TTS teacher model. Distilling a TTS model may seem counterintuitive given the generative and disjointed nature of TTS architectures, but pre-trained TTS models can be simplified into encoder and decoder structures, where the encoder maps text to a latent representation and the decoder maps that latent representation to speech data.
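The encoder/decoder view described in the abstract can be illustrated with a toy sketch. This is not the Nix-TTS implementation: the "teacher" below is a hypothetical pair of hand-written linear functions, the "text" is a single scalar feature, and the student is a linear encoder fitted by plain gradient descent on the mean squared error against the teacher's latents. All names (`teacher_encoder`, `teacher_decoder`, `distill_encoder`, `student_encoder`) are assumptions made for illustration only.

```python
def teacher_encoder(x):
    # Hypothetical frozen teacher encoder: scalar "text feature" -> 2-D latent.
    return [2.0 * x + 1.0, -0.5 * x]


def teacher_decoder(z):
    # Hypothetical frozen teacher decoder: 2-D latent -> scalar "speech" value.
    return 3.0 * z[0] + z[1]


def distill_encoder(inputs, steps=2000, lr=0.05):
    """Fit a linear student encoder z_i = w[i] * x + b[i] to the teacher's latents."""
    w = [0.0, 0.0]
    b = [0.0, 0.0]
    n = len(inputs)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_b = [0.0, 0.0]
        for x in inputs:
            target = teacher_encoder(x)
            for i in range(2):
                # Gradient of the mean squared error between student and teacher latents.
                err = (w[i] * x + b[i]) - target[i]
                grad_w[i] += 2.0 * err * x / n
                grad_b[i] += 2.0 * err / n
        for i in range(2):
            w[i] -= lr * grad_w[i]
            b[i] -= lr * grad_b[i]
    return w, b


def student_encoder(x, w, b):
    # Apply the distilled student encoder.
    return [w[i] * x + b[i] for i in range(2)]


# Distill on evenly spaced toy inputs in [-1, 1].
inputs = [i / 10 for i in range(-10, 11)]
w, b = distill_encoder(inputs)

# The student's latents can be fed to the frozen teacher decoder to check
# how closely the distilled pipeline tracks the original one.
max_gap = max(
    abs(teacher_decoder(teacher_encoder(x)) - teacher_decoder(student_encoder(x, w, b)))
    for x in inputs
)
```

In this toy setting the student has the same capacity as the teacher, so it can match the latents almost exactly. A real distillation setup would use a much smaller student network than the teacher and would typically distill the decoder as well.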