April 4, 2022 Text-to-Speech

HOVI: Creates Text-to-Speech messages from people’s facial movements to help people who have difficulty communicating by voice

Team members: Kang Inyeong, Kim Yeonghyeon, Lee Seulbi, and Park Jisoo from GDSC SeoulTech (2021.12.21 to present). 🌱 Index: What is HOVI? What are HOVI’s SDGs? Who can be a HOVI user? Used Technology/Diagram. How to use? Demo Video. What is HOVI’s Vision? Who develops HOVI? 🌱 What is HOVI? HOVI means ‘Have Own Voice Intermidiator’. The application creates Text-to-Speech messages from a person’s facial movements to help people who have difficulty communicating by voice. HOVI is used as follows: 1️⃣ Use […]

September 9, 2021 Text-to-Speech

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

EfficientTTS: Unofficial PyTorch implementation of “EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture” (arXiv). Disclaimer: Some people mistakenly think I’m one of the authors. In fact, I am not even in the author list of this paper. I am just a TTS enthusiast. Some important implementation details are not presented in the paper. Some model parameters in the current version are based on my understanding and experiments, and may not be consistent with those used by the authors. Updates 2020/12/23: Mandarin Chinese […]

July 6, 2021 Text-to-Speech

Towards Fast, Controllable and Lightweight Text-to-Speech synthesis

FCL-Taco2: Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in AR mode within each phoneme and is shared for all phonemes. Training and inference scripts for FCL-taco2. Environment: python 3.6.10, torch 1.3.1, chainer 6.0.0, espnet 8.0.0, apex 0.1, numpy 1.19.1, kaldiio 2.15.1, librosa 0.8.0. Training and inference: Step 1. Data preparation & preprocessing: Download LJSpeech. Unpack the downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1. Obtain the forced alignment information using the Montreal Forced Aligner tool. Alternatively, you can download our alignment results, then unpack […]
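
The unpacking step mentioned in this excerpt can be sketched in Python with the standard library. This is only an illustration, not part of the FCL-taco2 scripts: the helper name `unpack_archive` is hypothetical, and the destination is left as a parameter because the excerpt gives it only as the placeholder `/xx/LJSpeech-1.1`.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: Path, dest: Path) -> Path:
    """Extract a bzip2-compressed tarball into dest and return dest.

    Hypothetical helper for illustration; the original scripts may
    simply rely on the `tar` command-line tool instead.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens the archive for reading with bzip2 decompression.
    with tarfile.open(archive, "r:bz2") as tar:
        # Only extract archives from a source you trust.
        tar.extractall(dest)
    return dest
```

A call such as `unpack_archive(Path("LJSpeech-1.1.tar.bz2"), Path("data"))` would place the extracted files under `data/`; both paths here are placeholders.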

June 26, 2021 Text-to-Speech

Attention-Based Grapheme-to-Phoneme with Python

G2P: The G2P algorithm is used to generate the most probable pronunciation for a word not contained in the lexicon dictionary. It can be used as a preprocessing step in a text-to-speech system to generate pronunciations for OOV (out-of-vocabulary) words. Dependencies: The following libraries are used: pytorch, tqdm, matplotlib. Install dependencies using pip: pip3 install -r requirements.txt Dataset: Currently the following languages are supported: EN: English, FA: Farsi, RU: Russian. You can easily provide and use your own language-specific pronunciation dictionary for training G2P. More […]
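
The role described in this excerpt, a lexicon lookup that falls back to G2P for out-of-vocabulary words, can be sketched roughly as follows. Everything here is an assumption made for illustration: the toy `LEXICON`, the `spell_out` stand-in (which merely emits one symbol per letter in place of a trained attention-based model), and the `pronounce` function are hypothetical names, not the repository's API.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon; a real system would load a full
# pronunciation dictionary from disk.
LEXICON: Dict[str, List[str]] = {
    "speech": ["S", "P", "IY", "CH"],
    "text": ["T", "EH", "K", "S", "T"],
}


def spell_out(word: str) -> List[str]:
    """Stand-in for a trained G2P model: one symbol per letter."""
    return [ch.upper() for ch in word if ch.isalpha()]


def pronounce(
    word: str,
    lexicon: Dict[str, List[str]] = LEXICON,
    g2p: Callable[[str], List[str]] = spell_out,
) -> List[str]:
    """Return the lexicon entry if present, else fall back to G2P."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    # Out-of-vocabulary word: ask the G2P model for a pronunciation.
    return g2p(key)
```

In a text-to-speech front end, a trained model would be passed as the `g2p` argument in place of `spell_out`, so in-lexicon words keep their curated pronunciations and only OOV words go through the model.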