HOVI: Creates Text-to-Speech messages from people’s facial movements to help people who have difficulty communicating with their voice

Team members: Kang Inyeong, Kim Yeonghyeon, Lee Seulbi, Park Jisoo from GDSC SeoulTech (since 2021.12.21, ongoing)

🌱 Index
What is HOVI?
What are HOVI’s SDGs?
Who can be a HOVI user?
Used Technology/Diagram
How to use?
Demo Video
What is HOVI’s Vision?
Who developed HOVI?

🌱 What is HOVI?
HOVI stands for ‘Have Own Voice Intermidiator’. The application creates text-to-speech messages from people’s facial movements to help people who have difficulty communicating with their voice. HOVI is used as follows: 1️⃣ Use […]

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

EfficientTTS: unofficial PyTorch implementation of “EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture” (arXiv). Disclaimer: some people mistakenly think I’m one of the authors. In fact, I am not on the author list of this paper; I am just a TTS enthusiast. Some important implementation details are not given in the paper, so some model parameters in the current version are based on my own understanding and experiments and may not match those used by the authors. Updates: 2020/12/23: Mandarin Chinese […]

Towards Fast, Controllable and Lightweight Text-to-Speech synthesis

FCL-Taco2. Figure: block diagram of FCL-taco2, in which the decoder generates mel-spectrograms in autoregressive (AR) mode within each phoneme and is shared across all phonemes. Training and inference scripts for FCL-taco2. Environment: python 3.6.10, torch 1.3.1, chainer 6.0.0, espnet 8.0.0, apex 0.1, numpy 1.19.1, kaldiio 2.15.1, librosa 0.8.0. Training and inference: Step 1. Data preparation and preprocessing: download LJSpeech and unpack the downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1. Obtain the forced-alignment information using the Montreal Forced Aligner tool, or download our alignment results, then unpack […]
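
The FCL-taco2 description implies two steps that a short sketch can make concrete: turning forced-alignment intervals into per-phoneme frame counts, and running one shared decoder autoregressively inside each phoneme. The following is a minimal pure-Python sketch under stated assumptions, not the FCL-taco2 code: the 16 kHz sample rate and 160-sample hop are toy values, restarting the AR context from an all-zero "go" frame at each phoneme is an assumption, and `toy_step` is a hypothetical stand-in for a trained decoder cell.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]


def durations_from_alignment(
    intervals: Sequence[Tuple[float, float]], sample_rate: int, hop_length: int
) -> List[int]:
    """Convert phone intervals (start, end) in seconds into frame counts.

    Each boundary is rounded to the nearest frame index first, so the
    durations always sum to the frame index of the last boundary.
    """
    def to_frame(t: float) -> int:
        return round(t * sample_rate / hop_length)

    return [to_frame(end) - to_frame(start) for start, end in intervals]


def decode_within_phonemes(
    phoneme_encodings: Sequence[Frame],
    durations: Sequence[int],
    step: Callable[[Frame, Frame], Frame],
    n_mels: int,
) -> List[Frame]:
    """Toy phoneme-level autoregressive decoding.

    Inside phoneme i, each frame is produced from the previous frame and the
    phoneme's encoding; the same `step` (the shared decoder) is used for
    every phoneme.
    """
    if len(phoneme_encodings) != len(durations):
        raise ValueError("expected one duration per phoneme")
    mel: List[Frame] = []
    for encoding, duration in zip(phoneme_encodings, durations):
        # Assumption: the AR context restarts from an all-zero "go" frame
        # for each phoneme, so phonemes do not depend on one another.
        prev: Frame = [0.0] * n_mels
        for _ in range(duration):
            prev = step(encoding, prev)
            mel.append(prev)
    return mel


def toy_step(encoding: Frame, prev: Frame) -> Frame:
    """Hypothetical stand-in for a trained decoder cell."""
    return [0.5 * p + e for p, e in zip(prev, encoding)]


# Toy values (assumed, not the repository's configuration):
# 16 kHz audio with a 160-sample hop, i.e. 10 ms frames.
durations = durations_from_alignment(
    [(0.0, 0.05), (0.05, 0.12)], sample_rate=16000, hop_length=160
)
mel = decode_within_phonemes(
    [[1.0, 2.0], [3.0, 4.0]], durations, toy_step, n_mels=2
)
```

Because the context restarts at every phoneme in this sketch, the outer loop carries no cross-phoneme dependency, which is what would allow phonemes to be decoded in parallel; how FCL-taco2 actually initializes each phoneme's context should be checked against the repository.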

Attention-Based Grapheme-to-Phoneme with Python

G2P: the G2P algorithm generates the most probable pronunciation for a word that is not contained in the lexicon (pronunciation dictionary). It can be used as a preprocessing step in a text-to-speech system to generate pronunciations for out-of-vocabulary (OOV) words. Dependencies: the following libraries are used: pytorch, tqdm, matplotlib. Install the dependencies using pip (pip3 install -r requirements.txt). Dataset: currently the following languages are supported: EN (English), FA (Farsi), RU (Russian). You can easily provide and use your own language-specific pronunciation dictionary for training G2P. More […]
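
The lexicon-first, G2P-fallback use described for this project can be sketched as follows. This is a hypothetical illustration, not the repository's API: `load_lexicon` assumes a CMUdict-style "WORD PH1 PH2 ..." text format, and the `g2p` argument stands in for a trained attention-based model (here a trivial placeholder that spells out letters).

```python
from typing import Callable, Dict, Iterable, List

Lexicon = Dict[str, List[str]]


def load_lexicon(lines: Iterable[str]) -> Lexicon:
    """Parse 'WORD PH1 PH2 ...' lines (assumed CMUdict-like format)."""
    lexicon: Lexicon = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):  # skip blank and comment lines
            continue
        word, *phonemes = line.split()
        # Keep the first pronunciation if a word appears more than once.
        lexicon.setdefault(word.lower(), phonemes)
    return lexicon


def pronounce(
    text: str, lexicon: Lexicon, g2p: Callable[[str], List[str]]
) -> List[List[str]]:
    """Use the lexicon for known words and the G2P model only for OOV words."""
    return [
        lexicon[word] if word in lexicon else g2p(word)
        for word in text.lower().split()
    ]


oov_calls: List[str] = []  # records which words fell through to G2P


def placeholder_g2p(word: str) -> List[str]:
    """Hypothetical stand-in for a trained G2P model: one symbol per letter."""
    oov_calls.append(word)
    return list(word.upper())


lexicon = load_lexicon([";;; toy lexicon", "HELLO HH AH0 L OW1", "WORLD W ER1 L D"])
phones = pronounce("Hello HOVI", lexicon, placeholder_g2p)
```

Keeping the model behind a plain callable means the dictionary lookup stays cheap and exact for known words, and the (slower, probabilistic) G2P model is only invoked for OOV words such as "hovi" here.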
