ESPnet: end-to-end speech processing toolkit
![](https://www.deeplearningdaily.com/wp-content/uploads/2021/08/espnet-end-to-end-speech-processing-toolkit_610c6166dec42-375x210.jpeg)
ESPnet
ESPnet is an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses Chainer and PyTorch as its main deep learning engines, and follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.
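To make "Kaldi-style data processing" more concrete, here is a minimal sketch of reading a Kaldi-style data directory. The file names `wav.scp`, `text`, and `utt2spk` follow the usual Kaldi convention of one `<utterance-id> <value>` entry per line. The helper names `read_kaldi_table` and `load_data_dir`, along with the utterance IDs and paths in the demo, are hypothetical and chosen only for illustration. This is not ESPnet's own loader.

```python
from pathlib import Path
import tempfile


def read_kaldi_table(path):
    """Parse a Kaldi-style text table: one '<key> <value...>' entry per line."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def load_data_dir(data_dir):
    """Join wav.scp, text, and utt2spk on the utterance ID (hypothetical helper)."""
    data_dir = Path(data_dir)
    wav = read_kaldi_table(data_dir / "wav.scp")
    text = read_kaldi_table(data_dir / "text")
    utt2spk = read_kaldi_table(data_dir / "utt2spk")
    return {
        utt: {
            "wav": wav[utt],
            "text": text.get(utt, ""),
            "speaker": utt2spk.get(utt, ""),
        }
        for utt in wav
    }


if __name__ == "__main__":
    # Build a tiny data directory with made-up utterances and load it back.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "wav.scp").write_text("spk1-utt1 /audio/utt1.wav\n", encoding="utf-8")
        (tmp / "text").write_text("spk1-utt1 HELLO WORLD\n", encoding="utf-8")
        (tmp / "utt2spk").write_text("spk1-utt1 spk1\n", encoding="utf-8")
        print(load_data_dir(tmp))
```

Keeping each table as a plain text file keyed by utterance ID is what lets a recipe's stages (data preparation, feature extraction, training) exchange data without a custom format.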
Key Features
Kaldi-style complete recipes
- Support a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc.)
- Support a number of TTS recipes in a similar manner to the ASR recipes (LJSpeech, LibriTTS, M-AILABS, etc.)
- Support a number of ST recipes (Fisher-CallHome Spanish, Libri-trans, IWSLT’18, How2, Must-C, Mboshi-French, etc.)
- Support a number of MT recipes (IWSLT’16, the above ST recipes, etc.)
- Support a speech separation and recognition recipe (WSJ-2mix)
- Support voice