Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
TL;DR: This post explains how to exploit specific properties of the Connectionist Temporal Classification (CTC) architecture to achieve high-quality automatic speech recognition (ASR), even on arbitrarily long files or during live inference. Wav2Vec2 is a popular pre-trained model for speech recognition. Released in September 2020 by Meta AI Research, the novel architecture catalyzed progress in
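The CTC property that makes long files tractable is that the model predicts one label per output frame. You can therefore split a long recording into overlapping chunks, run the model on each chunk, discard the frames that fall in the overlapping "stride" regions, and concatenate what remains. Below is a toy, pure-Python illustration of that idea under simplifying assumptions. The "model" is a hypothetical stand-in (`fake_model`) that returns a precomputed label per frame. It assumes one frame per input step and uses blank id 0. In a real model, the strides supply acoustic context so that the kept frames are predicted accurately. This toy model ignores context, so stitching reproduces the full-file result exactly.

```python
from typing import Callable, List, Tuple

BLANK = 0  # assumption: CTC blank token id is 0 in this toy vocabulary


def chunk_with_stride(
    n_frames: int, chunk_len: int, stride_left: int, stride_right: int
) -> List[Tuple[int, int, int, int]]:
    """Split [0, n_frames) into overlapping windows.

    Returns (start, end, keep_start, keep_end) tuples: the model sees
    [start, end), but only frames in [keep_start, keep_end) are kept.
    The kept ranges tile [0, n_frames) without gaps or overlaps.
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed stride_left + stride_right")
    windows = []
    keep_start = 0
    while keep_start < n_frames:
        keep_end = min(keep_start + step, n_frames)
        start = max(0, keep_start - stride_left)  # left context (dropped later)
        end = min(n_frames, keep_end + stride_right)  # right context (dropped later)
        windows.append((start, end, keep_start, keep_end))
        keep_start = keep_end
    return windows


def ctc_greedy_decode(frame_ids: List[int], blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


def transcribe_chunked(
    model: Callable[[int, int], List[int]],
    n_frames: int,
    chunk_len: int,
    stride_left: int,
    stride_right: int,
) -> List[int]:
    """Run `model` on each window, drop strided frames, decode once at the end."""
    kept: List[int] = []
    for start, end, keep_start, keep_end in chunk_with_stride(
        n_frames, chunk_len, stride_left, stride_right
    ):
        frames = model(start, end)  # one predicted label per frame in [start, end)
        kept.extend(frames[keep_start - start : keep_end - start])
    # Decoding after concatenation lets repeats that straddle a chunk
    # boundary collapse exactly as they would in a single full-file pass.
    return ctc_greedy_decode(kept)


if __name__ == "__main__":
    # Hypothetical per-frame argmax labels for a whole "file" (0 = blank).
    labels = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 0, 1, 1]

    def fake_model(start: int, end: int) -> List[int]:
        # Toy stand-in for Wav2Vec2 + argmax; ignores context entirely.
        return labels[start:end]

    full = ctc_greedy_decode(labels)
    chunked = transcribe_chunked(fake_model, len(labels), 6, 2, 1)
    print(full, chunked)  # prints [1, 2, 3, 1] [1, 2, 3, 1]
```

The design choice that matters is decoding once over the concatenated frames rather than per chunk. In the example, the run of `2`s straddles the kept-range boundary at frame 6 and still collapses to a single token. In 🤗 Transformers, the `automatic-speech-recognition` pipeline exposes `chunk_length_s` and `stride_length_s` arguments that perform this kind of chunked inference on real audio. Its internal bookkeeping differs from this toy version.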