Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Abstract
We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or fine-tuning. To trade off accuracy and latency, DET assigns different encoders to decode different parts of an utterance. We apply and compare the layer dropout and the collaborative learning