Issue #135 – Recovering Low-Frequency Words in Non-Autoregressive NMT
17 Jun 2021
Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic
Introduction
Non-Autoregressive Translation (NAT), in which the target words are generated independently of one another, has attracted considerable interest because of its decoding efficiency. However, the assumption that target words are independent leads to errors which hurt translation quality. In this post we take a look at a paper by Ding et al. (2021), which confirms earlier findings that low-frequency words are the most affected, and which proposes a training method to improve the translation of such words.
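The independence assumption can be made concrete by comparing the two standard factorisations of the translation probability (this is the textbook formulation, not notation taken from the paper itself):

```latex
% Autoregressive translation: each target word is conditioned on
% the source sentence x AND all previously generated target words.
P_{\mathrm{AT}}(y \mid x) = \prod_{t=1}^{T} p\!\left(y_t \mid y_{<t},\, x\right)

% Non-autoregressive translation: each target word is conditioned
% on the source sentence only, so all positions can be decoded in parallel.
P_{\mathrm{NAT}}(y \mid x) = \prod_{t=1}^{T} p\!\left(y_t \mid x\right)
```

Dropping the dependence on the prefix $y_{<t}$ is exactly what enables parallel decoding, and also what prevents the model from capturing dependencies between target words.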
Knowledge Distillation
NAT models generate the target words in parallel instead of one after the other. As a consequence, they cannot capture the dependencies between target words. They have