Issue #135 – Recovering Low-Frequency Words in Non-Autoregressive NMT
17 Jun 2021
Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic
Introduction
Non-Autoregressive Translation (NAT), in which the target words are generated independently of one another, has attracted considerable interest because of its decoding efficiency. However, the assumption that target words are independent leads to errors which hurt translation quality. In this post we take a look at a paper by Ding et al. (2021), which confirms earlier findings that low-frequency words are the most affected, and which proposes a training method to improve the translation of such words.
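The independence assumption can be made concrete by comparing the two standard factorisations of the translation probability (this is the textbook formulation, not notation taken from the paper itself):

```latex
% Autoregressive translation: each target word is conditioned on
% the source sentence x AND all previously generated target words.
P_{\mathrm{AT}}(y \mid x) = \prod_{t=1}^{T} p\!\left(y_t \mid y_{<t},\, x\right)

% Non-autoregressive translation: each target word is conditioned
% on the source sentence only, so all positions can be decoded in parallel.
P_{\mathrm{NAT}}(y \mid x) = \prod_{t=1}^{T} p\!\left(y_t \mid x\right)
```

Dropping the dependence on the prefix $y_{<t}$ is exactly what enables parallel decoding, and also what prevents the model from capturing dependencies between target words.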
Knowledge Distillation
NAT models generate the target words in parallel instead of one after the other. As a consequence, they cannot capture the dependencies between target words. They have