Issue #116 – Fully Non-autoregressive Neural Machine Translation
04 Feb 2021
Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic
Introduction
The standard Transformer model is autoregressive (AT), which means that the prediction of each target word is conditioned on the previously predicted words. The output is generated from left to right, a process which cannot be parallelised because the prediction probability of a token depends on the previous tokens. In the last few years, new approaches have been proposed which predict all output tokens simultaneously in order to speed up decoding. These approaches, called non-autoregressive (NAT), are based on the assumption that the prediction probabilities are independent of the previous tokens. Since this assumption does not hold in general in machine translation, NAT approaches cause a drop in translation quality.
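To make the contrast concrete, the two factorisations can be written as follows (a sketch in standard notation, where x is the source sentence, y = y_1 ... y_T the target sentence, and T the target length; exact notation varies between papers):

AT:  p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)

NAT: p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x)

Under the AT factorisation each factor conditions on the previously generated tokens y_{<t}, which forces sequential left-to-right decoding; under the NAT factorisation every factor depends only on the source x, so all T tokens can be predicted in parallel. Since T is then no longer determined token by token, fully NAT models typically predict the target length with a separate model component.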