Issue #43 – Improving Overcorrection Recovery in Neural MT
27 Jun 2019
Author: Raj Patel, Machine Translation Scientist @ Iconic
In Neural MT, at training time the model predicts each word conditioned on the ground-truth previous word as context, while at inference time it must generate the complete sequence conditioned on its own previous predictions. This discrepancy between training and inference often leads to an accumulation of errors during translation, resulting in out-of-context translations. In this post we'll discuss a training method proposed by Zhang et al. (2019) to bridge this gap between training and inference.
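To make the discrepancy concrete, here is a minimal sketch (function and parameter names are illustrative, not from the paper) of a toy decoding loop. The only difference between the two modes is which token is fed back as context at each step: the ground truth during training (teacher forcing), or the model's own prediction at inference.

```python
def decode(step_fn, bos, length, gold=None):
    """Run a toy autoregressive decoder for `length` steps.

    If `gold` is given (training / teacher forcing), each step is
    conditioned on the ground-truth previous token; otherwise
    (inference) it is conditioned on the model's own last prediction.
    """
    prev, out = bos, []
    for t in range(length):
        pred = step_fn(prev, t)  # model predicts the next token from context
        out.append(pred)
        # This single line is where training and inference diverge:
        prev = gold[t] if gold is not None else pred
    return out
```

Because `prev` comes from `gold` at training time, the model never learns to recover from its own mistakes; at inference, an early wrong `pred` is fed back in and errors compound along the sequence.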
Data As Demonstrator (DAD)
The above discrepancy between the training and inference of Neural MT is referred to as exposure bias (Ranzato et al., 2016). As the target sequence grows, the errors accumulate along the sequence, and the model has to predict under conditions it has never met at training time. Intuitively, to address this problem, the model should be trained to predict under the same conditions it will face at inference. Analogous to the Data As Demonstrator (Venkatraman et al., 2015) algorithm, Zhang et al. (2019) proposed a training approach in which the context word is sampled either from the ground-truth sequence or from the model's own predicted words, with the probability of choosing the ground truth decaying as training progresses.
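The sampling idea can be sketched as follows. This is a simplified illustration in the spirit of the approach, not the paper's exact schedule; the function names and the exponential decay are assumptions for the example.

```python
import random

def sample_context(gold_prev, pred_prev, p_gold):
    """With probability p_gold, condition on the ground-truth previous
    token; otherwise condition on the model's own previous prediction."""
    return gold_prev if random.random() < p_gold else pred_prev

def p_gold_schedule(epoch, mu=0.9):
    # Hypothetical exponential decay: rely mostly on the ground truth
    # early in training, and increasingly on the model's own
    # predictions as training progresses.
    return mu ** epoch
```

Early on, `p_gold_schedule` keeps training close to standard teacher forcing; as it decays, the model is increasingly exposed to its own predictions, matching the conditions it will face at inference.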