Issue #67 – Unsupervised Adaptation of Neural MT with Iterative Back-Translation
30 Jan 2020
Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic
The most popular domain adaptation approach, when some in-domain data are available, is to fine-tune the generic model on the in-domain corpus. When no parallel in-domain data are available, the most popular approach is back-translation, which consists of translating monolingual target in-domain data into the source language and using the resulting synthetic pairs as a training corpus. In this post we have a look at a refinement of back-translation, inspired by the advances in unsupervised neural MT, which yields large BLEU score improvements.
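As a concrete illustration of the classic back-translation step described above, here is a minimal Python sketch. The `translate_tgt_to_src` callable is a placeholder for whichever generic target-to-source system is available (an assumption made for illustration, not something prescribed by the post); the synthetic source / authentic target pairs it produces would then be added to the fine-tuning data.

```python
# Minimal sketch of classic back-translation for domain adaptation.
# "translate_tgt_to_src" is a placeholder for any target-to-source MT system
# (an assumption for illustration only).

from typing import Callable, Iterable, List, Tuple


def build_backtranslated_corpus(
    tgt_monolingual: Iterable[str],
    translate_tgt_to_src: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[Tuple[str, str]]:
    """Translate in-domain target sentences into the source language and pair
    each synthetic source sentence with its original target sentence."""
    synthetic_pairs: List[Tuple[str, str]] = []
    batch: List[str] = []
    for sentence in tgt_monolingual:
        batch.append(sentence)
        if len(batch) == batch_size:
            synthetic_pairs.extend(zip(translate_tgt_to_src(batch), batch))
            batch = []
    if batch:
        synthetic_pairs.extend(zip(translate_tgt_to_src(batch), batch))
    return synthetic_pairs


if __name__ == "__main__":
    # Dummy "translator" so the sketch runs end to end.
    dummy_translate = lambda sentences: [f"<source translation of: {s}>" for s in sentences]
    pairs = build_backtranslated_corpus(["in-domain target sentence"], dummy_translate)
    print(pairs)
```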
Adaptation with Iterative Back-Translation
The method is presented in a paper by Jin et al. (2020). It assumes access to an out-of-domain parallel training corpus and in-domain monolingual data (in both the source and the target languages). In this approach the training optimises three objectives:
- Source and target bidirectional language models, in which masked words are predicted given the whole context surrounding them (see the first sketch after this list).
- Source-to-target and target-to-source unsupervised translation models, trained by back-translation: source monolingual sentences are translated by the current source-to-target model, and target monolingual sentences by the current target-to-source model, with the resulting synthetic pairs used to train the model in the opposite direction (see the second sketch after this list).
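To make the first objective more tangible, below is a minimal sketch of how training examples for a masked, bidirectional language model are typically built. The function name, mask symbol and 15% masking rate are illustrative choices, not values taken from the paper.

```python
# Sketch of data preparation for a masked (bidirectional) LM objective:
# some tokens are hidden and the model must recover them from the full context.
import random
from typing import List, Tuple

MASK = "<mask>"


def mask_tokens(tokens: List[str], mask_prob: float = 0.15) -> Tuple[List[str], List[str]]:
    """Randomly replace tokens with a mask symbol; targets keep the original
    token at masked positions and an ignore marker elsewhere."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)          # predict the original token here
        else:
            masked.append(tok)
            targets.append("<ignore>")   # no loss at unmasked positions
    return masked, targets


if __name__ == "__main__":
    random.seed(0)
    print(mask_tokens("the patient was given an intravenous dose".split()))
```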
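The second objective, iterative back-translation, can be sketched as follows. The `Model` interface and its `update_on_batch` method are purely illustrative assumptions (any sequence-to-sequence toolkit could play this role); the point is only to show how each direction is trained on synthetic pairs generated by the current model of the opposite direction.

```python
# Sketch of one iterative back-translation step: each direction is updated on
# synthetic pairs produced by the *current* model of the opposite direction.
# The Model interface and update_on_batch call are assumptions for illustration.

from typing import List, Protocol


class Model(Protocol):
    def translate(self, sentences: List[str]) -> List[str]: ...
    def update_on_batch(self, src: List[str], tgt: List[str]) -> None: ...


def iterative_back_translation_step(
    src2tgt: Model,
    tgt2src: Model,
    src_mono_batch: List[str],
    tgt_mono_batch: List[str],
) -> None:
    # Source monolingual data -> synthetic target; trains the target-to-source model
    # on (synthetic target, authentic source) pairs.
    synthetic_tgt = src2tgt.translate(src_mono_batch)
    tgt2src.update_on_batch(src=synthetic_tgt, tgt=src_mono_batch)

    # Target monolingual data -> synthetic source; trains the source-to-target model
    # on (synthetic source, authentic target) pairs.
    synthetic_src = tgt2src.translate(tgt_mono_batch)
    src2tgt.update_on_batch(src=synthetic_src, tgt=tgt_mono_batch)
```

Repeating this step means that, as each model improves, the synthetic data it generates for the other direction also improves, which is what distinguishes iterative back-translation from a single back-translation pass.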
To finish reading, please visit source site