Issue #114 – Tagged Back-translation Revisited
21 Jan 2021
Author: Dr. Karin Sim, Machine Translation Scientist @ Iconic
Introduction
In a previous post in our series, we examined tagged back-translation for Neural Machine Translation (NMT). In this approach, the back-translated data used to supplement the genuine parallel data is tagged before training. Tagging produced better output than using the same back-translated data untagged.
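The tagging step described above can be illustrated with a minimal Python sketch. It assumes the corpora are already available as in-memory lists of (source, target) string pairs, and that the back-translated pairs pair a machine-generated source with a genuine target sentence. The tag string `<BT>` and the helper names `tag_back_translations` and `build_training_corpus` are illustrative choices, not taken from any particular toolkit.

```python
from typing import List, Tuple

# Hypothetical reserved tag. In a real system it should be kept as a single
# vocabulary item (e.g. registered as a special symbol in the subword model).
BT_TAG = "<BT>"

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_back_translations(bt_pairs: List[Pair], tag: str = BT_TAG) -> List[Pair]:
    """Prepend the tag to the synthetic (back-translated) source side only.

    The target side is genuine monolingual text, so it is left unchanged.
    """
    return [(f"{tag} {src}", tgt) for src, tgt in bt_pairs]


def build_training_corpus(parallel: List[Pair], bt_pairs: List[Pair]) -> List[Pair]:
    """Concatenate untagged genuine parallel data with tagged back-translated data."""
    return parallel + tag_back_translations(bt_pairs)


if __name__ == "__main__":
    parallel = [("Das Haus ist klein.", "The house is small.")]
    # Source side produced by a reverse (target-to-source) model.
    back_translated = [("Der Hund schläft.", "The dog is sleeping.")]
    for src, tgt in build_training_corpus(parallel, back_translated):
        print(f"{src}\t{tgt}")
```

The only difference between the two kinds of training example is the leading token. This gives the model an explicit signal for telling synthetic source text apart from original source text. In a full pipeline, the combined corpus would typically be shuffled before training.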
Today’s blog post follows up on the work of Caswell et al. (2019) by taking a closer look at why and how adding this unique token to the beginning of back-translated segments helps the system distinguish them from the original source segments in the parallel data. In particular, in the research we are highlighting today, Marie et al. (2020) consider