Issue #97 – Target Conditioned Sampling: Optimising Data Selection for Multilingual Neural Machine Translation

03 Sep 2020

Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic

Introduction

It is well known that neural machine translation (NMT) is particularly challenging for low-resource languages. It is therefore not surprising that researchers are actively investigating how to improve the performance of NMT systems for low-resource languages, and many approaches are currently being explored. In issue #88 of our blog we reviewed a method that uses pre-trained models, i.e. auto-encoders trained to reconstruct texts from deliberately corrupted texts, for multilingual NMT. We also reviewed an unsupervised parallel sentence extraction method for NMT in issue #94. Another approach that has shown good results in the past is to take advantage of closely related languages, so that a low-resource language pair can benefit from high-resource language pairs (Neubig and Hu, 2018). In this post, we review a method proposed by Wang and Neubig (2019) that further improves performance by selecting suitable data “from other auxiliary languages” for NMT training.

Target Conditioned Sampling

Wang and Neubig (2019) expand the work of Neubig and Hu (2018), in which a low-resource language pair is trained together with data from related high-resource languages. Rather than simply concatenating all available auxiliary data, they formalise the choice of training data as a sampling problem: their Target Conditioned Sampling (TCS) algorithm first samples a target sentence, and then conditionally samples a source sentence for it from the low-resource and auxiliary languages, so that the resulting training distribution is optimised for the low-resource pair of interest.
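
This two-step sampling idea can be sketched in a few lines of Python. The snippet below is only an illustrative sketch, not the authors' implementation: the names tcs_sample, sources_by_target and similarity are our own, and we assume a precomputed similarity score (for instance, vocabulary overlap between the candidate's language and the low-resource source language) is available to weight the candidate source sentences.

```python
import random

def tcs_sample(target_sentences, sources_by_target, similarity, n_samples=1000):
    """Toy target-conditioned sampler (illustrative sketch, not the paper's code).

    target_sentences  -- target-side sentences of the low-resource pair
    sources_by_target -- maps each target sentence to candidate source
                         sentences from the low-resource language and from
                         related auxiliary languages
    similarity        -- scores how suitable a candidate source sentence is,
                         e.g. vocabulary overlap with the low-resource language
    """
    pairs = []
    for _ in range(n_samples):
        # Step 1: sample a target sentence from the target-side data.
        tgt = random.choice(target_sentences)
        # Step 2: sample a source sentence conditioned on that target,
        # weighting the candidates by their similarity score.
        candidates = sources_by_target[tgt]
        weights = [similarity(src) for src in candidates]
        src = random.choices(candidates, weights=weights, k=1)[0]
        pairs.append((src, tgt))
    return pairs
```

For intuition, with Azerbaijani–English as the low-resource pair, the candidates for a given English target sentence could include the original Azerbaijani source plus Turkish sentences aligned to the same target, with the similarity score favouring Turkish candidates because of Turkish's vocabulary overlap with Azerbaijani.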
