Issue #5 – Creating training data for Neural MT
15 Aug 2018
Author: Prof. Andy Way, Deputy Director, ADAPT Research Centre
This week, we have a guest post from Prof. Andy Way of the ADAPT Research Centre in Dublin. Andy leads a world-class team of researchers at ADAPT who are working at the very forefront of Neural MT. The post expands on the topic of training data – originally presented as one of the “6 Challenges in NMT” from Issue #4 – and considers the possibility of creating synthetic training data using backtranslation. Enjoy!
Do you understand why you’re using backtranslated data?
Even if you ignore the hype surrounding recent claims by Google, Microsoft and SDL that their neural machine translation (NMT) engines are “bridging the gap between human and machine translation”, or have “achieved human parity” or “cracked Russian-to-English translation”, respectively, there is little doubt that NMT has rapidly overtaken statistical MT (SMT) as the new state-of-the-art in the field of machine translation (cf. Bentivogli et al., 2016).
However, as covered in Issue #4 of this series, it is widely acknowledged that NMT typically requires considerably more training data than SMT to build a system of comparable quality.
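One popular way to bridge that data gap is backtranslation: take monolingual text in the *target* language, machine-translate it back into the source language with a reverse (target-to-source) model, and treat the resulting pairs as synthetic parallel data. The sketch below illustrates the idea; `reverse_translate` is a hypothetical stand-in for a real target-to-source NMT system, not an actual API.

```python
# Minimal sketch of backtranslation for creating synthetic parallel data.
# Assumption: `reverse_translate` is a placeholder for a trained
# target-to-source MT model; a real pipeline would run NMT inference here.

def reverse_translate(target_sentence: str) -> str:
    # Placeholder for target->source MT inference (hypothetical).
    return f"<synthetic source for: {target_sentence}>"

def backtranslate(monolingual_target: list) -> list:
    """Turn monolingual target-language sentences into (source, target) pairs.

    The synthetic source side is machine-translated and therefore noisy,
    but the target side is genuine human-written text, which is why
    backtranslated data can help train the source->target NMT model.
    """
    return [(reverse_translate(t), t) for t in monolingual_target]

# Usage: a handful of monolingual German sentences become synthetic
# English-German training pairs for an EN->DE system.
mono = ["Das ist ein Test.", "Guten Morgen."]
pairs = backtranslate(mono)
for src, tgt in pairs:
    print(src, "|||", tgt)
```

The key design point is that the human-authored text always ends up on the target side of the synthetic pair, so the decoder of the final system is trained on fluent output even though the source side is imperfect.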