Issue #88 – Multilingual Denoising Pre-training for Neural Machine Translation
02 Jul 2020
Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic
Introduction
Pre-training has been used in many natural language processing (NLP) tasks with significant improvements in performance. In neural machine translation (NMT), pre-training is mostly applied to individual building blocks of the system, e.g. the encoder or the decoder, rather than to the whole model. In a previous post (#70), we compared several approaches that use pre-training with masked language models. In this post, we take a closer look at the method proposed by Liu et al. (2020), which pre-trains a sequence-to-sequence denoising auto-encoder, referred to as mBART, on monolingual corpora across many languages.
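As a practical aside (not part of the original post), pre-trained and fine-tuned mBART checkpoints are distributed through the Hugging Face Transformers library. The short sketch below is illustrative only: it assumes the facebook/mbart-large-en-ro checkpoint, an mBART model fine-tuned for English-to-Romanian translation, and a recent version of transformers, and simply loads the model and translates one sentence.

```python
# Minimal sketch: translate with a fine-tuned mBART checkpoint via
# Hugging Face Transformers (checkpoint name and library usage are
# assumptions, not taken from the original post).
from transformers import MBartForConditionalGeneration, MBartTokenizer

model_name = "facebook/mbart-large-en-ro"  # mBART fine-tuned for English->Romanian
tokenizer = MBartTokenizer.from_pretrained(model_name, src_lang="en_XX", tgt_lang="ro_RO")
model = MBartForConditionalGeneration.from_pretrained(model_name)

text = "Pre-training has been used in many NLP tasks."
batch = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the target-language token (ro_RO).
generated = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

Because the same pre-trained model covers many languages, the same recipe applies to other language pairs by swapping the language codes and the fine-tuned checkpoint.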
Multilingual Denoising Pre-training
BART, proposed by Lewis et al. (2020), is a denoising sequence-to-sequence pre-training method for NLP tasks. The idea is simple: we corrupt the texts with a "noising function" and then train a (denoising) auto-encoder to reconstruct the original texts. mBART applies BART to train the auto-encoder on "large-scale mono-lingual corpora across many languages." In the experiments, the noising function corrupts the texts by masking phrases and permuting sentences (Liu et al., 2020). Compared to Masked Sequence to Sequence pre-training (MASS, Song et al., 2019), which trains the decoder to predict only the masked fragment, BART trains the model to reconstruct the complete original text.
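To make the corruption step concrete, here is a toy sketch of such a noising function (an illustration, not the authors' implementation). It follows the two operations described above, permuting the order of sentences and masking spans of tokens whose lengths are drawn from a Poisson distribution; the 35% masking ratio and Poisson λ = 3.5 follow the settings reported in Liu et al. (2020), while the tokenization and sampling details are simplified.

```python
# Toy noising function in the spirit of mBART's corruption scheme
# (illustrative only; mask ratio and span lengths follow the paper,
# everything else is simplified).
import numpy as np

MASK = "<mask>"

def add_noise(document, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    """`document` is a list of sentences, each a list of tokens.
    Returns a corrupted token sequence: sentences are permuted and
    roughly mask_ratio of the tokens are masked in spans whose lengths
    are drawn from a Poisson distribution, each span collapsing to <mask>."""
    rng = np.random.default_rng(seed)

    # (1) permute the order of the sentences
    sentences = [document[i] for i in rng.permutation(len(document))]

    # (2) mask spans over the flattened token sequence
    tokens = [tok for sent in sentences for tok in sent]
    budget = int(round(mask_ratio * len(tokens)))  # how many tokens to mask

    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(tokens) - i)
            noised.append(MASK)   # the whole span is replaced by one <mask>
            budget -= span
            i += span
        else:
            noised.append(tokens[i])
            i += 1
    return noised

# The denoising auto-encoder is then trained to map add_noise(doc) back to doc.
doc = [["The", "cat", "sat", "."], ["It", "was", "tired", "."], ["Then", "it", "slept", "."]]
print(add_noise(doc))
```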