Issue #24 – Exploring language models for Neural MT
07 Feb 2019
Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic
Monolingual language models were a critical part of Phrase-based Statistical Machine Translation systems. They are also used in unsupervised Neural MT systems (unsupervised meaning that no parallel data is available to supervise training; only monolingual data is used). However, they are not used in standard supervised Neural MT engines, and the training of language models has disappeared from common NMT practice. Two recent papers suggest that language models may soon be back in supervised MT. Devlin et al. (2018) improve performance on several English natural language processing tasks by using pre-trained bidirectional language models. Conneau and Lample (2019) extend this approach to multilingual language models and improve both unsupervised and supervised MT. In this post, we will take a look at the approach of Devlin et al.
BERT (Bidirectional Encoder Representations from Transformers)
Standard language models read the input text sequentially (left-to-right, right-to-left, or both combined) and predict a word given the previous words in the sequence. For example, in a left-to-right model, the probability of the word following “I drive my” depends on the preceding words “I drive my”.
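To make the left-to-right idea concrete, here is a minimal sketch in Python. It is not the approach of Devlin et al.: it uses a toy bigram model (the next word depends only on the single preceding word), whereas neural language models condition on the full left context, and the tiny corpus is an invented illustration.

```python
from collections import defaultdict, Counter

# Toy corpus; a real language model is trained on millions of sentences.
corpus = [
    "i drive my car to work",
    "i drive my truck to town",
    "i drive my car every day",
]

# Count bigram continuations: P(next | previous) estimated by relative frequency.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_probs(prev):
    """Distribution over the word following `prev`, reading left to right."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

# The probability of the word following "my" depends only on the left context.
print(next_word_probs("my"))  # {'car': 0.666..., 'truck': 0.333...}
```

A bidirectional model such as BERT instead predicts a masked word using both its left and right context, which is what the rest of the article describes.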