Issue #11 – Unsupervised Neural MT
27 Sep 2018
Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic
In this week’s article, we explore unsupervised machine translation: in other words, training a machine translation engine without using any parallel data! As you might imagine, the potential implications of not needing parallel data to train a Neural MT engine could be huge.
In general, most approaches in this direction still rely on some bilingual signal, for example parallel data in related languages, pivoting through a third language, a small parallel corpus, or a bilingual dictionary. When no parallel data is available at all, results are typically much worse than those of supervised methods. Here, however, we take a look at the technique proposed by Lample et al. (2018), which recently won the best paper award at the prestigious EMNLP 2018 conference. This approach uses only monolingual data in the two languages and still produces a decent MT system, with performance better than that of a supervised neural MT system trained on 100,000 parallel sentences.
Cross-lingual word embeddings
When there is no parallel data available, the first step is to obtain cross-lingual word embeddings, i.e. embeddings of words from both languages in a single shared space. Such embeddings are usually obtained by training monolingual word embeddings for each language separately and then learning a mapping that aligns the two spaces, with little or no bilingual supervision, for example via adversarial training followed by iterative refinement (Conneau et al., 2018).
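To make the alignment step concrete, below is a minimal sketch of the orthogonal Procrustes solution commonly used to map one embedding space onto another. It assumes we already have a (possibly noisy) seed dictionary of word pairs, e.g. produced by an initial adversarial step; the function name, toy data, and dimensions here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimising ||X @ W - Y||_F.

    X: (n, d) source-language embeddings of the seed-dictionary words
    Y: (n, d) target-language embeddings of their translations
    """
    # Closed-form orthogonal Procrustes solution:
    # with X^T Y = U S V^T (SVD), the optimum is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy usage: 1,000 seed pairs of 300-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))
Y = rng.standard_normal((1000, 300))
W = procrustes_align(X, Y)
mapped = X @ W  # source embeddings mapped into the target space
```

In practice this step is iterated: the mapped embeddings induce a better dictionary via nearest-neighbour search, which in turn yields a better mapping W.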