Issue #61 – Context-Aware Monolingual Repair for Neural Machine Translation

21 Nov 2019

Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic

In issue #15 and issue #39 we looked at various approaches for document-level translation. In this post, we look at another approach, proposed by Voita et al. (2019a), to capture context information. The approach is unique in that it uses only target-side monolingual data to improve the handling of discourse phenomena (deixis, ellipsis, lexical cohesion, ambiguity, anaphora, etc., which often require context to resolve) in machine translation.

Document-level Repair

They propose a two-pass machine translation approach. In the first pass, they obtain a context-agnostic translation; in the second pass, they feed the first-pass translations through a document-level repair (DocRepair) model, which corrects the contextual errors.
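The two passes can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' code: `translate_sentence` and `repair_group` are hypothetical callables standing in for the sentence-level MT system and the DocRepair model, and the group size of 4 and the non-overlapping grouping are assumptions made for the example.

```python
from typing import Callable, List


def two_pass_translate(
    source_sentences: List[str],
    translate_sentence: Callable[[str], str],        # hypothetical sentence-level MT system
    repair_group: Callable[[List[str]], List[str]],  # hypothetical DocRepair wrapper
    group_size: int = 4,                             # assumed group size
) -> List[str]:
    """Translate a document in two passes: draft each sentence, then repair in groups."""
    # First pass: every sentence is translated on its own, with no context.
    draft = [translate_sentence(sentence) for sentence in source_sentences]

    # Second pass: consecutive draft translations are repaired together,
    # so the repair model can see (and fix) cross-sentence inconsistencies.
    repaired: List[str] = []
    for start in range(0, len(draft), group_size):
        group = draft[start:start + group_size]
        repaired.extend(repair_group(group))
    return repaired
```

Note that the second pass never looks at the source sentences, which is what allows the repair model to be trained on target-language data alone.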

The DocRepair model is a standard sequence-to-sequence Transformer. The sentences in a group are concatenated to form long, inconsistent pseudo-sentences, and the Transformer is trained to correct these into consistent ones.
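To make the concatenation step concrete, here is a minimal sketch of how a group of sentences might be joined into one pseudo-sentence and split back after repair. The separator token `<sep>` and both function names are hypothetical choices for illustration; the post does not say which separator the authors use.

```python
from typing import List

SEP = "<sep>"  # hypothetical reserved separator token, assumed not to occur in the text


def to_pseudo_sentence(group: List[str], sep: str = SEP) -> str:
    """Join a group of sentences into one long pseudo-sentence."""
    return f" {sep} ".join(sentence.strip() for sentence in group)


def from_pseudo_sentence(pseudo: str, sep: str = SEP) -> List[str]:
    """Split a (repaired) pseudo-sentence back into individual sentences."""
    return [part.strip() for part in pseudo.split(sep)]
```

Keeping a reserved separator between sentences lets the repaired output be mapped back to the original sentence boundaries, provided the model reproduces the same number of separators.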

Training the DocRepair model

The DocRepair model is a monolingual sequence-to-sequence model. It maps inconsistent groups of sentences into consistent ones as mentioned