Issue #61 – Context-Aware Monolingual Repair for Neural Machine Translation
21 Nov 2019
Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic
In issue #15 and issue #39 we looked at various approaches for document-level translation. In this blog post, we look at another approach, proposed by Voita et al. (2019a), to capture context information. This approach is unique in that it uses only target-side monolingual data to improve the handling of discourse phenomena (deixis, ellipsis, lexical cohesion, anaphora, and other ambiguities that often require context to resolve) in machine translation.
Document-level Repair
They propose a two-pass machine translation approach. In the first pass, a context-agnostic model translates each sentence independently; in the second pass, the first-pass translations are fed through a document-level repair (DocRepair) model that corrects contextual errors.
The DocRepair model is a standard sequence-to-sequence Transformer. The sentences in a group are concatenated into one long, potentially inconsistent pseudo-sentence, and the Transformer is trained to map these inconsistent pseudo-sentences to consistent ones.
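The two-pass pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: `first_pass_translate`, `doc_repair`, and the `_eos` separator are hypothetical placeholders standing in for trained Transformer models and the real tokenization.

```python
# Sketch of the two-pass DocRepair inference pipeline.
# first_pass_translate and doc_repair are hypothetical placeholders;
# in practice each would be a trained Transformer model.

SEP = " _eos "  # hypothetical sentence-boundary token


def first_pass_translate(sentence):
    # Placeholder for a context-agnostic, sentence-level NMT model.
    lexicon = {"Bonjour.": "Hello.", "Comment vas-tu ?": "How are you?"}
    return lexicon.get(sentence, sentence)


def doc_repair(pseudo_sentence):
    # Placeholder for the monolingual DocRepair Transformer: it rewrites
    # an inconsistent group of sentences into a consistent one.
    return pseudo_sentence.replace("How are you?", "How are you doing?")


def translate_document(sentences, group_size=4):
    repaired = []
    for i in range(0, len(sentences), group_size):
        group = sentences[i:i + group_size]
        # Pass 1: translate each sentence independently (context-agnostic).
        drafts = [first_pass_translate(s) for s in group]
        # Concatenate the drafts into one long pseudo-sentence.
        pseudo = SEP.join(drafts)
        # Pass 2: repair the whole group at once on the target side.
        fixed = doc_repair(pseudo)
        repaired.extend(fixed.split(SEP))
    return repaired
```

The key design point is that the second pass sees the whole group at once, so it can restore cross-sentence consistency that the first pass, working one sentence at a time, cannot.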
Training the DocRepair model
The DocRepair model is a monolingual sequence-to-sequence model: it maps inconsistent groups of sentences into consistent ones, as described above.
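In Voita et al. (2019a), the inconsistent training inputs are produced from monolingual target-side data by round-trip translation: each target sentence is translated into the source language and back, sentence by sentence, which discards context and introduces inconsistencies. A minimal sketch of building such training pairs, where `round_trip` is a hypothetical stand-in for the two sentence-level translation models:

```python
# Sketch of building DocRepair training pairs from monolingual target data.
# round_trip is a hypothetical placeholder for translating a target sentence
# into the source language and back with sentence-level NMT models.

SEP = " _eos "  # hypothetical sentence-boundary token


def round_trip(sentence):
    # Placeholder: target -> source -> target. Because each sentence is
    # translated in isolation, context-dependent choices get corrupted.
    noisy = {"She left.": "It left."}
    return noisy.get(sentence, sentence)


def make_training_pairs(documents, group_size=4):
    pairs = []
    for doc in documents:
        for i in range(0, len(doc) - group_size + 1, group_size):
            group = doc[i:i + group_size]
            # Input: inconsistent round-trip translations of the group.
            inconsistent = SEP.join(round_trip(s) for s in group)
            # Target: the original, naturally consistent sentences.
            consistent = SEP.join(group)
            pairs.append((inconsistent, consistent))
    return pairs
```

Because both sides of each pair are in the target language, no parallel document-level corpus is needed; only monolingual documents are required.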