Machine Translation Weekly 57: Document-level MT with Context Masking

This week, I am going to discuss the paper “Long-Short Term Masking
Transformer: A Simple but Effective Baseline for Document-level Neural Machine
Translation” by authors from Alibaba Group. The preprint appeared on arXiv a
month ago, and the paper will be presented at this year’s EMNLP.

Incorporating document-level context is one of the biggest challenges of
current machine translation, and there are several reasons for that. One is
the lack of document-level training data, which is partly a consequence of
copyright law. For results to be replicable, the training data must be public,
and most texts are under copyright. However, you cannot be blamed for sharing
copyrighted content if it is split into sentences and thoroughly shuffled, so
that the original text cannot be reconstructed.
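
To make this concrete, here is a minimal Python sketch of that data-release practice; the documents and the naive period-based splitter are purely illustrative (a real pipeline would use a proper sentence splitter, e.g., the one shipped with Moses):

```python
import random

# Illustrative "documents" standing in for copyrighted texts.
documents = [
    "First sentence of document one. Second sentence of document one.",
    "First sentence of document two. Second sentence of document two.",
]

# Naive sentence splitting on ". " -- good enough to show the idea.
sentences = [
    s.strip()
    for doc in documents
    for s in doc.split(". ")
    if s.strip()
]

# The shuffle destroys document boundaries, so the originals cannot be
# reconstructed -- which also destroys exactly the cross-sentence context
# that document-level MT would need.
random.shuffle(sentences)
for s in sentences:
    print(s)
```

The very step that makes such corpora legally shareable is what makes document-level training data scarce: once the sentences are shuffled, the context is gone for good.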

The other reasons
