Machine Translation Weekly 70: Loss Masking instead of Data Filtering

This week, I will have a closer look at a recent pre-print that introduces an alternative to parallel data filtering for machine translation training. The pre-print is titled Gradient-guided Loss Masking for Neural Machine Translation and comes from CMU and Google.
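To make the contrast concrete, here is a minimal, illustrative sketch of what loss masking looks like in general: instead of removing suspicious sentence pairs from the corpus before training, as data filtering does, every pair stays in the batch and its contribution to the loss is multiplied by a 0/1 mask. The masking criterion used below, a simple per-sentence loss threshold, is only a placeholder assumption of mine and not the gradient-guided criterion proposed in the paper.

```python
import torch
import torch.nn.functional as F

def masked_nmt_loss(logits, targets, pad_id, loss_threshold):
    """Cross-entropy over a batch where suspicious sentence pairs are
    masked out of the loss instead of being removed from the data.

    logits:         (batch, seq_len, vocab) decoder outputs
    targets:        (batch, seq_len) reference token ids
    loss_threshold: per-sentence loss above which a pair is treated as
                    noisy (illustrative criterion, not the paper's)
    """
    # Token-level cross-entropy, ignoring padding positions
    token_loss = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # shape: (batch, seq_len)
    token_mask = (targets != pad_id).float()

    # Average loss per sentence pair
    sent_loss = (token_loss * token_mask).sum(dim=1) / token_mask.sum(dim=1).clamp(min=1)

    # 0/1 mask: keep pairs whose loss looks "clean", zero out the rest
    keep = (sent_loss <= loss_threshold).float()

    # Masked pairs still sit in the batch but contribute nothing to the gradient
    return (sent_loss * keep).sum() / keep.sum().clamp(min=1)
```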

Training data cleanness is a surprisingly important factor for machine translation quality. A large part of the data that we use for training comes from crawling the Internet, so there is no quality guarantee. On the other hand, the tools for crawling parallel data are pretty good, so the sentence pairs that we get are usually at least partial translations of each other. One would expect that the more data, the better, and that a few bad sentence pairs would get lost in the majority of the good parallel data.
