Machine Translation Weekly 48: MARGE
This week, I will comment on a recent pre-print by Facebook AI titled
Pre-training via Paraphrasing. The
paper introduces a model called MARGE (the name presumably signals that it
belongs to the same family as BART, also by Facebook) that uses a clever
denoising objective to train its representations.
Most of the currently used pre-trained models are based on some form of
denoising: we add noise to the input and want the model to remove it in the
output. The implicit assumption behind this setup is that the model needs to
learn something about the language to be able to clean up the noise. In models
like BERT or BART, the noise is rather trivial: we hide some parts of the
input and ask the model to guess what the hidden parts were.
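To make the contrast concrete, here is a minimal sketch of this kind of trivial masking noise (BERT-style). The `mask_tokens` function, the 15% masking rate, and the toy sentence are illustrative choices of mine, not taken from the paper:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """BERT-style noising: hide a random subset of the input tokens.

    The model is trained to predict the original token at every
    masked position; unmasked positions carry no training signal.
    """
    rng = random.Random(seed)
    noisy, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            noisy.append(MASK_TOKEN)  # hide the token in the input...
            targets.append(tok)       # ...and make it the prediction target
        else:
            noisy.append(tok)
            targets.append(None)      # nothing to predict here
    return noisy, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
noisy, targets = mask_tokens(sentence, seed=1)
print(noisy)  # some tokens randomly replaced by [MASK]
```

The model never sees what stood at the masked positions, so recovering them forces it to exploit the surrounding context, which is exactly where the linguistic knowledge is supposed to come from.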
MARGE does much more sophisticated