Machine Translation Weekly 54: Nearest Neighbor MT
This week, I will discuss Nearest Neighbor Machine
Translation, a paper from this year's
ICML that takes advantage of overlooked representation
learning capabilities of machine translation models.
This paper’s idea is pretty simple and is basically the same as in the previous
work on nearest neighbor language
models. The paper implicitly argues (or
at least I think it does) that the final softmax layer of MT models is too
simplistic and thus poses a sort of information bottleneck, even though the
output projection for the softmax makes up a large portion of the model's parameters.
To overcome the bottleneck, the paper adds a nearest neighbor search based on
the decoder hidden states. With one pass over the training data, they store the
decoder states together with the corresponding output tokens—the tokens that
are actually in the training data, regardless of what the softmax predicts.
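For concreteness, a minimal sketch of how such a datastore could be built might look like the following. The `decoder_states` method on the model is a hypothetical interface I assume here for illustration, not the paper's actual code; only the FAISS calls are real API.

```python
import numpy as np
import faiss  # exact nearest neighbor search over the stored decoder states


def build_datastore(model, parallel_data, hidden_dim):
    """One pass over the training data: store (decoder state, reference token) pairs."""
    keys, values = [], []
    for src, tgt in parallel_data:
        # Teacher-forced forward pass; the decoder state at position i becomes
        # the key for the reference token at position i.
        states = model.decoder_states(src, tgt)  # hypothetical: (len(tgt), hidden_dim)
        for state, token in zip(states, tgt):
            keys.append(state)
            values.append(token)  # the token actually in the training data

    index = faiss.IndexFlatL2(hidden_dim)
    index.add(np.asarray(keys, dtype="float32"))  # keys = decoder hidden states
    return index, np.asarray(values)             # values = corresponding tokens
```

Keys are the decoder hidden states and values are the reference tokens, so the index can later be queried with decoder states computed while translating.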
At inference time, they