Machine Translation Weekly 89: BPE and Memorization
Similar to last week, I will discuss a paper about input segmentation. The
paper is not directly about machine translation or multilinguality, but it
brings interesting insights for Transformer models in general. The paper is
titled "How BPE affects memorization in Transformers"; its authors are from
Facebook AI, and the preprint appeared on arXiv on Thursday.
The paper presents a series of experiments with Transformer models for natural
language inference, trained with BPE vocabularies of different sizes, with
which the authors want to measure to what extent the models memorize the
training data (setting generalization aside). They came up with three measures
of memorization:
- Being able to memorize data with random labels;
- Comparing the confidence of the model on training and validation data (a
  minimal sketch of this measure follows below);
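To make the confidence-based measure concrete, here is a minimal PyTorch
sketch of the idea; this is my own illustration, not the paper's code, and the
toy classifier and random tensors stand in for the actual Transformer and NLI
data. The intuition: a model that memorizes its training set will be
noticeably more confident on training inputs than on held-out validation
inputs, so the size of that gap is a memorization signal.

```python
import torch
import torch.nn as nn

def mean_confidence(model: nn.Module, xs: torch.Tensor) -> float:
    """Average probability the model assigns to its own predicted label."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(xs), dim=-1)
    return probs.max(dim=-1).values.mean().item()

# Toy stand-ins for a trained classifier and the train/validation splits.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
train_x, val_x = torch.randn(64, 16), torch.randn(64, 16)

# The memorization signal is the train/validation confidence gap: for an
# untrained model it is near zero, and it grows as training memorizes the
# training examples.
gap = mean_confidence(model, train_x) - mean_confidence(model, val_x)
print(f"confidence gap: {gap:.4f}")
```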