Highlights from Machine Translation and Multilinguality in May 2023
Here are a few papers I found most interesting in the flood of new pre-prints
on arXiv. With ACL's camera-ready deadline and the start of the EMNLP
anonymity period, there were many more papers than usual.
What is the best recipe for character-level encoder-only modeling?
A paper from DeepMind accepted to ACL 2023 systematically (and empirically)
studies how to train a BERT-like model that works directly with character-level
inputs using existing architectural building blocks. Transformers work well
with word-like units, so the main trick in character-level models is to first
downsample the long character sequence into latent word-like units; at the
output, the hidden states are upsampled back to the character level.
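Schematically, the pattern looks something like the sketch below. It is not the
paper's exact architecture: the strided-convolution downsampling, the
transposed-convolution upsampling, the downsampling rate of 4, and the class
name `CharEncoderSketch` are all illustrative choices.

```python
import torch
import torch.nn as nn


class CharEncoderSketch(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, rate=4, n_layers=6):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, d_model)
        # Downsampling: a strided convolution collapses every `rate` characters
        # into one latent "word-like" unit.
        self.down = nn.Conv1d(d_model, d_model, kernel_size=rate, stride=rate)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Upsampling: a transposed convolution stretches the latent units back
        # to the original character length.
        self.up = nn.ConvTranspose1d(d_model, d_model, kernel_size=rate, stride=rate)

    def forward(self, char_ids):                       # (batch, chars)
        x = self.char_emb(char_ids)                    # (batch, chars, d_model)
        latent = self.down(x.transpose(1, 2))          # (batch, d_model, chars // rate)
        latent = self.encoder(latent.transpose(1, 2))  # contextualized word-like units
        up = self.up(latent.transpose(1, 2))           # (batch, d_model, chars)
        return up.transpose(1, 2)                      # per-character hidden states


hidden = CharEncoderSketch()(torch.randint(0, 256, (2, 128)))
print(hidden.shape)  # torch.Size([2, 128, 512])
```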
According to this paper, the best option is combining