Highlights from Machine Translation and Multilinguality in May 2023

Here are a few papers I found most interesting in the flood of new pre-prints
on arXiv. With ACL's camera-ready deadline and the start of the EMNLP
anonymity period, there were many more papers than usual.

What is the best recipe for character-level encoder-only modeling?

A paper from DeepMind accepted to ACL 2023 systematically (and empirically)
studies how to train a BERT-like model that works directly with character-level
inputs using existing architectural building blocks. Transformers work well
with word-like units, so the main trick of character-level models is to first
downsample the long character-level input into latent word-like units. At the
output, the hidden states then need to be upsampled back to the character
level. According to this paper, the best option is combining
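
To make the overall shape of such models concrete (downsample, encode, upsample), here is a minimal PyTorch sketch of a character-level encoder. The specific choices (a strided convolution with rate 4 for downsampling, simple repetition plus concatenation with the character embeddings for upsampling) are only illustrative assumptions, not the recipe the paper finds best.

```python
import torch
import torch.nn as nn


class CharEncoderSketch(nn.Module):
    """Illustrative character-level encoder: downsample -> transformer -> upsample.

    All hyperparameters and the concrete down/upsampling operators are
    assumptions for this sketch, not the configuration studied as best
    in the paper.
    """

    def __init__(self, vocab_size=256, d_model=512, rate=4, n_layers=6, n_heads=8):
        super().__init__()
        self.rate = rate
        self.char_emb = nn.Embedding(vocab_size, d_model)
        # Strided 1D convolution shortens the sequence by `rate`,
        # producing latent word-like units.
        self.down = nn.Conv1d(d_model, d_model, kernel_size=rate, stride=rate)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Final projection back to per-character states after upsampling.
        self.out = nn.Linear(2 * d_model, d_model)

    def forward(self, char_ids):                               # (batch, seq_len)
        x = self.char_emb(char_ids)                            # (batch, seq_len, d_model)
        latent = self.down(x.transpose(1, 2)).transpose(1, 2)  # (batch, seq_len // rate, d_model)
        latent = self.encoder(latent)
        # Upsample by repeating each latent unit `rate` times and
        # concatenating with the original character embeddings.
        up = latent.repeat_interleave(self.rate, dim=1)        # (batch, seq_len, d_model)
        return self.out(torch.cat([x, up], dim=-1))            # per-character hidden states


chars = torch.randint(0, 256, (2, 64))   # seq_len must be divisible by rate here
states = CharEncoderSketch()(chars)
print(states.shape)                      # torch.Size([2, 64, 512])
```

The paper compares several downsampling and upsampling variants; the sketch only shows the overall structure they all share, not the winning combination.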

 

 
