Highlights from Machine Translation and Multilinguality in February 2023
There were plenty of interesting preprints on arXiv in February. Here is a
brief summary of three that I think are cool but could easily get lost among
the hundreds of papers that went public.
The unreasonable effectiveness of few-shot learning for machine translation
Folks from Google experimented with few-shot MT based on language models.
Instead of using one of the cool huge language models we all know, they train
their own smaller ones. They prepare dedicated bi- and trilingual LMs (8B
parameters; for comparison, BERT has 110M, GPT-2 has 1.5B, and GPT-3 has
175B). At inference time, they
retrieve five random examples from the training set and use them as a prompt
to the model, as sketched below.
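To make the inference-time setup concrete, here is a minimal sketch of how such
a five-shot prompt could be assembled. The `build_fewshot_prompt` helper, the
English-to-German direction, and the `English:`/`German:` template are my
assumptions for illustration; the paper's exact prompt format may differ.

```python
import random

def build_fewshot_prompt(pool, source_sentence, k=5, seed=None):
    """Assemble a k-shot translation prompt from random training pairs.

    `pool` is a list of (source, target) sentence pairs. The template below
    is a guess at what such a prompt might look like, not the paper's format.
    """
    rng = random.Random(seed)
    examples = rng.sample(pool, k)  # k random demonstrations from the training set
    lines = []
    for src, tgt in examples:
        lines.append(f"English: {src}")
        lines.append(f"German: {tgt}")
    # The model is expected to continue generating after the final "German:".
    lines.append(f"English: {source_sentence}")
    lines.append("German:")
    return "\n".join(lines)

# Usage: feed the resulting string to the LM and decode the continuation.
# prompt = build_fewshot_prompt(train_pairs, "A cat sat on the mat.")
```

Because the demonstrations are sampled at random rather than retrieved by
similarity, the setup stays simple: no retrieval index is needed at inference
time.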
The approach works better than Google Translate and is comparable to the best
WMT submissions. However, it is hard to