Highlights from Machine Translation and Multilinguality in December 2022 and January 2023

Here is what I found interesting on arXiv in December 2022 and January 2023. At
the beginning of January, there were relatively few new preprints in general,
but now things are gaining momentum again, with more papers appearing every
day.

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

In this paper, folks from the BigScience Workshop explain how to add language
support to the already-trained BLOOM model. They tried two approaches: MAD-X
(clever stuff with adapters, which adds parameters) and IA^3 (some clever
finetuning, which does not add parameters). They did nothing with tokenization
(a slight disappointment for me) and just said that BLOOM uses byte-based BPE,
so there are never out-of-vocabulary tokens. Technically, this is true, but new
alphabets get split down to individual bytes.
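To make the tokenization point concrete, here is a minimal sketch of byte-level fallback. It assumes a GPT-2-style byte-level BPE whose base vocabulary contains all 256 byte values. The merges BLOOM actually learned are not modeled, and the helper names are hypothetical. The sketch shows why no input can be out of vocabulary, and how many base tokens a single character costs before any merges apply.

```python
# Hypothetical helpers illustrating byte-level fallback; no learned BPE merges are applied.

# Assumption: the base vocabulary covers every possible byte value (0-255).
BASE_VOCAB = set(range(256))


def byte_tokens(text: str) -> list[int]:
    """Map a string to its UTF-8 byte values, i.e., the base tokens before merging."""
    return list(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average number of base (byte) tokens per character."""
    return len(byte_tokens(text)) / len(text)


if __name__ == "__main__":
    for sample in ["hello", "Привет", "你好"]:
        ids = byte_tokens(sample)
        # Every byte is in the base vocabulary, so nothing is ever OOV.
        assert set(ids) <= BASE_VOCAB
        print(f"{sample!r}: {len(sample)} chars -> {len(ids)} byte tokens")
```

ASCII costs one base token per character, Cyrillic two, and CJK three. In a trained tokenizer, merges shorten sequences for scripts that were frequent in the training data. For an alphabet the tokenizer never saw, presumably few merges apply, so the token count stays close to the raw byte count.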
