Highlights from Machine Translation and Multilinguality in summer 2023
Here are short summaries of the papers I liked the most during the (academic)
summer. This time, I am also posting on both GitHub Pages and Medium.
A preprint from the University of Würzburg presents a recipe for recycling
existing models to create a multilingual vision-language model. They start with
the English-only vision-language model BLIP-2, which accepts images as part of
its input (the output is always text). They take the image encoder from this
model, feed its output into a multilingual language model (they experiment with
mT0-XL and mT5-XL), and finetune the result to work well with visual inputs. The cool
thing about the method is that, because they only train a small module connecting
the visual encoder and the LLM (it produces 32 embeddings), most