Highlights from Machine Translation and Multilinguality in March 2025

EuroBERT: Scaling Multilingual Encoders for European Languages

A large group of authors, mostly from CentraleSupélec in Paris and Instituto Superior Técnico in Lisbon, released EuroBERT, a multilingual BERT-style encoder for European and major global languages. There is also a 2.1B-parameter version, unusually large for an encoder model.

High-Dimensional Interlingual Representations of Large Language Models

A preprint from the Hong Kong University of Science and Technology evaluates the sentence-level similarity of LLM hidden states across languages. It shows that the idea that multilingually trained language models represent everything in a shared semantic space (perhaps structured by English) is not the whole truth. The authors devise several metrics for comparing the spaces and show that the representations are split into fragmented subspaces. This is, for me,
