Highlights from Machine Translation and Multilinguality in December 2024 and January 2025

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

Researchers from Tsinghua, Shanghai, Beijing, Hong Kong, and Johns Hopkins have developed a method for adapting diffusion models to hundreds of languages at minimal cost. They achieve this by swapping the original text encoder for a multilingual one and training it to produce representations consistent with the CLIP text encoder, leveraging parallel text data and English image-text data. The generated images look impressive across languages, and the generation quality, as measured by CLIP representation similarity, appears promising (although I am not really sure how convincing automatic evaluation can be in such cases).
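To make the idea more concrete, here is a rough sketch of what aligning a multilingual text encoder to a frozen CLIP text encoder could look like. This is my illustration, not the authors' code: the encoder choices (xlm-roberta-base as the multilingual student, openai/clip-vit-large-patch14 as the frozen CLIP teacher), the mean-pooling, and the plain MSE distillation loss on parallel sentences are all assumptions.

```python
# Minimal sketch (not the MuLan implementation): distill a frozen CLIP text
# encoder into a multilingual encoder using parallel sentences, so that the
# multilingual encoder produces CLIP-compatible representations.
# Model names, pooling, and the MSE loss are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import (AutoModel, AutoTokenizer, CLIPTextModelWithProjection,
                          CLIPTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen teacher: the CLIP text encoder the diffusion model was trained with.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModelWithProjection.from_pretrained(
    "openai/clip-vit-large-patch14").to(device).eval()
for p in clip_enc.parameters():
    p.requires_grad = False

# Trainable student: a multilingual encoder plus a projection into CLIP space.
mul_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
mul_enc = AutoModel.from_pretrained("xlm-roberta-base").to(device)
proj = nn.Linear(mul_enc.config.hidden_size, clip_enc.config.projection_dim).to(device)

optimizer = torch.optim.AdamW(
    list(mul_enc.parameters()) + list(proj.parameters()), lr=1e-5)

def student_embed(sentences):
    """Mean-pool the multilingual encoder states and project into CLIP space."""
    batch = mul_tok(sentences, padding=True, truncation=True,
                    return_tensors="pt").to(device)
    states = mul_enc(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
    pooled = (states * mask).sum(1) / mask.sum(1).clamp(min=1)
    return proj(pooled)                                    # (B, projection_dim)

def train_step(english_sentences, translated_sentences):
    """One distillation step on a batch of (English, translation) pairs."""
    with torch.no_grad():
        clip_batch = clip_tok(english_sentences, padding=True, truncation=True,
                              return_tensors="pt").to(device)
        target = clip_enc(**clip_batch).text_embeds        # frozen CLIP embeddings
    pred = student_embed(translated_sentences)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with a single toy parallel pair:
# loss = train_step(["a photo of a cat"], ["ein Foto einer Katze"])
```

Presumably, the actual training also mixes in the English image-text data so the new encoder stays compatible with the diffusion model's cross-attention; the sketch only covers the parallel-text distillation part.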

On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena

Folks from Georgia Tech observe that LLMs in Arabic
