Highlights from Machine Translation and Multilinguality in December 2024 and January 2025
![](https://www.deeplearningdaily.com/wp-content/uploads/2020/09/neural-machine-translation-with-python_5f70dd5f30f47-600x400.png)
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
Researchers from Tsinghua University, institutions in Shanghai, Beijing, and Hong Kong, and Johns Hopkins University have developed a method for adapting diffusion models to hundreds of languages at minimal cost. They swap the original text encoder for a multilingual one and train it to produce representations consistent with the CLIP text encoder, using parallel language data and English image-text data. The generated images look impressive across languages, and the generation quality, as measured by CLIP representation similarity, appears promising (although I am not really sure how convincing automatic evaluation can be in such cases).
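As I understand the recipe, the core of it is a distillation-style objective: the new multilingual encoder is trained so that its representation of a non-English sentence matches the frozen CLIP encoder's representation of the parallel English sentence. A toy numpy sketch of that alignment objective (the linear "encoders", dimensions, and random bag-of-words data are all made up for illustration, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

dim, vocab = 8, 20   # toy embedding and vocabulary sizes (hypothetical)
n_pairs = 200        # number of parallel sentence pairs

# Frozen "CLIP" English text encoder: a fixed random linear map (stand-in).
teacher_W = rng.normal(size=(vocab, dim))

# Student multilingual encoder: a learnable linear map over a shared vocab.
student_W = rng.normal(size=(vocab, dim))

# Parallel data: an English bag-of-words vector and the corresponding
# bag-of-words vector in another language (random stand-ins here).
x_en = rng.random((n_pairs, vocab))
x_xx = rng.random((n_pairs, vocab))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Targets come from the frozen teacher; only the student is updated.
targets = x_en @ teacher_W
lr = 0.01
loss_start = mse(x_xx @ student_W, targets)
for _ in range(500):
    pred = x_xx @ student_W
    grad = 2 * x_xx.T @ (pred - targets) / n_pairs  # MSE gradient w.r.t. student_W
    student_W -= lr * grad
loss_end = mse(x_xx @ student_W, targets)
```

After training, `loss_end` is well below `loss_start`: the student's representations of the non-English side drift toward the teacher's English representations, which is what lets the frozen diffusion backbone consume them unchanged.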
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena
Folks from Georgia Tech observe that LLMs in Arabic