Highlights from Machine Translation and Multilinguality in October 2024
Here are summaries of a few preprints that I noticed on arXiv during October.

LangSAMP: Language-Script Aware Multilingual Pretraining

Folks from LMU Munich try a relatively simple trick to improve multilingual encoder models, particularly for non-Latin-script and low-resource languages. They use additional information about the language identity and the script, but only during training, so at inference we can still use the model without caring about what language we feed in. They add static language and script embeddings before the […]
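Since the embeddings are only needed during training, one plausible reading is that they are summed into the hidden states right before the masked-language-modeling head, which is discarded at inference anyway. Here is a minimal PyTorch sketch of that idea; where exactly the embeddings enter the model is my assumption (the description above is truncated), and all names and sizes (`LangSAMPHead`, `num_languages`, `num_scripts`, etc.) are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class LangSAMPHead(nn.Module):
    """Hypothetical sketch: static (non-contextual) language and script
    embeddings are added to the encoder's hidden states just before the
    MLM prediction head. Because the head is used only in pretraining,
    inference needs no language or script IDs."""

    def __init__(self, hidden_size, vocab_size, num_languages, num_scripts):
        super().__init__()
        self.lang_emb = nn.Embedding(num_languages, hidden_size)
        self.script_emb = nn.Embedding(num_scripts, hidden_size)
        self.mlm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states, lang_id, script_id):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder
        # lang_id, script_id: (batch,) -- one static ID per sentence
        bias = self.lang_emb(lang_id) + self.script_emb(script_id)
        h = hidden_states + bias.unsqueeze(1)  # broadcast over the sequence
        return self.mlm_head(h)                # token logits for the MLM loss
```

The appeal of this placement is that the contextual representations themselves stay free of explicit language/script information, so the trained encoder can be used downstream exactly like any other multilingual model.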