Highlights from Machine Translation and Multilinguality in Summer 2023

Here are short summaries of the papers I liked the most during the (academic) summer. Also, this time, I am posting both on GitHub Pages and on Medium. A preprint from the University of Würzburg presents a recipe for recycling existing models to create a multilingual vision-language model. They start with the English-only vision-language model BLIP-2, which allows images to be a part of its input (the output is always textual). They take the image encoder from this model and […]
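
The gist of such recycling, as I understand it, is to keep the pretrained image encoder frozen and learn only a small projection into a multilingual language model's embedding space. Below is a minimal sketch of that wiring; all module and argument names (`RecycledVisionLM`, `vis_dim`, `lm_dim`) are my own illustrative placeholders, not the paper's actual components.

```python
# Minimal sketch, not the paper's code: a frozen image encoder feeds a
# pretrained multilingual LM through one newly trained projection layer.
import torch
import torch.nn as nn

class RecycledVisionLM(nn.Module):
    def __init__(self, image_encoder, multilingual_lm, vis_dim, lm_dim):
        super().__init__()
        self.image_encoder = image_encoder            # e.g., taken from BLIP-2
        self.projection = nn.Linear(vis_dim, lm_dim)  # the only new weights
        self.lm = multilingual_lm                     # pretrained multilingual LM
        for param in self.image_encoder.parameters():
            param.requires_grad = False               # recycle, do not retrain

    def forward(self, pixel_values, text_embeddings):
        # Encode the image and map it into the LM's embedding space.
        visual = self.image_encoder(pixel_values)     # (B, T_img, vis_dim)
        visual = self.projection(visual)              # (B, T_img, lm_dim)
        # Prepend the visual tokens to the text embeddings; the LM then
        # attends over both, prefix-style.
        return self.lm(torch.cat([visual, text_embeddings], dim=1))
```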

Read more

Highlights from Machine Translation and Multilinguality in June 2023

Here are the preprints that I found the most interesting in June 2023. Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers Folks from LORIA (a French research institute) and Posos (a French company) study the relationship between cross-lingual representation alignment and cross-lingual transfer. Here, alignment means what I would call language neutrality, i.e., that similar sentences should receive similar representations across languages. (Not alignment as the new word for finetuning language models to follow instructions, nor the […]
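
To make the notion of alignment concrete: one simple way to probe language neutrality is to embed parallel sentences with a multilingual encoder and check how similar the paired representations are. Below is a minimal sketch, assuming the sentence-transformers library and a model of my choosing; the paper's actual models and metrics may well differ.

```python
# A rough sketch of measuring "alignment" in the language-neutrality
# sense: parallel sentences should get similar representations.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = ["The cat sat on the mat.", "I like coffee."]
french = ["Le chat était assis sur le tapis.", "J'aime le café."]

emb_en = model.encode(english)  # (N, dim) numpy array
emb_fr = model.encode(french)

# Cosine similarity between each sentence and its translation.
sims = np.sum(emb_en * emb_fr, axis=1) / (
    np.linalg.norm(emb_en, axis=1) * np.linalg.norm(emb_fr, axis=1))
print(f"Mean cross-lingual similarity: {sims.mean():.3f}")
```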

Read more

Speeding up arXiv browsing

Staying up to date with the newest NLP work is a tough job, and reading about new research takes a significant amount of my time. For several years, one of my work routines has been skimming over the arXiv digest. I open a few preprints, glance over them, and write some notes into Zotero. Once a month, I write a blog post about what I think was the most interesting, which should force me to understand the papers, at least […]
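
For readers who want to automate the same skim, the listing below is a minimal sketch that pulls the newest cs.CL preprints from arXiv's public Atom API using feedparser. The query parameters are standard arXiv API options; the category and result count are arbitrary choices of mine.

```python
# Fetch the newest cs.CL preprints from arXiv's public Atom API
# and print titles and links for a quick skim.
import feedparser  # pip install feedparser

URL = ("http://export.arxiv.org/api/query"
       "?search_query=cat:cs.CL"
       "&sortBy=submittedDate&sortOrder=descending"
       "&max_results=20")

feed = feedparser.parse(URL)
for entry in feed.entries:
    # Each entry is one preprint; title plus link is enough for a
    # first pass before deciding what to open.
    print(entry.title.replace("\n", " "))
    print(" ", entry.link)
```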

Read more

Highlights from Machine Translation and Multilinguality in May 2023

Here are a few papers I found most interesting in the flood of new preprints on arXiv. There was ACL’s camera-ready deadline and the start of the EMNLP anonymity period, so there were many more papers than usual. What is the best recipe for character-level encoder-only modeling? A paper from DeepMind accepted to ACL 2023 systematically (and empirically) studies how to train a BERT-like model that works directly with character-level inputs using existing architectural building blocks. Transformers work well with […]
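
One of the standard building blocks examined in this line of work is downsampling the character sequence before the transformer stack, so attention does not have to run over every single character. The sketch below shows a strided-convolution variant of this idea; it illustrates the general technique, not the recipe the paper ends up recommending.

```python
# Sketch of a common character-level building block: embed characters
# (or bytes), then shrink the sequence with a strided convolution
# before a standard transformer encoder runs on top.
import torch
import torch.nn as nn

class CharDownsampler(nn.Module):
    def __init__(self, vocab_size=256, dim=512, rate=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Strided 1D convolution shortens the sequence by `rate`.
        self.conv = nn.Conv1d(dim, dim, kernel_size=rate, stride=rate)

    def forward(self, char_ids):              # (B, T) character ids
        x = self.embed(char_ids)              # (B, T, dim)
        x = self.conv(x.transpose(1, 2))      # (B, dim, T // rate)
        return x.transpose(1, 2)              # (B, T // rate, dim)

# Usage: the downsampled sequence is what the transformer actually sees.
block = CharDownsampler()
hidden = block(torch.randint(0, 256, (2, 128)))  # -> shape (2, 32, 512)
```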

Read more

Highlights from Machine Translation and Multilinguality in April 2023

Here is my monthly summary of the new papers and preprints I liked the most during the previous month. Several institutions in China did a thorough evaluation of how large language models work for machine translation. One might think it is yet another paper like this, but this one is much better than what Tencent did with ChatGPT and just a few test sentences. This paper uses the Flores 101 test set, a pretty standard large test set covering 101 languages. Everything is […]
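
The scoring step in this kind of evaluation is typically just corpus-level BLEU with sacrebleu; a minimal sketch follows, with placeholder sentences standing in for the LLM outputs and the Flores 101 references.

```python
# Corpus-level BLEU with sacrebleu; the sentences here are placeholders.
import sacrebleu

hypotheses = ["The cat sits on the mat.", "I would like a coffee."]
references = ["The cat sat on the mat.", "I would like some coffee."]

# corpus_bleu expects a list of hypothesis strings and a list of
# reference streams (here, a single reference per sentence).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```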

Read more

Few words on Natural Language Processing and User Autonomy

As natural language processing (NLP) finds its way out of university labs and becomes a crucial element of many user-facing technologies (machine translation, search, language-model-based assistants), people are starting to get concerned about the ethics of this technology. When people talk about NLP ethics, the main topics are: biases that the models pick up from training data, replication of toxic behavior found on the Internet, underrepresentation of already underprivileged groups, and differences in technology availability between the Global North and the Global South. Now, […]

Read more

Highlights from Machine Translation and Multilinguality in March 2023

Here is what I found the most interesting in MT and multilinguality in March. I only feature two papers (both from Microsoft, coincidentally), not because there were few on arXiv, but because I did not manage to read that much this month. In the first paper, folks from Microsoft in India experiment with zero-shot cross-lingual transfer for classification. They use a multi-task learning setup: besides performing the task in the source language, they also teach the model using a two-player game. […]
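
The excerpt cuts off before the game itself is described, so the sketch below only shows the generic shape of such a multi-task setup: a main classification loss combined with a weighted auxiliary objective. The model methods and the weight are hypothetical placeholders, not the paper's actual objective.

```python
# Generic multi-task sketch; model.classify and model.auxiliary_loss are
# hypothetical stand-ins for the paper's task head and two-player game.
import torch
import torch.nn.functional as F

def training_step(model, batch, aux_weight=0.5):
    # Main task: classification in the source language.
    logits = model.classify(batch["input_ids"])
    task_loss = F.cross_entropy(logits, batch["labels"])

    # Auxiliary objective standing in for the game-based loss.
    aux_loss = model.auxiliary_loss(batch["input_ids"])

    # The auxiliary weight would be a tuned hyperparameter.
    return task_loss + aux_weight * aux_loss
```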

Read more

Highlights from Machine Translation and Multilinguality in February 2023

There were plenty of interesting preprints on arXiv in February. Here is a brief summary of three that I think are cool but could get lost in the hundreds of papers that went public. The unreasonable effectiveness of few-shot learning for machine translation Folks from Google experimented with few-shot MT based on language models. Instead of using one of the cool huge language models we all know, they train their own smaller ones. They prepare specific bi- and tri-lingual LMs (8B parameters; […]
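
Few-shot MT with a plain LM usually comes down to the prompt layout: a handful of translation pairs followed by the sentence to translate, with the model continuing the pattern. The sketch below shows the usual template; the exact format used in the paper may differ.

```python
# Build a few-shot MT prompt: k example pairs, then the source sentence;
# the language model is expected to continue with the translation.
def few_shot_mt_prompt(examples, source, src_lang="English", tgt_lang="German"):
    lines = []
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")  # the LM continues from here
    return "\n".join(lines)

prompt = few_shot_mt_prompt(
    examples=[("Good morning.", "Guten Morgen."),
              ("Where is the station?", "Wo ist der Bahnhof?")],
    source="The weather is nice today.")
print(prompt)
```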

Read more

Questions and answers about ChatGPT and large language models

There’s been a lot of media coverage of ChatGPT and language models lately, and I feel like not everything is being said quite right. That’s why I have prepared some questions and answers that will hopefully help clarify what is actually being talked about. Questions: What is a (large) language model? What is ChatGPT? Are GPT-3.5 and ChatGPT the best things out there? Are there any available alternatives, ideally open source? How can ChatGPT speak multiple languages? Does it use machine translation? […]

Read more

Questions and answers about ChatGPT and large language models (in Czech)

Lately, the media have been writing about ChatGPT and language models quite often, and I feel that not everything is being said entirely correctly. That is why I have prepared a few questions and answers that will hopefully help clarify what is actually being talked about. Questions: What is a (large) language model? What is ChatGPT? Are GPT-3.5 and ChatGPT the best there is? Are there any available alternatives, ideally open source? How come ChatGPT speaks Czech? Does it use machine translation? Where does the knowledge that ChatGPT has come from? […]

Read more