Machine Translation Weekly 78: Multilingual Hate Speech Detection
This week I will comment on the preprint “Cross-lingual hate speech detection
based on multilingual domain-specific word embeddings” by authors from the
University of Chile.
The preprint evaluates the possibility of cross-lingual transfer of models for
hate speech detection, i.e., training a model on one language and testing it on
a different language. Hate speech detection is a particularly tough task for
model transfer because many of the words have a different meaning or at least
different connotations when used in hate speech than in their more standard
use. For example, the paper notes that the Italian word “migranti” is usually
translated into English as “migrants”, but in the hate speech context, it
typically means “illegal immigrants” – which are in the context of American hate
speech usually