Issue #78 – Balancing Training data for Multilingual Neural MT
16 Apr 2020
Author: Raj Patel, Machine Translation Scientist @ Iconic
Multilingual Neural MT (MNMT) can translate to/from multiple languages, but in model training we are faced with imbalanced training sets: some languages have far more training data than others. In general, we up-sample the low-resource languages to balance their representation, but the degree of up-sampling has a large effect on the overall performance of the model. In this post, we discuss a method proposed by Wang et al., 2020 that automatically learns how to weight the training data through a data scorer, optimised to maximise performance across all test languages.
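To make the up-sampling knob concrete, a common heuristic (not described in the post itself, shown here only for illustration) is temperature-based sampling: raise each language's data proportion to the power 1/T and renormalise. The corpus sizes and the temperature T below are assumptions for the example:

```python
import numpy as np

# Illustrative corpus sizes (sentence pairs) for three languages; these
# numbers and the temperature T are assumptions for the example only.
sizes = np.array([10_000_000, 500_000, 50_000], dtype=float)
T = 5.0  # T = 1 keeps the natural proportions; larger T flattens them

p = sizes / sizes.sum()   # natural data distribution
q = p ** (1.0 / T)        # temperature-scaled weights
q = q / q.sum()           # renormalised sampling probabilities

print(np.round(p, 4))  # [0.9479 0.0474 0.0047]
print(np.round(q, 4))  # ~[0.5275 0.2897 0.1828]: low-resource languages up-sampled
```

The catch, as the post notes, is that T is a single hand-tuned knob fixed before training; the method below learns the weighting automatically instead.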
Differentiable Data Selection (DDS)
Differentiable Data Selection (DDS) is a general machine learning method for optimising the weighting of different training examples to improve a predetermined objective; in the paper, this objective is the average loss over the different languages. The authors directly optimise the weights of the training data from each language to maximise this objective on a multilingual development set. Specifically, DDS uses a technique called bilevel optimisation to learn a data scorer P(x, y; ψ), parameterised by ψ, which assigns each training pair (x, y) a probability of being sampled during training.
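In practice, the bilevel loop alternates between updating the model on data sampled from the scorer and updating the scorer with a REINFORCE-style reward measuring how well the sampled data's gradient aligns with the gradient of the multilingual dev loss. Below is a minimal PyTorch-style sketch of one such update; the per-language logits psi, the helpers train_batches, dev_batch, and loss_fn, and the cosine-similarity reward are my illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

n_langs = 4
psi = torch.zeros(n_langs, requires_grad=True)  # scorer: one logit per language
scorer_opt = torch.optim.Adam([psi], lr=1e-3)

def sample_language():
    """Sample a language index from the scorer distribution P(lang; psi)."""
    probs = F.softmax(psi, dim=0)
    return torch.multinomial(probs, 1).item(), probs

def grad_vector(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def dds_step(model, model_opt, train_batches, dev_batch, loss_fn):
    # Inner step: update the model on a batch from the sampled language.
    lang, probs = sample_language()
    x, y = train_batches[lang]
    train_loss = loss_fn(model(x), y)
    model_opt.zero_grad()
    train_loss.backward()
    model_opt.step()

    # Outer step: reward the sampled language by how well its training
    # gradient aligns with the multilingual dev gradient.
    params = [p for p in model.parameters() if p.requires_grad]
    g_train = grad_vector(loss_fn(model(x), y), params)
    xd, yd = dev_batch
    g_dev = grad_vector(loss_fn(model(xd), yd), params)
    reward = F.cosine_similarity(g_train, g_dev, dim=0).detach()

    # REINFORCE-style update: ascend reward * log P(lang; psi).
    scorer_opt.zero_grad()
    (-reward * torch.log(probs[lang])).backward()
    scorer_opt.step()
```

Because the reward favours languages whose training gradient currently points in the same direction as the dev gradient, the sampling distribution adapts over the course of training rather than being fixed by a hand-tuned up-sampling ratio.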