Issue #78 – Balancing Training data for Multilingual Neural MT

16 Apr 2020
Author: Raj Patel, Machine Translation Scientist @ Iconic

Multilingual Neural MT (MNMT) can translate to and from multiple languages, but training such models means dealing with imbalanced training sets: some languages have far more training data than others. The usual remedy is to up-sample the low-resource languages to balance their representation. However, the degree of up-sampling has a large effect on the overall performance of the model. […]
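To make "degree of up-sampling" concrete, here is a minimal sketch of temperature-based sampling, one common way to control it in multilingual training. This is an illustration rather than the method from the post itself: the function name `sampling_probs`, the `temperature` parameter, and the corpus sizes are all assumptions chosen for the example. Each language is sampled with probability proportional to its data share raised to the power 1/T, so T = 1 keeps the original proportions and larger T moves the distribution toward uniform.

```python
def sampling_probs(sizes, temperature=1.0):
    """Return per-language sampling probabilities.

    sizes: dict mapping a language (or language pair) to its number of
           training sentences (hypothetical figures in the usage below).
    temperature: T >= 1; T = 1 samples proportionally to data size,
                 larger T up-samples low-resource languages more strongly.
    """
    total = sum(sizes.values())
    # Scale each language's share of the data by the exponent 1/T.
    weights = {
        lang: (n / total) ** (1.0 / temperature)
        for lang, n in sizes.items()
    }
    # Renormalise so the probabilities sum to 1.
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical corpus sizes: one high-resource, one low-resource language.
    corpus_sizes = {"fr": 1_000_000, "gu": 10_000}
    for t in (1.0, 5.0, 100.0):
        print(t, sampling_probs(corpus_sizes, temperature=t))
```

Running it for a few values of T shows the trade-off: a low temperature leaves the low-resource language rarely sampled, while a very high one treats all languages almost equally, which repeats the small corpus many times over.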
