Issue #60 – Character-based Neural Machine Translation with Transformers
14 Nov 2019
Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic
We saw in issue #12 of this blog how character-based recurrent neural networks (RNNs) could outperform (sub)word-based models if the network is deep enough. However, character sequences are much longer than subword ones, which RNNs struggle to handle. In this post, we discuss how the Transformer architecture changes the situation for character-based models. We take a look at two papers showing that, on specific tasks, character-based Transformer models achieve better results than subword baselines.
Benefits of the Transformer model
Translating characters instead of subwords improves generalisation and simplifies the model through a dramatic reduction of the vocabulary. However, it also means dealing with much longer sequences, which presents significant modelling and computational challenges for sequence-to-sequence neural models, especially RNNs. Another drawback is that in many languages a character is merely an orthographic symbol and carries no meaning on its own.
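To make the vocabulary and sequence-length trade-off concrete, here is a minimal sketch in plain Python. The example sentence and the subword segmentation are hand-written for illustration; a real system would learn the segmentation with BPE or a unigram model.

```python
# Minimal illustration of the character vs. subword trade-off.

sentence = "translation quality improves"

# Character-level: tiny vocabulary (letters, digits, punctuation, space),
# but the sequence is as long as the sentence itself.
char_tokens = list(sentence)

# Subword-level: a much larger learned vocabulary (tens of thousands of
# units), but far shorter sequences. Segmentation below is invented for
# the example, in the SentencePiece style where "▁" marks a word start.
subword_tokens = ["▁trans", "lation", "▁quality", "▁improve", "s"]

print(len(char_tokens))     # 28 positions for the model to process
print(len(subword_tokens))  # 5 positions for the model to process
```

The same sentence thus costs the character-level model several times more sequence positions, which is exactly the computational burden discussed above.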
Ngo et al. (2019) describe the benefits of the Transformer model over RNNs for character-based neural MT. Unlike RNNs, Transformers can jointly learn segmentation and representation.
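As a rough illustration of what a character-level setup looks like in practice, the following PyTorch sketch builds a standard Transformer encoder over character IDs. It is not the model of Ngo et al.: the character inventory, layer sizes and head counts are placeholder assumptions, and positional encodings are omitted for brevity. The point is that the input vocabulary stays in the low hundreds, and self-attention gives every position direct access to every character in the sequence.

```python
import torch
import torch.nn as nn

# Hypothetical character inventory: printable ASCII plus a few specials.
# A character-level vocabulary stays in the low hundreds, versus tens of
# thousands of subword types.
vocab = ["<pad>", "<bos>", "<eos>"] + [chr(i) for i in range(32, 127)]
char2id = {c: i for i, c in enumerate(vocab)}

d_model = 256  # placeholder size, not tuned

embed = nn.Embedding(len(vocab), d_model, padding_idx=char2id["<pad>"])
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=6,
)

sentence = "character-level input"
ids = torch.tensor([[char2id[c] for c in sentence]])  # shape: (1, seq_len)

# Every character position attends to every other one within a single
# layer, so long-range dependencies need not be carried step by step
# through a recurrence as in an RNN.
hidden = encoder(embed(ids))
print(hidden.shape)  # torch.Size([1, 21, 256])
```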