Issue #32 – The Transformer Model: State-of-the-art Neural MT
04 Apr 2019
Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic
In this post, we will discuss the Transformer model (Vaswani et al. 2017), the current state of the art in Neural MT. The Transformer was published by the Google Brain and Google Research teams in June 2017 and has been a very popular architecture ever since. It uses neither Recurrent Neural Networks (RNNs) nor Convolutional Neural Networks (CNNs). Instead, it relies on attention mechanisms and feed-forward layers at various levels of the network, with the whole pipeline trained end-to-end. What does that mean?
Similar to previous Neural MT architectures, the Transformer is an autoregressive model based on an encoder-decoder architecture. The encoder consumes the input sentence and produces an encoded representation; the decoder then generates the translation one token at a time, conditioning on the encoded representation and the tokens it has already produced.
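To make the autoregressive loop concrete, here is a minimal greedy-decoding sketch built on PyTorch's nn.Transformer. The vocabulary size, the BOS/EOS token ids, and the greedy search are illustrative assumptions, not details from the paper; a trained model and a real tokenizer would be needed to produce an actual translation.

import torch
import torch.nn as nn

# Hypothetical sizes and token ids, for illustration only.
VOCAB, D_MODEL, BOS, EOS = 1000, 512, 1, 2

model = nn.Transformer(d_model=D_MODEL, batch_first=True)
embed = nn.Embedding(VOCAB, D_MODEL)
project = nn.Linear(D_MODEL, VOCAB)

@torch.no_grad()
def greedy_decode(src_tokens, max_len=50):
    """Generate a translation one token at a time (autoregressive)."""
    memory = model.encoder(embed(src_tokens))   # encode the source once
    out_tokens = torch.tensor([[BOS]])
    for _ in range(max_len):
        tgt = embed(out_tokens)
        # The decoder sees the encoded source plus everything generated so far;
        # the causal mask stops positions from attending to future tokens.
        mask = model.generate_square_subsequent_mask(tgt.size(1))
        dec = model.decoder(tgt, memory, tgt_mask=mask)
        next_tok = project(dec[:, -1]).argmax(-1, keepdim=True)
        out_tokens = torch.cat([out_tokens, next_tok], dim=1)
        if next_tok.item() == EOS:
            break
    return out_tokens

src = torch.randint(3, VOCAB, (1, 7))   # a fake 7-token source sentence
print(greedy_decode(src))               # untrained model => arbitrary tokens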
What is Attention?
It is difficult for the decoder to generate the whole translation from an encoded representation it looks at only once. When humans translate, they refer back to the source sentence many times during the process, focusing on the parts relevant to the word they are producing at that moment. The attention mechanism gives the decoder the same ability: at every decoding step it computes a weighted combination of the encoder outputs, so the network learns which source positions matter most for the current target token.
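Concretely, the Transformer's scaled dot-product attention computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal PyTorch sketch with toy tensor shapes; in the decoder's attention over the source, the queries come from the target side while the keys and values come from the encoder outputs.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V  (Vaswani et al. 2017)."""
    d_k = query.size(-1)
    # Similarity of each query (decoder state) to every key (encoder output).
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)   # attention weights sum to 1 per query
    return weights @ value, weights

# Toy example: 1 decoder position attending over 5 encoded source positions.
q = torch.randn(1, 1, 64)
k = v = torch.randn(1, 5, 64)
context, attn = scaled_dot_product_attention(q, k, v)
print(attn.shape)  # torch.Size([1, 1, 5]) -- one weight per source position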