The Transformer Model
We have already familiarized ourselves with the concept of self-attention as implemented by the Transformer attention mechanism for neural machine translation. We will now shift our focus to the details of the Transformer architecture itself and discover how self-attention can be implemented without relying on recurrence or convolutions.
In this tutorial, you will discover the network architecture of the Transformer model.
After completing this tutorial, you will know:
- How the Transformer architecture implements an encoder-decoder structure without recurrence and convolutions
- How the Transformer encoder and decoder work
- How the Transformer self-attention compares to the use of recurrent and convolutional layers
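To make the comparison concrete before working through the architecture, here is a minimal sketch of scaled dot-product self-attention, the operation the Transformer uses in place of recurrence and convolutions. It uses plain NumPy, and the token count, embedding size, and randomly initialized weight matrices are illustrative assumptions rather than values from the tutorial's own code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the same input sequence into queries, keys, and values
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Scaled dot-product attention: every position attends to every position
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example (assumed shapes): a sequence of 4 tokens, each a 6-dimensional embedding
rng = np.random.default_rng(42)
X = rng.normal(size=(4, 6))
W_q = rng.normal(size=(6, 3))
W_k = rng.normal(size=(6, 3))
W_v = rng.normal(size=(6, 3))

print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 3)
```

Note that each output position is a weighted sum over all input positions, computed in a single matrix product rather than step by step as in a recurrent layer; this is the property the rest of the tutorial builds on.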
Kick-start your project with my book Building Transformer Models with Attention.