The Transformer Model
We have already familiarized ourselves with the concept of self-attention as implemented by the Transformer attention mechanism for neural machine translation. We now shift our focus to the details of the Transformer architecture itself, to discover how self-attention can be implemented without relying on recurrence or convolutions.

In this tutorial, you will discover the network architecture of the Transformer model. After completing this tutorial, you will know: How the Transformer architecture implements an encoder-decoder structure […]
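To make the idea concrete before diving into the architecture, here is a minimal NumPy sketch of the scaled dot-product self-attention that the Transformer builds on. The function name and the toy shapes are illustrative, not the tutorial's own implementation; it simply computes softmax(QKᵀ/√d_k)V, with queries, keys, and values all derived from the same input, which is what makes it *self*-attention.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of values

# Toy example: 4 tokens with 8-dimensional embeddings (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(X, X, X)          # self-attention: Q = K = V = X
print(out.shape)                                     # (4, 8)

Note that nothing in this computation steps through the sequence one position at a time or slides a fixed-width filter over it: every token attends to every other token in a single matrix product, which is exactly how the Transformer dispenses with recurrence and convolutions.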