The Transformer Attention Mechanism
Before the Transformer model was introduced, attention for neural machine translation was implemented in RNN-based encoder-decoder architectures. The Transformer model revolutionized the implementation of attention by dispensing with recurrence and convolutions and relying solely on a self-attention mechanism instead.
This tutorial focuses on the Transformer attention mechanism; the Transformer model itself is reviewed in a separate tutorial.
In this tutorial, you will discover the Transformer attention mechanism for neural machine translation.
After completing this tutorial, you will know:
- How the Transformer attention differed from its predecessors
- How the Transformer computes scaled dot-product attention
- How the Transformer computes multi-head attention