How do Transformers Work in NLP? A Guide to the Latest State-of-the-Art Models

Overview
- The Transformer model in NLP has truly changed the way we work with text data
- The Transformer architecture is behind many recent NLP breakthroughs, including Google's BERT
- Learn how the Transformer works, how it relates to language modeling and sequence-to-sequence modeling, and how it enables Google's BERT model
Introduction
I love being a data scientist working in Natural Language Processing (NLP) right now. The breakthroughs and developments are occurring at an unprecedented pace. From the super-efficient ULMFiT framework to Google’s BERT, NLP is truly in the midst of a golden era.
And at the heart of this revolution is the concept of the Transformer. It has transformed the way we data scientists work with text data, and you'll soon see how in this article.
Want an example of how useful the Transformer is? Take a look at the paragraph below:
The highlighted words refer to the same person –
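To see in code why this kind of contextual understanding matters, here is a minimal sketch, not taken from the article, of how a Transformer model such as BERT assigns each word a vector shaped by its surrounding context. The example sentence, the bert-base-uncased checkpoint, and the use of the Hugging Face transformers library are all illustrative assumptions:

```python
# A minimal sketch (not from the article): how a Transformer such as BERT
# produces context-aware token vectors. Assumes the Hugging Face
# `transformers` library and `torch` are installed; the example sentence
# is hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "John went to the store because he needed groceries."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state holds one contextual vector per token
    hidden = model(**inputs).last_hidden_state[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
john = hidden[tokens.index("john")]
he = hidden[tokens.index("he")]

# Because each vector is shaped by its context, the pronoun "he" tends to
# sit close to the name it refers back to.
similarity = torch.cosine_similarity(john, he, dim=0).item()
print(f"cosine similarity between 'john' and 'he': {similarity:.3f}")
```

Static word embeddings would give "he" the same vector in every sentence; the contextual vectors a Transformer produces are what let it connect a pronoun to the person it refers to.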