Text Classification with BERT Tokenizer and TF 2.0 in Python
This is the 23rd article in my series of articles on Python for NLP. In the previous article of this series, I explained how to perform neural machine translation using seq2seq architecture with Python’s Keras library for deep learning.
In this article we will study BERT, which stands for Bidirectional Encoder Representations from Transformers, and its application to text classification. BERT is a text representation technique, like word embeddings. If you have no idea how word embeddings work, take a look at my article on word embeddings.
Like word embeddings, BERT is a text representation technique, one that combines several state-of-the-art ideas in deep learning, most notably bidirectional encoding of context with the Transformer architecture. BERT was developed by researchers at Google in 2018 and has proven to be state-of-the-art for a variety of natural language processing tasks such as text classification, text summarization, and text generation. Just recently, Google announced that BERT is being used as a core part of their search algorithm to better understand queries.
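To make the idea of BERT as a text representation technique concrete, here is a minimal sketch of how a pretrained BERT tokenizer turns raw text into the sub-word tokens and integer IDs that the model consumes. This sketch uses the Hugging Face transformers package and the bert-base-uncased checkpoint purely as an illustration; the setup used later in this article may rely on a different tokenizer implementation.

```python
# Minimal sketch (illustrative, not the article's exact setup):
# tokenize a sentence with a pretrained BERT WordPiece tokenizer.
from transformers import BertTokenizer

# Load the tokenizer that matches the pretrained BERT model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "I really enjoyed this movie!"

tokens = tokenizer.tokenize(text)              # sub-word (WordPiece) tokens
ids = tokenizer.convert_tokens_to_ids(tokens)  # integer IDs fed to the model

print(tokens)  # e.g. ['i', 'really', 'enjoyed', 'this', 'movie', '!']
print(ids)     # the corresponding vocabulary indices
```

Unlike static word embeddings, these token IDs are only the input; BERT then produces a contextual representation for each token, which is what makes it useful for downstream tasks such as text classification.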
In this article we will not go into the mathematical details of how BERT