A friendly guide to NLP: Bag-of-Words with a Python example
1. A Quick Example
Let’s look at a simple example that makes the concepts explained so far concrete. Suppose we want to analyze the following reviews of Game of Thrones:
Review 1: Game of Thrones is an amazing tv series!
Review 2: Game of Thrones is the best tv series!
Review 3: Game of Thrones is so great
The table shows all the calculations needed to obtain the Bag-of-Words representation:
Each row corresponds to a different review, while the columns are the unique words contained in the three documents.
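To check the table by hand, here is a minimal sketch that builds the same kind of document-by-word count matrix for the three reviews using only the standard library. It assumes the pre-processing is lowercasing plus dropping punctuation, and it sorts the vocabulary alphabetically; the original table may order the words differently. The names `tokenize`, `vocabulary` and `bow` are illustrative, not taken from the article.

```python
import re

# The three reviews from the example above
reviews = [
    "Game of Thrones is an amazing tv series!",
    "Game of Thrones is the best tv series!",
    "Game of Thrones is so great",
]

def tokenize(text):
    # Assumed pre-processing: lowercase, then keep only runs of letters/digits,
    # which discards punctuation such as "!"
    return re.findall(r"[a-z0-9]+", text.lower())

tokenized = [tokenize(r) for r in reviews]

# Vocabulary: every unique word across the three reviews, sorted alphabetically
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Bag-of-Words matrix: one row per review, one column per vocabulary word,
# each cell holding how many times that word occurs in that review
bow = [[tokens.count(word) for word in vocabulary] for tokens in tokenized]

print(vocabulary)
for row in bow:
    print(row)
```

For these reviews the vocabulary has 12 words, and the columns for "game", "of", "thrones" and "is" contain a 1 in every row, since those words appear once in each review.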
2. Implementation with Python
Let’s import the libraries and define the variables that contain the reviews:
import pandas as pd
import numpy as np
import collections

doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'
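The imports include `collections`; one plausible use of it, sketched here as an assumption rather than the author’s exact code, is counting the words in each review with `collections.Counter`. The sketch strips punctuation with `str.translate` and `string.punctuation`, lowercases, and splits on whitespace; the helper name `word_counts` is hypothetical.

```python
import collections
import string

doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'

# Translation table that deletes every ASCII punctuation character
remove_punct = str.maketrans('', '', string.punctuation)

def word_counts(doc):
    # Hypothetical helper: strip punctuation, lowercase, split on whitespace,
    # then count how often each word occurs in the review
    words = doc.translate(remove_punct).lower().split()
    return collections.Counter(words)

counts = [word_counts(d) for d in (doc1, doc2, doc3)]
for c in counts:
    print(dict(c))
```

Without the punctuation step, `'series!'` and `'series'` would be counted as two different words, which would add spurious columns to the Bag-of-Words table.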
We need to remove punctuation, one of the text pre-processing steps I showed in the previous post. We