A friendly guide to NLP: Bag-of-Words with Python example
1. A Quick Example
Let’s look at an easy example to make the concepts explained earlier concrete. Suppose we want to analyze reviews of Game of Thrones:
Review 1: Game of Thrones is an amazing tv series!
Review 2: Game of Thrones is the best tv series!
Review 3: Game of Thrones is so great
The table shows the word counts that make up the Bag-of-Words representation of these reviews:
[Image: Bag-of-Words table for the three reviews]
Each row corresponds to a different review, while the columns are the unique words contained in the three documents.
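To make the table concrete, here is a minimal sketch that rebuilds the counts using only the standard library. The `tokenize` helper, the lowercasing step, and the first-appearance ordering of the vocabulary are assumptions made for illustration, so the column order and casing may differ from the original table.

```python
import string
from collections import Counter

reviews = [
    "Game of Thrones is an amazing tv series!",
    "Game of Thrones is the best tv series!",
    "Game of Thrones is so great",
]


def tokenize(text):
    # Hypothetical helper: lowercase, strip punctuation, split on whitespace.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


tokenized = [tokenize(review) for review in reviews]

# Vocabulary: unique words, kept in order of first appearance (an assumption).
vocabulary = []
for tokens in tokenized:
    for word in tokens:
        if word not in vocabulary:
            vocabulary.append(word)

# One row per review, one column per vocabulary word.
bow_matrix = []
for tokens in tokenized:
    counts = Counter(tokens)
    bow_matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for number, row in enumerate(bow_matrix, start=1):
    print(f"Review {number}:", row)
```

Each inner list of `bow_matrix` is the count vector for one review: a 1 where the review contains the word and a 0 where it does not, since no word repeats within a single review here.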
2. Implementation with Python
Let’s import the libraries and define the variables that contain the reviews:
import pandas as pd
import numpy as np
import collections
doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'
We need to remove punctuation, one of the steps I showed in the previous post about text pre-processing. We