NLP – Sentiment Analysis
Now, we can see that our target has changed to 0 and 1, i.e., 0 for Negative and 1 for Positive, and that the two classes are roughly balanced.
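As a quick way to confirm the balance claim, the share of each label can be computed directly. The snippet below is only a minimal sketch in plain Python: the helper name `class_balance` and the toy `labels` list are hypothetical and stand in for the real target column, which is not shown here.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each label as a fraction of all records."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy labels (not the real data): 0 = Negative, 1 = Positive
labels = [0, 1, 1, 0, 1, 0, 0, 1]
balance = class_balance(labels)
print(balance)  # {0: 0.5, 1: 0.5}
```

Shares close to 0.5 for each class suggest that accuracy will be a reasonable metric later on; a heavily skewed split would call for other measures.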
Data Pre-processing
Now, we will perform some pre-processing on the data before converting it into vectors and passing it to the machine learning model.
We will create a function for pre-processing of data.
1. First, we will iterate through each record and use a regular expression to remove every character that is not a letter.
2. Then, we will convert the string to lowercase, because the word “Good” is otherwise treated as different from the word “good”.
Without this step, two different vectors would be created for the same word when we vectorize the text, which we don’t want.
3. Then we will check for stopwords in the data and get rid of them. Stopwords are commonly used words