Part 4: Step by Step Guide to Master NLP – Text Cleaning Techniques
Introduction
This article is part of an ongoing blog series on Natural Language Processing (NLP). In the previous part, we completed the initial steps of text cleaning and preprocessing for NLP. This article continues from there and covers the next techniques in the text preprocessing stage of the NLP pipeline.
We will first discuss some more text cleaning techniques that can be useful in certain NLP tasks, and then begin our journey toward the normalization techniques, stemming and lemmatization, which are crucial to know when working on an NLP-based project.
This is Part 4 of the blog series, Step by Step Guide to Natural Language Processing.
Table of Contents
1. More Text Cleaning Techniques
- Converting Text to Lowercase
- Removing HTML tags
- Removing Unaccented Characters
- Expanding Contractions
- Removing Special Characters
- Correction of