Python: How to Handle Missing Data in Pandas DataFrame
Introduction
Pandas is a Python library for data analysis and manipulation. Almost all operations in pandas revolve around DataFrames, an abstract data structure tailor-made for handling a metric ton of data.
In the aforementioned metric ton of data, some of it is bound to be missing for various reasons, resulting in a missing (null/None/NaN) value in our DataFrame.
That is why, in this article, we’ll be discussing how to handle missing data in a Pandas DataFrame.
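To make the null/None/NaN markers mentioned above concrete, here is a minimal sketch of how they show up in a DataFrame. The column names and values are hypothetical toy data, not the dataset used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: None and np.nan both stand in for "missing".
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "salary": [50000.0, np.nan, 42000.0],
})

# isna() flags every missing cell, whichever marker was used.
missing_mask = df.isna()

# Summing the boolean mask counts the missing cells in each column.
missing_per_column = missing_mask.sum()

print(missing_mask)
print(missing_per_column)
```

Whichever marker was used to create them, pandas reports both cells as missing through the same isna() call.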
Data Inspection
Real-world datasets are rarely perfect. They may contain missing values, wrong data types, unreadable characters, erroneous lines, etc.
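As a rough illustration of spotting two of these problems, the sketch below reads a tiny CSV from an in-memory string and checks column types and missing-value counts. The data and column names are made up for this example and are assumptions, not the article's dataset.

```python
import io

import pandas as pd

# Hypothetical raw data: Bob has a non-numeric age and no salary.
raw = io.StringIO(
    "name,age,salary\n"
    "Ann,34,50000\n"
    "Bob,unknown,\n"
    "Cid,29,42000\n"
)
df = pd.read_csv(raw)

# The "age" column is read as text because one entry is not a number.
dtypes = df.dtypes

# The empty salary field is parsed as a missing value.
missing_counts = df.isna().sum()

# errors="coerce" turns entries that cannot be parsed into NaN.
age_numeric = pd.to_numeric(df["age"], errors="coerce")

print(dtypes)
print(missing_counts)
print(age_numeric)
```

Coercing with pd.to_numeric is only one possible way to surface a wrong data type; it trades the unreadable entry for an explicit missing value that can be handled later.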
The first step to any proper data analysis is cleaning and organizing the data we’ll later be using. We will discuss a few common problems related to data that might occur in a dataset.
We will be working with a small employees dataset for this. The .csv