One-Hot Encoding in Python with Pandas and Scikit-Learn
Introduction
In computer science, data can be represented in many different ways, and each representation has advantages and disadvantages depending on the field in which it is used.
Most algorithms operate on numbers, and category labels have no inherent meaning for a computer, so categorical data has to be prepared before a computer can process it.
This step is called preprocessing. A big part of preprocessing is encoding: representing every piece of data in a form that a computer can understand (the name literally means “convert to computer code”).
One-hot encoding is widely used in many branches of computer science, especially machine learning and digital circuit design.
In this article, we will explain what one-hot encoding is and implement it in Python using two popular libraries, Pandas and Scikit-Learn. We’ll also compare its effectiveness to other types of representation in computers and look at its strengths, weaknesses, and applications.
What is One-Hot Encoding?
One-hot encoding is a type of vector representation in which all of the elements in a vector are 0, except for one, which