Ordinal and One-Hot Encodings for Categorical Data
Last Updated on August 17, 2020
Machine learning models require all input and output variables to be numeric.
This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model.
The two most popular techniques are an Ordinal Encoding and a One-Hot Encoding.
In this tutorial, you will discover how to use encoding schemes for categorical machine learning data.
After completing this tutorial, you will know:
- Encoding is a required pre-processing step when working with categorical data for machine learning algorithms.
- How to use ordinal encoding for categorical variables that have a natural rank ordering.
- How to use one-hot encoding for categorical variables that do not have a natural rank ordering.
Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.