3 Ways to Encode Categorical Variables for Deep Learning
Last Updated on August 27, 2020
Machine learning and deep learning models, like those in Keras, require all input and output variables to be numeric.
This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model.
The two most popular techniques are integer encoding and one hot encoding. A newer technique, called a learned embedding, may provide a useful middle ground between the two.
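To make the first two techniques concrete, here is a minimal sketch in plain Python. The color values are made up for illustration, and the code uses no libraries, so it shows only the idea and not how Keras or any other library implements it.

```python
# Illustrative sketch only: the categories below are made-up example data.
colors = ["red", "green", "blue", "green", "red"]

# Integer encoding: map each distinct category to an integer index.
categories = sorted(set(colors))  # ['blue', 'green', 'red']
to_index = {label: i for i, label in enumerate(categories)}
integer_encoded = [to_index[label] for label in colors]

# One hot encoding: one binary column per category, with a single 1 per row.
one_hot_encoded = [
    [1 if i == index else 0 for i in range(len(categories))]
    for index in integer_encoded
]

print(integer_encoded)     # [2, 1, 0, 1, 2]
print(one_hot_encoded[0])  # [0, 0, 1]
```

Note that the integer encoding implies an order (blue < green < red) that may not exist in the data, whereas the one hot encoding does not, at the cost of one column per category.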
In this tutorial, you will discover how to encode categorical data when developing neural network models in Keras.
After completing this tutorial, you will know:
- The challenge of working with categorical data when using machine learning and deep learning models.
- How to integer encode and one hot encode categorical variables for modeling.
- How to learn an embedding, a distributed representation of a categorical variable, as part of a neural network.
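As a rough picture of what an embedding is, the sketch below treats it as a lookup table with one vector per category, indexed by the integer-encoded value. The sizes and values are hypothetical, and the vectors here are random; in a neural network they would be weights adjusted during training.

```python
import random

random.seed(1)

# Hypothetical setup: 3 categories, each mapped to a 2-dimensional vector.
n_categories = 3
embedding_dim = 2

# One row of weights per category. The rows are random here; a network
# would learn them along with its other weights.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(n_categories)
]

# Integer-encoded inputs (assumed example data) select rows from the table.
integer_encoded = [2, 1, 0, 1, 2]
embedded = [embedding_table[index] for index in integer_encoded]

print(len(embedded), len(embedded[0]))  # 5 2
```

Each category maps to a dense vector whose length you choose, so the representation can be much smaller than a one hot encoding when there are many categories.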
Let’s get started.