Why Data Preparation Is So Important in Machine Learning
Last Updated on June 30, 2020
On a predictive modeling project, machine learning algorithms learn a mapping from input variables to a target variable.
The most common form of predictive modeling project involves so-called structured data or tabular data. This is data as it looks in a spreadsheet or a matrix, with rows of examples and columns of features for each example.
We cannot fit and evaluate machine learning algorithms on raw data; instead, we must transform the data to meet the requirements of individual machine learning algorithms. More than that, we must choose a representation of the data that best exposes the unknown underlying structure of the prediction problem to the learning algorithms, so that we get the best performance possible with the resources available on the project.
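To make the idea of transforming raw data concrete, here is a minimal sketch in plain Python. The tiny dataset, the column names, and the prepare() helper are all hypothetical and exist only for illustration; in practice a library such as scikit-learn offers ready-made versions of these steps. The sketch fills in a missing value, scales a numeric column, and one-hot encodes a categorical column, so that every input becomes a number, which is what many algorithms require.

```python
from statistics import mean

# Hypothetical raw rows: (age, city, label); None marks a missing value.
raw_rows = [
    (25.0, "paris", 0),
    (None, "tokyo", 1),
    (40.0, "paris", 1),
    (55.0, "lima", 0),
]

def prepare(rows):
    """Turn raw rows into purely numeric feature vectors and a target list."""
    ages = [r[0] for r in rows]
    cities = [r[1] for r in rows]
    targets = [r[2] for r in rows]

    # 1. Impute: replace missing ages with the mean of the observed ages.
    observed = [a for a in ages if a is not None]
    fill = mean(observed)
    ages = [fill if a is None else a for a in ages]

    # 2. Scale: map ages into the range [0, 1] (min-max normalization).
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1.0  # guard against a constant column
    ages = [(a - lo) / span for a in ages]

    # 3. Encode: one-hot encode the categorical city column.
    categories = sorted(set(cities))
    one_hot = [[1.0 if c == cat else 0.0 for cat in categories] for c in cities]

    # Combine the scaled age with the one-hot city columns.
    features = [[a] + oh for a, oh in zip(ages, one_hot)]
    return features, targets, categories

X, y, categories = prepare(raw_rows)
for row, label in zip(X, y):
    print(row, label)
```

Each resulting row holds the scaled age followed by one column per city. Note that this sketch computes the mean, minimum, and maximum from all rows for brevity; on a real project these statistics would normally be computed from the training data only and then applied to new data.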
Given that we have standard implementations of highly parameterized machine learning algorithms in open source libraries, fitting models has become routine. As such, the most challenging part of each predictive modeling project is how to prepare the one thing that is unique to the project: the data used for modeling.
In this tutorial, you will discover the importance of data preparation for each machine learning project.
After completing