Articles About Machine Learning

Feature Engineering and Selection (Book Review)

Last Updated on June 30, 2020 Data preparation is the process of transforming raw data into a form suitable for learning algorithms. In some cases, data preparation is a required step in order to provide the data to an algorithm in its required input format. In other cases, the most appropriate representation of the input data is not known and must be explored in a trial-and-error manner in order to discover what works best for a given model and dataset. Max Kuhn and Kjell […]


Data Preparation for Machine Learning (7-Day Mini-Course)

Last Updated on June 30, 2020 Data Preparation for Machine Learning Crash Course. Get on top of data preparation with Python in 7 days. Data preparation involves transforming raw data into a form that is more appropriate for modeling. Preparing data may be the most important part of a predictive modeling project and the most time-consuming, although it seems to be the least discussed. Instead, the focus is on machine learning algorithms, whose usage and parameterization have become quite routine. Practical […]


8 Top Books on Data Cleaning and Feature Engineering

Data preparation is the transformation of raw data into a form that is more appropriate for modeling. It is a challenging topic to discuss as the data differs in form, type, and structure from project to project. Nevertheless, there are common data preparation tasks across projects. It is a huge field of study and goes by many names, such as “data cleaning,” “data wrangling,” “data preprocessing,” “feature engineering,” and more. Some of these are distinct data preparation tasks, and some […]


How to Choose Data Preparation Methods for Machine Learning

Last Updated on July 15, 2020 Data preparation is an important part of a predictive modeling project. Correct application of data preparation will transform raw data into a representation that allows learning algorithms to get the most out of the data and make skillful predictions. The problem is that choosing a transform, or sequence of transforms, that results in a useful representation is very challenging. So much so that it may be considered more of an art than a science. In […]


How to Use Feature Extraction on Tabular Data for Machine Learning

Last Updated on August 17, 2020 Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data preparation techniques to transform the raw data to best meet the expectations of the algorithm. This is slow, expensive, […]


4 Automatic Outlier Detection Algorithms in Python

Last Updated on August 17, 2020 The presence of outliers in a classification or regression dataset can result in a poor fit and lower predictive modeling performance. Identifying and removing outliers is challenging with simple statistical methods for most machine learning datasets given the large number of input variables. Instead, automatic outlier detection methods can be used in the modeling pipeline and compared, just like other data preparation transforms that may be applied to the dataset. In this tutorial, you […]


6 Dimensionality Reduction Algorithms With Python

Last Updated on August 17, 2020 Dimensionality reduction is an unsupervised learning technique. Nevertheless, it can be used as a data transform pre-processing step for machine learning algorithms on classification and regression predictive modeling datasets with supervised learning algorithms. There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good idea to explore a range of dimensionality reduction algorithms and different configurations for each algorithm. In this tutorial, you […]


Framework for Data Preparation Techniques in Machine Learning

Last Updated on July 17, 2020 There are a vast number of different types of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high-dimensionality of the data, the ever-increasing parade of new machine learning algorithms, and the very human limitations of the practitioner. Instead, […]


How to Grid Search Data Preparation Techniques

Last Updated on August 17, 2020 Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data preparation techniques to transform the raw data to best meet the expectations of the algorithm. This is slow, expensive, […]


How to Create Custom Data Transforms for Scikit-Learn

Last Updated on July 19, 2020 The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring custom code. The scikit-learn library provides a way to wrap these custom data transforms in a standard way so […]
