Articles About Machine Learning

One Hot Encoding: Understanding the “Hot” in Data

Preparing categorical data correctly is a fundamental step in machine learning, particularly when using linear models. One Hot Encoding stands out as a key technique, enabling the transformation of categorical variables into a machine-understandable format. This post explains why you cannot use a categorical variable directly and demonstrates the use of One Hot Encoding to identify the most predictive categorical features for linear regression. Let’s get started. […]

Read more

Free Tools Every ML Beginner Should Use

We have all experienced it: starting is the toughest part of any journey, and getting started in the ML field is no different. This is why today I want to highlight some of the essential tools that every beginner, or anyone willing to get started with ML, should be using. Jupyter Notebook is a blessing for any beginner willing to start coding professionally. It is an open-source web interface that allows […]

Read more

The Search for the Sweet Spot in a Linear Regression with Numeric Features

Consistent with the principle of Occam’s razor, starting simple often leads to the most profound insights, especially when piecing together a predictive model. In this post, using the Ames Housing Dataset, we will first pinpoint the key features that shine on their own. Then, step by step, we’ll layer these insights, observing how their combined effect enhances our ability to forecast accurately. As we delve deeper, we will harness the power of the Sequential Feature Selector (SFS) to sift through […]

Read more

The Strategic Use of Sequential Feature Selector for Housing Price Predictions

To understand housing prices better, simplicity and clarity in our models are key. Our aim with this post is to demonstrate how straightforward yet powerful techniques in feature selection and engineering can lead to creating an effective, simple linear regression model. Working with the Ames dataset, we use a Sequential Feature Selector (SFS) to identify the most impactful numeric features and then enhance our model’s accuracy through thoughtful feature engineering. Let’s get started. […]

Read more

Building a Simple RAG Application Using LlamaIndex

In this tutorial, we will explore Retrieval-Augmented Generation (RAG) and the LlamaIndex AI framework. We will learn how to use LlamaIndex to build a RAG-based application for Q&A over private documents and enhance the application by incorporating a memory buffer. This will enable the LLM to generate responses using context from both the documents and previous interactions. What is RAG in LLMs? Retrieval-Augmented Generation (RAG) is an advanced methodology designed to enhance the performance […]

Read more

5 Free Podcasts That Demystify Machine Learning Concepts

Machine learning (ML) has become a buzzword in recent years, with applications ranging from voice assistants to self-driving cars. Yet, for many, the inner workings of these technologies remain a mystery. Podcasts offer a great way to learn about this field without getting overwhelmed. They break down complex ideas into simpler terms and let you learn at your own pace. In this article, I will share 5 of my favorite ML podcasts, which excel at […]

Read more

From Train-Test to Cross-Validation: Advancing Your Model’s Evaluation

Many beginners will initially rely on the train-test method to evaluate their models. This method is straightforward and seems to give a clear indication of how well a model performs on unseen data. However, this approach can often lead to an incomplete understanding of a model’s capabilities. In this blog, we’ll discuss why it’s important to go beyond the basic train-test split and how cross-validation can offer a more thorough evaluation of model performance. Join us as we guide you […]

Read more

5 Tips for Getting Started with Time Series Analysis

As a machine learning engineer or a data scientist, you’ll likely need to work with time series data. Time series analysis focuses on data indexed by time, such as stock prices, temperature, and the like. If you’re already comfortable with machine learning fundamentals but new to time series, this guide will provide you with five actionable tips to get started. These tips will help you understand the aspects of time series data, preprocess […]

Read more

Integrating Scikit-Learn and Statsmodels for Regression

Statistics and Machine Learning both aim to extract insights from data, though their approaches differ significantly. Traditional statistics primarily concerns itself with inference, using the entire dataset to test hypotheses and estimate probabilities about a larger population. In contrast, machine learning emphasizes prediction and decision-making, typically employing a train-test split methodology where models learn from a portion of the data (the training set) and validate their predictions on unseen data (the testing set). In this post, we will demonstrate how […]

Read more

Tips for Tuning Hyperparameters in Machine Learning Models

If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters (the model coefficients) that characterize it. But machine learning models also have a set of hyperparameters whose values you must specify before training the model. So how do you find the optimal values for these hyperparameters? You can use hyperparameter tuning. By systematically adjusting […]

Read more