Quick and Dirty Data Analysis with Pandas

Last Updated on January 28, 2020 Before you can select and prepare your data for modeling, you need to understand what you’ve got to start with. If you’re using the Python stack for machine learning, a library that you can use to better understand your data is Pandas. In this post you will discover some quick and dirty recipes for Pandas to improve your understanding of your data in terms of its structure, distribution and relationships. Kick-start your project […]

Practical Advice for Getting Started in Machine Learning

Last Updated on August 16, 2020 David Mimno is an assistant professor in the Information Sciences department at Cornell University. He has a background and interest in Natural Language Processing (NLP), specifically topic modeling. Notably, he is the chief maintainer of MALLET, the Java-based NLP library. I recently came across a blog post by David titled “Advice for students of machine learning”. This is a great post and includes advice similar to what I give to programmers and coaching students. It’s […]

Books for Machine Learning with R

Last Updated on August 16, 2020 R is a powerful platform for data analysis and machine learning. It is my main workhorse for things like competitions and consulting work. The reason is the large number of powerful algorithms available, all on one platform. In this post I want to point out some resources you can use to get started in R for machine learning. Kick-start your project with my new book Machine Learning Mastery With R, including step-by-step tutorials […]

Machine Learning Communities

Last Updated on June 7, 2016 Online communities are invaluable in machine learning, regardless of your skill level. The reason is that, like programming, you never stop learning. You simply cannot know everything; there are always new algorithms, new data and new combinations to discover and practice. Communities help. You can get your questions answered, learn by answering other people’s questions and discover new areas from reading through the exchanges. Machine learning communities have had a big impact on my […]

Machine Learning is Kaggle Competitions

Last Updated on September 5, 2016 Julia Evans wrote a post recently titled “Machine learning isn’t Kaggle competitions”. It was an interesting post because it pointed out an important truth. If you want to solve business problems using machine learning, doing well at Kaggle competitions is not a good indicator of those skills. The rationale is that the work required to do well in a Kaggle competition is only a piece of what is required to deliver a business benefit. […]

Data Science Screencasts: A Data Origami Review

Last Updated on June 7, 2016 Data Origami is a new website by Cameron Davidson-Pilon that provides data science screencasts. It is a cool idea and a cool site. Cameron was kind enough to give me access to the site so that I could review it. I watched all of the videos I could and wrote up all my notes, and in this post you will get a sneak peek into Cameron’s new site Data Origami. Data […]

How to Load Data in Python with Scikit-Learn

Last Updated on December 13, 2019 Before you can build machine learning models, you need to load your data into memory. In this post you will discover how to load data for machine learning in Python using scikit-learn. Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples. Let’s get started. Update March/2018: Added alternate link to download the dataset as the original appears to have […]

Rescaling Data for Machine Learning in Python with Scikit-Learn

Last Updated on June 30, 2020 Your data must be prepared before you can build models. The data preparation process can involve three steps: data selection, data preprocessing and data transformation. In this post you will discover two simple data transformation methods you can apply to your data in Python using scikit-learn. Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples. Let’s get started. Update: […]

Feature Selection in Python with Scikit-Learn

Last Updated on June 4, 2020 Not all data attributes are created equal. More is not always better when it comes to attributes or columns in your dataset. In this post you will discover how to select attributes in your data before creating a machine learning model using the scikit-learn library. Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples. Let’s get started. Update: For […]

How to Tune Algorithm Parameters with Scikit-Learn

Last Updated on August 21, 2019 Machine learning models are parameterized so that their behavior can be tuned for a given problem. Models can have many parameters and finding the best combination of parameters can be treated as a search problem. In this post, you will discover how to tune the parameters of machine learning algorithms in Python using the scikit-learn library. Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python […]
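
The idea of treating parameter tuning as a search problem can be illustrated with a minimal, library-free sketch. This is not scikit-learn's API and not the method from the post itself; the `grid_search` helper, the `toy_score` function and the parameter names `alpha` and `depth` are all hypothetical, chosen only to show an exhaustive search over every combination of candidate values.

```python
from itertools import product


def grid_search(score, param_grid):
    """Try every combination in param_grid; return (best_params, best_score).

    `score` is any callable that accepts the parameters as keyword
    arguments and returns a number where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian grid of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical stand-in for a cross-validated model score.
# By construction it is highest at alpha=0.1 and depth=3.
def toy_score(alpha, depth):
    return -((alpha - 0.1) ** 2) - (depth - 3) ** 2


best_params, best_score = grid_search(
    toy_score, {"alpha": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
)
print(best_params, best_score)
```

In practice the scoring function would train and evaluate a model for each combination, which is why the size of the grid matters: the number of evaluations is the product of the number of candidate values for each parameter.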
