Python tutorials

Random Forest Algorithm with Python and Scikit-Learn

Random forest is a type of supervised machine learning algorithm based on ensemble learning. Ensemble learning is a type of learning where you combine different types of algorithms, or the same algorithm multiple times, to form a more powerful prediction model. The random forest algorithm combines multiple algorithms of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name “Random Forest”. The random forest algorithm can be used for both regression and classification tasks. How […]


The Python tempfile Module

Introduction Temporary files, or “tempfiles”, are mainly used to store intermediate information on disk for an application. These files are normally created for purposes such as temporary backups, or when the application is dealing with a dataset larger than the system’s memory. Ideally, these files are located in a separate directory, which varies across operating systems, and their names are unique. The data stored in temporary files is not always required after the […]
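
As a minimal sketch of the idea in the excerpt above, the standard-library `tempfile` module can create a uniquely named file in the platform's temporary directory and clean it up automatically. The variable names and the sample text are illustrative assumptions, not taken from the article.

```python
import os
import tempfile

# The temporary directory differs between operating systems.
temp_dir = tempfile.gettempdir()

# NamedTemporaryFile picks a unique name inside the temporary directory
# and, by default, deletes the file when it is closed.
with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as tmp:
    tmp.write("intermediate results")  # hypothetical intermediate data
    tmp.seek(0)                        # rewind before reading back
    contents = tmp.read()
    temp_path = tmp.name
    existed_while_open = os.path.exists(temp_path)

# Once the with-block exits, the file has been removed.
still_exists = os.path.exists(temp_path)
```

Passing `delete=False` would keep the file on disk after closing, which can be useful when another process needs to read it.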


Converting Strings to datetime in Python

Introduction One of the many common problems that we face in software development is handling dates and times. After getting a date-time string from an API, for example, we need to convert it to a human-readable format. Again, if the same API is used in different timezones, the conversion will be different. A good date-time library should convert the time as per the timezone. This is just one of many nuances that need to be handled when dealing with dates […]
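
The conversion described above can be sketched with the standard-library `datetime` module. The timestamp string, its format, and the target offset below are assumptions chosen for illustration; a real API may use a different format.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical date-time string as it might arrive from an API (UTC).
raw = "2018-06-29 17:08:00 +0000"

# Parse the string into a timezone-aware datetime object.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")

# Convert to a fixed UTC+05:30 offset (an example target timezone).
target_zone = timezone(timedelta(hours=5, minutes=30))
local = parsed.astimezone(target_zone)

# Render the result in a more readable form.
readable = local.strftime("%Y-%m-%d %H:%M")
```

A fixed offset keeps the sketch dependency-free; for named timezones with daylight-saving rules, a timezone database such as the one behind `zoneinfo` would be needed.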


The Naive Bayes Algorithm in Python with Scikit-Learn

When studying Probability & Statistics, one of the first and most important theorems students learn is Bayes’ Theorem. This theorem is the foundation of deductive reasoning, which focuses on determining the probability of an event occurring based on prior knowledge of conditions that might be related to the event. The Naive Bayes Classifier brings the power of this theorem to Machine Learning, building a very simple yet powerful classifier. In this article, we will see an overview of how […]
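
To make the theorem concrete, here is a small worked calculation of P(A|B) = P(B|A) · P(A) / P(B). The spam-filter scenario and all probabilities are made-up numbers for illustration only, and this is the bare theorem, not a full Naive Bayes classifier.

```python
# Hypothetical prior knowledge (illustrative numbers, not real data).
p_spam = 0.2              # P(spam): share of all mail that is spam
p_word_given_spam = 0.5   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' Theorem: probability the message is spam given the word appears.
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 3))
```

Seeing the word raises the estimated probability of spam from the prior of 0.2 to roughly 0.71 under these assumed numbers.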


Hierarchical Clustering with Python and Scikit-Learn

Hierarchical clustering is a type of unsupervised machine learning algorithm used to cluster unlabeled data points. Like K-means clustering, hierarchical clustering groups together data points with similar characteristics. In some cases the results of hierarchical and K-means clustering can be similar. Before implementing hierarchical clustering using Scikit-Learn, let’s first understand the theory behind hierarchical clustering. Theory of Hierarchical Clustering There are two types of hierarchical clustering: Agglomerative and Divisive. In the former, data points are clustered using a […]


Cross Validation and Grid Search for Model Selection in Python

Introduction A typical machine learning process involves training different models on the dataset and selecting the one with the best performance. However, evaluating the performance of an algorithm is not always a straightforward task. There are several factors that can help you determine which algorithm performs best. One such factor is the performance on the cross-validation set, and another factor is the choice of parameters for an algorithm. In this article we will explore these two factors in detail. We […]


The Python Requests Module

Introduction Dealing with HTTP requests is not an easy task in any programming language. Python 2, for example, comes with two built-in modules, urllib and urllib2, to handle HTTP-related operations. The two modules offer different sets of functionality, and they often need to be used together. The main drawback of using urllib is that it is confusing (a few methods are available in both urllib and urllib2), the documentation is not clear and we need to write […]


Association Rule Mining via Apriori Algorithm in Python

Association rule mining is a technique to identify underlying relations between different items. Take the example of a supermarket where customers can buy a variety of items. Usually, there is a pattern in what the customers buy. For instance, mothers with babies buy baby products such as milk and diapers. Young women may buy makeup items, whereas bachelors may buy beer and chips. In short, transactions involve a pattern. More profit can be generated if the relationship between the items […]
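
One way to quantify such a pattern is with the standard support and confidence measures used in association rule mining. The sketch below computes them in plain Python for a tiny, made-up list of transactions; it is not the Apriori algorithm itself, and the helper names `support` and `confidence` are hypothetical.

```python
# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"milk", "diapers"},
    {"milk", "diapers", "beer"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "chips", "diapers"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

milk_diapers_support = support({"milk", "diapers"}, transactions)
milk_to_diapers_conf = confidence({"milk"}, {"diapers"}, transactions)
```

With these baskets, milk and diapers appear together in 2 of 5 transactions, and 2 of the 3 baskets containing milk also contain diapers.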


Using Regex for Text Manipulation in Python

Introduction Text preprocessing is one of the most important tasks in Natural Language Processing (NLP). For instance, you may want to remove all punctuation marks from text documents before they can be used for text classification. Similarly, you may want to extract numbers from a text string. Writing manual scripts for such preprocessing tasks requires a lot of effort and is prone to errors. Keeping in view the importance of these preprocessing tasks, Regular Expressions (aka Regex) have been […]
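
The two preprocessing tasks mentioned above, removing punctuation and extracting numbers, can be sketched with the standard-library `re` module. The sample sentence and the specific patterns are assumptions for illustration; real text may need more careful patterns (for example, for decimals or Unicode punctuation).

```python
import re

# Hypothetical input text.
text = "Hello, world! It costs 25 dollars, or 30 with tax."

# Remove every character that is neither a word character nor whitespace.
no_punctuation = re.sub(r"[^\w\s]", "", text)

# Extract each run of one or more digits as a string.
numbers = re.findall(r"\d+", text)

print(no_punctuation)  # Hello world It costs 25 dollars or 30 with tax
print(numbers)         # ['25', '30']
```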


Text Classification with Python and Scikit-Learn

Introduction Text classification is one of the most important tasks in Natural Language Processing. It is the process of classifying text strings or documents into different categories, depending upon the contents of the strings. Text classification has a variety of applications, such as detecting user sentiment from a tweet, classifying an email as spam or ham, classifying blog posts into different categories, automatic tagging of customer queries, and so on. In this article, we will see a real-world example of […]
