Ultimate Guide to Heatmaps in Seaborn with Python

Introduction: A heatmap is a data visualization technique that uses color to show how a value of interest changes depending on the values of two other variables. For example, you could use a heatmap to understand how air pollution varies according to the time of day across a set of cities. Another, perhaps rarer, use of heatmaps is to observe human behavior: you can create visualizations of how people use social media, how their answers on surveys […]
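
As a rough illustration of the air pollution example above, the following minimal sketch draws a heatmap of cities against hours of the day with Seaborn. The city names, hours, and readings are invented placeholder data, and the headless `Agg` backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up pollution readings: rows are cities, columns are hours of the day.
rng = np.random.default_rng(0)
cities = ["City A", "City B", "City C"]
hours = [0, 6, 12, 18]
readings = pd.DataFrame(
    rng.uniform(10, 100, size=(len(cities), len(hours))),
    index=cities,
    columns=hours,
)

# Color encodes the reading; the two axes are the two other variables.
ax = sns.heatmap(readings, annot=True, fmt=".0f", cmap="viridis")
ax.set_xlabel("Hour of day")
ax.set_ylabel("City")
```

Calling `plt.savefig(...)` or `plt.show()` afterwards would write or display the figure.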

Histogram-Based Gradient Boosting Ensembles in Python

Gradient boosting is an ensemble of decision tree algorithms. It may be one of the most popular techniques for structured (tabular) classification and regression predictive modeling problems, given that it performs so well across a wide range of datasets in practice. A major problem with gradient boosting is that the model is slow to train. This is particularly a problem when using the model on large datasets with tens of thousands of examples (rows). Training the trees that are […]

Efficient One-Pass End-to-End Entity Linking for Questions

November 16, 2020. By: Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih. Abstract: We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities per question, ELQ outperforms the previous state of the art by a large margin of +12.7% and +19.6% F1, respectively. With a very fast inference time (1.57 […]

Feature Selection with Stochastic Optimization Algorithms

Typically, a simpler and better-performing machine learning model can be developed by removing input features (columns) from the training dataset. This is called feature selection and there are many different types of algorithms that can be used. It is possible to frame the problem of feature selection as an optimization problem. In the case that there are few input features, all possible combinations of input features can be evaluated and the best subset found definitively. In the case of a […]
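
The exhaustive case mentioned above, where every combination of a few input features is evaluated, can be sketched with the standard library alone. The feature names and the `score_subset` function are hypothetical stand-ins; in practice the score would come from fitting and evaluating a model on each subset.

```python
from itertools import combinations

def score_subset(subset):
    """Hypothetical objective: toy scores standing in for a model evaluation."""
    usefulness = {"age": 0.30, "income": 0.25, "zip": 0.02, "noise": 0.0}
    penalty = 0.05 * len(subset)  # small cost per feature to favor simpler models
    return sum(usefulness[name] for name in subset) - penalty

features = ["age", "income", "zip", "noise"]

# Enumerate every non-empty subset: 2**4 - 1 = 15 candidates.
candidates = [
    subset
    for size in range(1, len(features) + 1)
    for subset in combinations(features, size)
]

# With so few candidates, the best subset can be found definitively.
best_subset = max(candidates, key=score_subset)
print(best_subset)
```

The number of subsets doubles with each added feature, which is why this brute-force approach only suits small feature counts.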

Reading and Writing HTML Tables with Pandas

Introduction: Hypertext Markup Language (HTML) is the standard markup language for building web pages. We can render tabular data using HTML’s <table> element. The Pandas data analysis library provides functions like read_html() and to_html() so we can import and export data to DataFrames. In this article, we will learn how to read tabular data from an HTML file and load it into a Pandas DataFrame. We’ll also learn how to write data from a Pandas DataFrame to an HTML file. […]
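
A minimal round trip through the two functions named above might look like the sketch below. The sample data is invented, the HTML is kept in memory rather than written to a file, and `read_html()` is assumed to have an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame({"name": ["Ada", "Linus"], "year": [1815, 1969]})

# Export the DataFrame as an HTML <table> string (no index column).
html = df.to_html(index=False)

# read_html() returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
restored = tables[0]
print(restored)
```

Passing a file path to `to_html()` and `read_html()` instead of a string buffer would write to and read from disk.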

Ensemble Learning Algorithm Complexity and Occam’s Razor

Occam’s razor suggests that in machine learning, we should prefer simpler models with fewer coefficients over complex models like ensembles. Taken at face value, the razor is a heuristic that suggests more complex hypotheses make more assumptions that, in turn, will make them too narrow and not generalize well. In machine learning, it suggests complex models like ensembles will overfit the training dataset and perform poorly on new data. In practice, ensembles are almost universally the type of model chosen […]

How to Choose an Optimization Algorithm

Optimization is the problem of finding a set of inputs to an objective function that results in a maximum or minimum function evaluation. It is the challenging problem that underlies many machine learning algorithms, from fitting logistic regression models to training artificial neural networks. There are perhaps hundreds of popular optimization algorithms, and perhaps tens of algorithms to choose from in popular scientific code libraries. This can make it challenging to know which algorithms to consider for a given optimization […]
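
To make the definition above concrete, here is a small sketch that searches for the inputs minimizing an objective function. The objective and the coarse grid are made up for illustration, and plain grid search is used only because it is easy to follow; it is not one of the algorithms the article recommends.

```python
from itertools import product

def objective(x, y):
    """Made-up objective with its minimum at x = 1, y = -2."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

# Candidate values from -5.0 to 5.0 in steps of 0.5 for each input.
grid = [i * 0.5 for i in range(-10, 11)]

# Evaluate every (x, y) pair and keep the one with the lowest objective value.
best_inputs = min(product(grid, grid), key=lambda point: objective(*point))
best_value = objective(*best_inputs)
print(best_inputs, best_value)
```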

Matplotlib Line Plot – Tutorial and Examples

Introduction: Matplotlib is one of the most widely used data visualization libraries in Python. From simple to complex visualizations, it’s the go-to library for most. In this tutorial, we’ll take a look at how to plot a line plot in Matplotlib – one of the most basic types of plots. Line Plots display numerical values on one axis, and categorical values on the other. They can typically be used in much the same way Bar Plots can be used, though, […]
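
A basic line plot along these lines could be sketched as follows. The data points are arbitrary, and the `Agg` backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt

# Arbitrary sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 5]

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, marker="o")  # plot() returns a list of Line2D objects
ax.set_xlabel("x")
ax.set_ylabel("y")
```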

Matplotlib Violin Plot – Tutorial and Examples

Introduction: There are many data visualization libraries in Python, yet Matplotlib is the most popular library out of all of them. Matplotlib’s popularity is due to its reliability and utility – it’s able to create both simple and complex plots with little code. You can also customize the plots in a variety of ways. In this tutorial, we’ll cover how to plot Violin Plots in Matplotlib. Violin plots are used to visualize data distributions, displaying the range, median, and distribution […]
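
As a minimal sketch of the idea, the snippet below draws one violin per group with a median marker. The three groups are randomly generated placeholder data, and the group labels and `Agg` backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated placeholder data: three groups with different means.
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mean, scale=1.0, size=200) for mean in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
# Each violin shows the spread and shape of one group's distribution.
parts = ax.violinplot(groups, showmedians=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["A", "B", "C"])
```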
