Generating Synthetic Data with Numpy and Scikit-Learn

Introduction In this tutorial, we’ll discuss how to generate synthetic datasets using the NumPy and Scikit-learn libraries. We’ll see how samples can be drawn from various distributions with known parameters. We’ll also discuss generating datasets for different purposes, such as regression, classification, and clustering. Finally, we’ll see how to generate a dataset that mimics the distribution of an existing dataset. The Need for Synthetic Data In data science, synthetic data plays a very important role. […]

Python: Get Number of Elements in a List

Introduction Getting the number of elements in a list in Python is a common operation. For example, you will need to know how many elements the list has whenever you iterate through it. Remember that lists can have a combination of integers, floats, strings, booleans, other lists, etc. as their elements:

# List of just integers
list_a = [12, 5, 91, 18]

# List of integers, floats, strings, booleans
list_b = [4, 1.2, "hello world", True]

If we count the […]

Quick Guide: Steps To Perform Text Data Cleaning in Python

Introduction Twitter has become an indispensable channel for brand management. It has compelled brands to become more responsive to their customers. On the other hand, the damage it can cause can’t be undone. The 140-character tweet has now become a powerful tool for customers and users to convey messages directly to brands. For companies, these tweets carry a lot of information, such as sentiment, engagement, reviews, and product features. However, mining these tweets isn’t easy. Why? Because before you mine this data, you need […]

Simple NLP in Python With TextBlob: Tokenization

Introduction The amount of textual data on the Internet has significantly increased in the past decades. There’s no doubt that the processing of this amount of information must be automated, and the TextBlob package is one of the fairly simple ways to perform NLP – Natural Language Processing. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, tokenization, sentiment analysis, classification, translation, and more. No special technical prerequisites […]

Add Legend to Figure in Matplotlib

Introduction Matplotlib is one of the most widely used data visualization libraries in Python. Typically, when visualizing more than one variable, you’ll want to add a legend to the plot, explaining what each variable represents. In this article, we’ll take a look at how to add a legend to a Matplotlib plot. Creating a Plot Let’s first create a simple plot with two variables:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.arange(0, 10, […]

Get Started with PyTorch – Learn How to Build Quick & Accurate Neural Networks (with 4 Case Studies!)

Introduction PyTorch v TensorFlow – how many times have you seen this polarizing question pop up on social media? The rise of deep learning in recent times has been fuelled by the popularity of these frameworks. There are staunch supporters of both, but a clear winner has started to emerge in the last year. PyTorch was one of the most popular frameworks in 2018. It quickly became the preferred go-to deep learning framework among researchers in both academia and the […]

Predicting Movie Genres using NLP – An Awesome Introduction to Multi-Label Classification

Introduction I was intrigued going through this amazing article on building a multi-label image classification model last week. The data scientist in me started exploring possibilities of transforming this idea into a Natural Language Processing (NLP) problem. That article showcases computer vision techniques to predict a movie’s genre. So I had to find a way to convert that problem statement into text-based data. Now, most NLP tutorials look at solving single-label classification challenges (when there’s only one label per observation). […]

Save Plot as Image with Matplotlib

Introduction Matplotlib is one of the most widely used data visualization libraries in Python. It’s common to share Matplotlib plots and visualizations with others. In this article, we’ll take a look at how to save a plot/graph as an image file using Matplotlib. Creating a Plot Let’s first create a simple plot:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)
plt.show()

Here, we’ve plotted a sine function, starting at 0 […]

Python with Pandas: DataFrame Tutorial with Examples

Introduction Pandas is an open-source Python library for data analysis. It is designed for efficient and intuitive handling and processing of structured data. The two main data structures in Pandas are Series and DataFrame. Series are essentially one-dimensional labeled arrays of any type of data, while DataFrames are two-dimensional labeled structures with potentially heterogeneous data. Heterogeneous means that not all columns need to hold the same data type. In this article we will go through […]

Remove Element from an Array in Python

Introduction This tutorial will go through some common ways of removing elements from Python arrays. Here’s a list of all the techniques and methods we’ll cover in this article: Arrays in Python Arrays and lists are not the same thing in Python. Although lists are more commonly used than arrays, the latter still have their use cases. The main difference between the two is that lists can be used to store arbitrary values. They are also heterogeneous, which means they […]
