Quick Guide: Steps To Perform Text Data Cleaning in Python

Introduction: Twitter has become an unavoidable channel for brand management. It has compelled brands to become more responsive to their customers. On the other hand, the damage it can cause cannot be undone. The 140-character tweet has become a powerful tool for customers and users to convey messages directly to brands. For companies, these tweets carry a lot of information, such as sentiment, engagement, reviews, and product features. However, mining these tweets isn’t easy. Why? Because before you mine this data, you need […]

Read more
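
As a rough illustration of the kind of cleaning a tweet might need before mining, here is a minimal sketch using only the Python standard library. The regular expressions, the clean_tweet helper, and the sample tweet (including its shortened URL) are assumptions made up for this example; they are not the steps from the full article.

```python
import html
import re

# Hypothetical patterns chosen for illustration; real tweets may need more rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s']")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with URLs, mentions, and punctuation removed."""
    text = html.unescape(text)          # decode entities such as &amp;
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)  # drop remaining punctuation and symbols
    return WHITESPACE_RE.sub(" ", text).strip()


# Made-up sample tweet, not taken from any dataset.
raw = "@BrandX I &amp; my friends LOVE the new phone!! #awesome https://t.co/abc123"
print(clean_tweet(raw))
```

The order of the steps matters in this sketch: entities are decoded first so that the ampersand is treated as punctuation, and URLs are removed before punctuation so that link fragments do not survive as words.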

Introduction to Structuring Customer complaints explained with examples

Introduction: In the past, if you were not happy with a service or a product, you would go to the service provider or the shop and lodge a complaint. With service businesses moving online and operating at enormous scale, lodging complaints in person is not always possible. Electronic channels such as email, social media, and especially websites that focus on such issues, like www.consumercomplaints.in, are widely used to vent anger and to publicize the issue in the expectation of […]

Read more

Novel object captioning surpasses human performance on benchmarks

Consider for a moment what it takes to visually identify and describe something to another person. Now imagine that the other person can’t see the object or image, so every detail matters. How do you decide what information is important and what’s not? You’ll need to know exactly what everything is, where it is, what it’s doing in relation to other objects, and note other attributes like color or position of objects in the foreground or background. This exercise shows […]

Read more

Simple NLP in Python With TextBlob: Tokenization

Introduction: The amount of textual data on the Internet has increased significantly in the past decades. There’s no doubt that processing this much information must be automated, and the TextBlob package is one fairly simple way to perform natural language processing (NLP). It provides a simple API for common NLP tasks such as part-of-speech tagging, noun phrase extraction, tokenization, sentiment analysis, classification, translation, and more. No special technical prerequisites […]

Read more
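
To make the idea of tokenization concrete, the sketch below splits text into sentences and words with plain regular expressions. It deliberately does not use TextBlob; the sentence_tokenize and word_tokenize helpers are hypothetical names for this example, and the rules are far cruder than those of a real NLP library.

```python
import re


def sentence_tokenize(text: str) -> list:
    """Split on whitespace that follows '.', '!' or '?' (a naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text: str) -> list:
    """Return runs of letters and digits, keeping simple contractions whole."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


sample = "TextBlob is simple. It doesn't need much setup!"
print(sentence_tokenize(sample))
print(word_tokenize(sample))
```

A naive rule like this breaks on abbreviations such as "e.g." or "Dr.", which is one reason to prefer a dedicated library for real text.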

Add Legend to Figure in Matplotlib

Introduction: Matplotlib is one of the most widely used data visualization libraries in Python. Typically, when visualizing more than one variable, you’ll want to add a legend to the plot, explaining what each variable represents. In this article, we’ll take a look at how to add a legend to a Matplotlib plot.

Creating a Plot: Let’s first create a simple plot with two variables:

import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.arange(0, 10, […]

Read more

Automated Machine Learning (AutoML) Libraries for Python

AutoML provides tools to automatically discover good machine learning model pipelines for a dataset with very little user intervention. It is ideal for domain experts new to machine learning or machine learning practitioners looking to get good results quickly for a predictive modeling task. Open-source libraries are available for using AutoML methods with popular machine learning libraries in Python, such as the scikit-learn machine learning library. In this tutorial, you will discover how to use top open-source AutoML libraries for […]

Read more

Multi-Core Machine Learning in Python With Scikit-Learn

Many computationally expensive tasks for machine learning can be made parallel by splitting the work across multiple CPU cores, referred to as multi-core processing. Common machine learning tasks that can be made parallel include training models like ensembles of decision trees, evaluating models using resampling procedures like k-fold cross-validation, and tuning model hyperparameters, such as grid and random search. Using multiple cores for common machine learning tasks can dramatically decrease the execution time as a factor of the number of […]

Read more

How to Train to the Test Set in Machine Learning

Training to the test set is a type of overfitting where a model is prepared that intentionally achieves good performance on a given test set at the expense of increased generalization error. It is a type of overfitting that is common in machine learning competitions where a complete training dataset is provided and where only the input portion of a test set is provided. One approach to training to the test set involves constructing a training set that most resembles […]

Read more

How to Hill Climb the Test Set for Machine Learning

Last Updated on September 27, 2020. Hill climbing the test set is an approach to achieving good or perfect predictions in a machine learning competition without touching the training set or even developing a predictive model. As an approach to machine learning competitions, it is rightfully frowned upon, and most competition platforms impose limitations to prevent it, which is important. Nevertheless, hill climbing the test set is something that a machine learning practitioner accidentally does as part of participating in […]

Read more
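
The idea in the excerpt above can be sketched in a few lines of standard-library Python. This is a simplified, deterministic variant written for illustration, not the procedure from the full article: the hidden labels, the make_scorer and hill_climb helpers, and the assumption of a binary task scored by accuracy are all made up for the example. The "competition" is simulated by a scoring function that reveals only the accuracy of a submission.

```python
import random


def make_scorer(hidden_labels):
    """Simulate a leaderboard: returns accuracy only, never the labels."""
    def score(predictions):
        correct = sum(p == y for p, y in zip(predictions, hidden_labels))
        return correct / len(hidden_labels)
    return score


def hill_climb(score, n_examples, seed=0):
    """Flip one prediction at a time and keep the flip only if the score improves."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_examples)]  # random starting guess
    best_score = score(best)
    queries = 1
    for i in range(n_examples):
        candidate = best.copy()
        candidate[i] = 1 - candidate[i]   # flip a single binary prediction
        candidate_score = score(candidate)
        queries += 1
        if candidate_score > best_score:  # keep only strict improvements
            best, best_score = candidate, candidate_score
    return best, best_score, queries


# Made-up hidden test labels; no model and no training data are involved.
hidden = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
score = make_scorer(hidden)
predictions, final_score, queries = hill_climb(score, len(hidden))
print(final_score, queries)
```

Because every submission leaks one bit of information about one label, this toy version recovers all labels after one query per example. That leak is why limits on the number of scored submissions matter.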

Linear Discriminant Analysis With Python

Linear Discriminant Analysis is a linear classification machine learning algorithm. The algorithm involves developing a probabilistic model per class based on the specific distribution of observations for each input variable. A new example is then classified by calculating the conditional probability of it belonging to each class and selecting the class with the highest probability. As such, it is a relatively simple probabilistic classification model that makes strong assumptions about the distribution of each input variable, although it can make […]

Read more
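
The classification rule described in the excerpt above can be illustrated for a single input variable. The sketch assumes each class is Gaussian with its own mean and a variance shared across classes, which is a common formulation of Linear Discriminant Analysis; the fit_lda_1d and predict_lda_1d helpers and the toy data are hypothetical and are not the article's code or scikit-learn's API.

```python
import math
from collections import defaultdict


def fit_lda_1d(xs, ys):
    """Estimate per-class means and priors plus one pooled variance."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n = len(xs)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    priors = {c: len(v) / n for c, v in groups.items()}
    # Pooled within-class variance (assumed equal for all classes).
    squared_error = sum((x - means[y]) ** 2 for x, y in zip(xs, ys))
    variance = squared_error / (n - len(groups))
    return means, priors, variance


def predict_lda_1d(model, x):
    """Pick the class with the largest linear discriminant score."""
    means, priors, variance = model

    def discriminant(c):
        mu = means[c]
        # Log posterior up to a constant that is the same for every class.
        return x * mu / variance - mu * mu / (2 * variance) + math.log(priors[c])

    return max(means, key=discriminant)


# Made-up one-dimensional training data with two classes.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
ys = ["a", "a", "a", "b", "b", "b"]
model = fit_lda_1d(xs, ys)
print(predict_lda_1d(model, 2.5), predict_lda_1d(model, 5.0))
```

With equal priors and a shared variance, the decision boundary in this toy data falls midway between the two class means, at 4.0.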