Articles About Natural Language Processing

How to Use TfidfTransformer & TfidfVectorizer

Scikit-learn’s TfidfTransformer and TfidfVectorizer aim to do the same thing: convert a collection of raw documents to a matrix of TF-IDF features. The differences between the two modules can be confusing, and it’s hard to know when to use which. This article shows you how to use each module correctly, how the two differ, and offers some guidelines on when to use which. TfidfTransformer Usage 1. Dataset and Imports Below we have 5 toy documents, all […]
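The two routes the article compares can be sketched on toy data like this — a minimal example showing that CountVectorizer followed by TfidfTransformer produces the same matrix as TfidfVectorizer alone (documents here are made up for illustration):

```python
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

docs = ["the cat sat", "the dog sat", "the cat ran"]

# Route 1: two steps — raw counts first, then TF-IDF weighting
counts = CountVectorizer().fit_transform(docs)
tfidf_a = TfidfTransformer().fit_transform(counts)

# Route 2: one step — TfidfVectorizer does both internally
tfidf_b = TfidfVectorizer().fit_transform(docs)
```

With default parameters the two routes are equivalent; TfidfTransformer is mainly useful when you already have a count matrix from elsewhere.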

Read more

Easily Access Pre-trained Word Embeddings with Gensim

What are pre-trained embeddings, and why use them? Pre-trained word embeddings are vector representations of words trained on a large dataset. With pre-trained embeddings, you are essentially using the weights and vocabulary that result from a training process carried out by… someone else! (It could also be you.) One benefit of using pre-trained embeddings is that you can hit the ground running without needing to find a large text corpus, which you would have to preprocess and train with […]

Read more

Build Your First Text Classifier in Python with Logistic Regression

Text classification is the automatic process of predicting one or more categories for a given piece of text — for example, predicting whether an email is legitimate or spam. Thanks to Gmail’s spam classifier, I rarely see or hear from spammy emails! Spam classification Other than spam detection, text classifiers can be used to determine sentiment in social media texts, predict categories of news articles, parse and segment unstructured documents, flag widely discussed fake news articles, and more. Text classifiers […]
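The spam-vs-legit example above can be sketched in a few lines with scikit-learn — a toy pipeline on made-up labeled texts, not the article's full walkthrough:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled examples — real data would be much larger
texts = ["win a free prize now", "meeting at 10am tomorrow",
         "free money click here", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]

# Vectorize the text, then fit a logistic regression on top
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["claim your free prize"])[0]
```

The pipeline object bundles vectorization and classification, so new raw strings can be passed straight to predict.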

Read more

10+ Examples for Using CountVectorizer

Scikit-learn’s CountVectorizer is used to transform a corpus of text into a matrix of term/token counts. It also provides the capability to preprocess your text data before generating the vector representation, making it a highly flexible feature representation module for text. In this article, we are going to go in depth into the different ways you can use CountVectorizer, so that you are not just computing counts of words but also preprocessing your text data appropriately […]

Read more

5 Ways to Improve Productivity in Customer Support with AI

Companies receive support inquiries from various channels. These may include emails, support tickets, tweets, chat conversations with customer support representatives (CSRs), chatbot conversations, and more. Sources of customer service requests This is a lot of data, and it is mostly unstructured and scattered in nature, making it that much harder to manage. All this text data can actually be leveraged to speed up responses to customer service inquiries and reduce the volume of incoming tickets. According to a research […]

Read more

Text Classification: Best Practices for Real World Applications

Most text classification examples that you see on the Web or in books focus on demonstrating techniques. That will help you build a pseudo-usable prototype. If you want to take your classifier to the next level and use it within a product or service workflow, then there are things you need to do from day one to make that a reality. I’ve seen classifiers fail miserably and get replaced with off-the-shelf solutions because they don’t work in […]

Read more

HashingVectorizer vs. CountVectorizer

Previously, we learned how to use CountVectorizer for text processing. In place of CountVectorizer, you also have the option of using HashingVectorizer. In this tutorial, we will learn how HashingVectorizer differs from CountVectorizer and when to use which. CountVectorizer vs. HashingVectorizer HashingVectorizer and CountVectorizer are meant to do the same thing: convert a collection of text documents to a matrix of token occurrences. The difference is that HashingVectorizer does not store the resulting vocabulary (i.e. the unique […]
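The key difference — one vectorizer stores a vocabulary, the other hashes tokens into a fixed-size space — can be sketched on toy documents like this:

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

docs = ["the cat sat", "the cat sat on the mat"]

# CountVectorizer learns and stores a vocabulary during fit
cv = CountVectorizer()
X_count = cv.fit_transform(docs)
learned_vocab = cv.vocabulary_  # e.g. {"the": ..., "cat": ...}

# HashingVectorizer is stateless: tokens are hashed straight into
# a fixed number of columns, so there is no fit and no vocabulary
hv = HashingVectorizer(n_features=2**10, alternate_sign=False)
X_hash = hv.transform(docs)
```

The trade-off: HashingVectorizer uses less memory and streams well, but you cannot map columns back to words, and distinct tokens can collide in the same column.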

Read more

Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI

Word2Vec is a widely used word representation technique that uses neural networks under the hood. The resulting word representation or embeddings can be used to infer semantic similarity between words and phrases, expand queries, surface related concepts and more. The sky is the limit when it comes to how you can use these embeddings for different NLP tasks. In this article, we will look at how the different neural network architectures for training a Word2Vec model behave in practice. The […]

Read more

Before AI, Invest in A Big Data Strategy

Big data describes the volumes of data that your company generates every single day, both structured and unstructured. Analysts at Gartner estimate that more than 80 percent of enterprise data is unstructured — meaning it can be text files from IT logs, emails from customer support, direct Twitter messages from customers, and employee complaints to your HR department. Such diverse and scattered data sources exist in almost every enterprise. A big data strategy, on the other hand, is a glorified term for how […]

Read more

5 Examples of Text Classification in Practice

AI is transforming nearly every industry, and text analysis is a key area of interest. That’s because there’s been an explosion in unstructured text data—nearly 80% of data at most organizations—which is quickly becoming impractical to analyze by humans alone. We’ve already talked about some best practices for building a text classifier, but how can a tool like this help your business? Let’s take a closer look at document classification and some real-world examples. What Is Document Classification? Organizations need […]

Read more