HashingVectorizer vs. CountVectorizer

Previously, we learned how to use CountVectorizer for text processing. In place of CountVectorizer, you also have the option of using HashingVectorizer. In this tutorial, we will learn how HashingVectorizer differs from CountVectorizer and when to use which. CountVectorizer vs. HashingVectorizer HashingVectorizer and CountVectorizer are meant to do the same thing, which is to convert a collection of text documents to a matrix of token occurrences. The difference is that HashingVectorizer does not store the resulting vocabulary (i.e. the unique […]
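To make the vocabulary difference concrete, here is a minimal pure-Python sketch of the two ideas. It is not scikit-learn's implementation: the function names, the whitespace tokenizer, the CRC32 hash and the `n_features=8` width are all assumptions chosen for illustration.

```python
import zlib

docs = ["the cat sat", "the dog sat on the cat"]

def count_vectorize(docs):
    """Vocabulary-based counts: a token -> column mapping is built and kept."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))  # new tokens get the next free column
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.split():
            row[vocab[tok]] += 1
        rows.append(row)
    return vocab, rows

def hash_vectorize(docs, n_features=8):
    """Hashing-based counts: the column comes from a hash, nothing is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in doc.split():
            # Assumption: CRC32 stands in for whatever hash a real library uses.
            col = zlib.crc32(tok.encode("utf-8")) % n_features
            row[col] += 1  # distinct tokens may collide in the same column
        rows.append(row)
    return rows

vocab, count_rows = count_vectorize(docs)
hash_rows = hash_vectorize(docs)
```

In the first function the mapping in `vocab` lets you translate a column back to its token. In the second there is no such mapping, so memory use is fixed by `n_features`, but columns cannot be mapped back to tokens and collisions are possible.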


Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI

Word2Vec is a widely used word representation technique that uses neural networks under the hood. The resulting word representation or embeddings can be used to infer semantic similarity between words and phrases, expand queries, surface related concepts and more. The sky is the limit when it comes to how you can use these embeddings for different NLP tasks. In this article, we will look at how the different neural network architectures for training a Word2Vec model behave in practice. The […]


Before AI, Invest in A Big Data Strategy

Big data describes the volumes of data that your company generates every single day, both structured and unstructured. Analysts at Gartner estimate that more than 80 percent of enterprise data is unstructured, meaning it can be text files from IT logs, emails from customer support, direct Twitter messages from customers, and employee complaints to your HR department. This kind of diverse, scattered data is found in almost every enterprise. A big data strategy, on the other hand, is a glorified term for how […]


5 Examples of Text Classification in Practice

AI is transforming nearly every industry, and text analysis is a key area of interest. That’s because there’s been an explosion in unstructured text data—nearly 80% of data at most organizations—which is quickly becoming impractical to analyze by humans alone. We’ve already talked about some best practices for building a text classifier, but how can a tool like this help your business? Let’s take a closer look at document classification and some real-world examples. What Is Document Classification? Organizations need […]


How to Rename Pandas DataFrame Column in Python

Introduction Pandas is a Python library for data analysis and manipulation. Almost all operations in pandas revolve around DataFrames. A DataFrame is an abstract representation of a two-dimensional table which can contain all sorts of data. DataFrames also enable us to give all the columns names, which is why columns are often referred to as attributes or fields when using DataFrames. In this article we’ll see how we can rename an already existing DataFrame’s columns. There are two options for […]


Python: Get Size of Dictionary

Introduction In this article, we’ll take a look at how to find the size of a dictionary in Python. Dictionary size can mean its length, or space it occupies in memory. To find the number of elements stored in a dictionary we can use the len() function. To find the size of a dictionary in bytes we can use the getsizeof() function of the sys module. To count the elements of a nested dictionary, we can use a recursive function. […]
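The three measurements mentioned above can be sketched as follows. The `inventory` dictionary and the `count_items` helper are hypothetical names used only for illustration, and the recursive count assumes that "elements" means every key at every nesting level.

```python
import sys

def count_items(d):
    """Count key-value pairs at every level, descending into nested dicts."""
    total = 0
    for value in d.values():
        total += 1  # count this key-value pair
        if isinstance(value, dict):
            total += count_items(value)  # add the pairs inside the nested dict
    return total

inventory = {"fruit": {"apple": 3, "pear": 5}, "bread": 2}

top_level = len(inventory)             # number of top-level keys
size_bytes = sys.getsizeof(inventory)  # size of the dict object itself, in bytes
nested = count_items(inventory)        # keys at all levels
```

Note that `sys.getsizeof()` reports only the dictionary object's own footprint, not the memory of the keys and values it refers to, and the exact number varies between Python versions and platforms.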


Issue #116 – Fully Non-autoregressive Neural Machine Translation

04 Feb 21 Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic Introduction The standard Transformer model is autoregressive (AT), which means that the prediction of each target word is based on the predictions for the previous words. The output is generated from left to right, a process which cannot be parallelised because the prediction probability of a token depends on previous tokens. In the last few years, new approaches have been […]


Microsoft Vision Model ResNet-50 combines web-scale data and multi-task learning to achieve state of the art

Microsoft Vision Model ResNet-50 is a state-of-the-art pretrained ResNet-50 model, measured above by the mean average score across seven popular computer vision benchmarks. Pretrained vision models accelerate deep learning research and bring down the cost of performing computer vision tasks in production. By pretraining one large vision model to learn general visual representation of images, then transferring the learning across multiple downstream tasks, a team achieves competitive performance at a fraction of the cost when compared to collecting new  


Difference Between Backpropagation and Stochastic Gradient Descent

Last Updated on February 1, 2021 There is a lot of confusion for beginners around what algorithm is used to train deep learning neural network models. It is common to hear that neural networks learn using the “back-propagation of error” algorithm or “stochastic gradient descent.” Sometimes, either of these algorithms is used as a shorthand for how a neural net is fit on a training dataset, although in many cases, there is a deep confusion as to what these algorithms are, […]


Weight Initialization for Deep Learning Neural Networks

Weight initialization is an important design choice when developing deep learning neural network models. Historically, weight initialization involved using small random numbers, although over the last decade, more specific heuristics have been developed that use information, such as the type of activation function that is being used and the number of inputs to the node. These more tailored heuristics can result in more effective training of neural network models using the stochastic gradient descent optimization algorithm. In this tutorial, you […]
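As a rough illustration of heuristics that depend on the activation function and the number of inputs, the sketch below uses two well-known schemes. The excerpt does not name specific heuristics, so the choice of Xavier (Glorot) uniform and He normal initialization, the function names, and the layer sizes are all assumptions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform, commonly paired with tanh or sigmoid units."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def he_normal(fan_in, fan_out, rng):
    """He normal, commonly paired with ReLU units; scale depends on fan_in only."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

rng = random.Random(0)              # seeded so the sketch is repeatable
w_tanh = xavier_uniform(4, 3, rng)  # 3 nodes, each with 4 inputs
w_relu = he_normal(4, 3, rng)
```

Both functions return one row of weights per node, so the spread of the initial values shrinks as the number of inputs to a node grows.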
