Top 15 Open-Source Datasets of 2020 that every Data Scientist Should add to their Portfolio!

Overview: Here is a list of the top 15 datasets for 2020 that we feel every data scientist should practice on. The article contains five datasets each for machine learning, computer vision, and NLP. By no means is this list exhaustive, so feel free to add other datasets in the comments below. Introduction: “For the things we have to learn before we can do them, we learn by doing them.” (Aristotle) I am sure everyone can attest to this saying. No […]

Autoencoder Feature Extraction for Classification

An autoencoder is a type of neural network that can be used to learn a compressed representation of raw data. It is composed of two sub-models: an encoder and a decoder. The encoder compresses the input, and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded. The encoder can then be used as a data preparation technique to perform feature extraction on raw […]
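The encode, reconstruct, then discard-the-decoder workflow described in this excerpt can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the article's implementation: it uses a *linear* autoencoder (no activation functions) trained with per-sample gradient descent, and the toy data, the helper names (`train_autoencoder`, `reconstruction_error`, `matvec`), and the hyperparameters are all hypothetical choices made for this sketch. A practical version would normally use a deep learning library and nonlinear layers.

```python
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def train_autoencoder(data, n_latent, epochs=300, lr=0.01, seed=0):
    """Hypothetical helper: linear autoencoder x -> z = E x -> x_hat = D z,
    trained with per-sample gradient descent on the mean squared reconstruction error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    E = random_matrix(n_latent, n_in, rng)  # encoder weights
    D = random_matrix(n_in, n_latent, rng)  # decoder weights
    for _ in range(epochs):
        for x in data:
            z = matvec(E, x)
            x_hat = matvec(D, z)
            # dL/dx_hat for L = mean((x_hat - x) ** 2)
            g_out = [2 * (xh - xi) / n_in for xh, xi in zip(x_hat, x)]
            # Backpropagate to the code z before the decoder weights change
            g_z = [sum(D[i][j] * g_out[i] for i in range(n_in)) for j in range(n_latent)]
            for i in range(n_in):
                for j in range(n_latent):
                    D[i][j] -= lr * g_out[i] * z[j]
            for j in range(n_latent):
                for k in range(n_in):
                    E[j][k] -= lr * g_z[j] * x[k]
    return E, D

def reconstruction_error(E, D, data):
    total = 0.0
    for x in data:
        x_hat = matvec(D, matvec(E, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x)) / len(x)
    return total / len(data)

# Toy data (made up): 4 raw features that actually lie on a 2-D plane
data_rng = random.Random(42)
data = []
for _ in range(40):
    a, b = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

E0, D0 = train_autoencoder(data, n_latent=2, epochs=0)  # untrained starting point
E, D = train_autoencoder(data, n_latent=2)
initial_error = reconstruction_error(E0, D0, data)
final_error = reconstruction_error(E, D, data)

# Keep only the encoder: its 2-D codes are the extracted features
features = [matvec(E, x) for x in data]
print(f"reconstruction MSE: {initial_error:.4f} -> {final_error:.4f}")
```

The `features` list is what would be fed to a downstream classifier or regressor in place of the raw inputs; the decoder `D` is only needed during training.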

Autoencoder Feature Extraction for Regression

An autoencoder is a type of neural network that can be used to learn a compressed representation of raw data. It is composed of two sub-models: an encoder and a decoder. The encoder compresses the input, and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded. The encoder can then be used as a data preparation technique to perform feature extraction on raw data […]

A Comprehensive Guide to Understand and Implement Text Classification in Python

Improving Text Classification Models: While the above framework can be applied to a number of text classification problems, achieving good accuracy may require some improvements to the overall framework. The following are some tips to improve the performance of text classification models and of this framework. 1. Text Cleaning: text cleaning can help reduce the noise present in text data in the form of stopwords, punctuation marks, suffix variations, etc. This article can help you understand how […]
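The three kinds of noise named in this tip (stopwords, punctuation marks, suffix variations) can be illustrated with a small standard-library sketch. The stopword list below is a deliberately tiny, hypothetical one, and `strip_suffix` is a naive stand-in for a real stemmer such as Porter's; both are assumptions for illustration, not what the article itself uses.

```python
import string

# Hypothetical, deliberately tiny stopword list; real projects usually take one from a library
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "on", "for", "this", "that", "it"}

# Naive suffix stripping, a crude stand-in for a real stemmer; longer suffixes are tried first
SUFFIXES = ("ing", "ed", "es", "s")

def strip_suffix(word, min_stem=3):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def clean_text(text):
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation marks
    tokens = [t for t in text.split() if t not in STOPWORDS]          # drop stopwords
    return [strip_suffix(t) for t in tokens]                          # collapse suffix variations

print(clean_text("The models are learning, and the model learned!"))
# -> ['model', 'learn', 'model', 'learn']
```

After cleaning, "models"/"model" and "learning"/"learned" map to the same tokens, so a bag-of-words classifier sees two distinct features instead of four.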

Top 5 Machine Learning GitHub Repositories & Reddit Discussions (October 2018)

Introduction: “Should I use GitHub for my projects?” I’m often asked this question by aspiring data scientists. There’s only one answer: “Absolutely!” GitHub is an invaluable platform for data scientists looking to stand out from the crowd. It’s an online resume for displaying your code to recruiters and fellow professionals. The fact that GitHub hosts open-source projects from top tech behemoths like Google, Facebook, IBM, and NVIDIA is what adds to the gloss of […]

How do Transformers Work in NLP? A Guide to the Latest State-of-the-Art Models

Overview: The Transformer model in NLP has truly changed the way we work with text data. The Transformer is behind recent NLP developments, including Google’s BERT. Learn how the Transformer idea works, how it relates to language modeling and sequence-to-sequence modeling, and how it enables Google’s BERT model. Introduction: I love being a data scientist working in Natural Language Processing (NLP) right now. The breakthroughs and developments are occurring at an unprecedented pace. From the super-efficient ULMFiT framework to Google’s […]
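The core operation inside a Transformer layer is scaled dot-product attention, softmax(QKᵀ / √d_k)·V, as defined in the original Transformer paper. The sketch below shows a single attention head in plain Python; it omits the learned Q/K/V projection matrices, multiple heads, and masking that real Transformer and BERT layers use, and the token embeddings are made-up numbers chosen only for illustration.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v, all as lists of rows.
    Returns (outputs n_q x d_v, attention weights n_q x n_k)."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Self-attention over three toy token embeddings (made-up numbers): Q = K = V = X
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and says how much each token draws on every other token; this lets every position see the whole sequence at once, rather than step by step as in a recurrent network.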

How to create a poet / writer using Deep Learning (Text Generation using Python)?

Introduction: From short stories to 50,000-word novels, machines are churning out words like never before. There are tons of examples on the web where developers have used machine learning to write pieces of text, with results ranging from the absurd to the delightfully funny. Thanks to major advancements in the field of Natural Language Processing (NLP), machines are able to understand the context and spin up tales all by themselves. […]

The 15 Most Popular Data Science and Machine Learning Articles on Analytics Vidhya in 2018

Introduction: What is the one thing you enjoy most about Analytics Vidhya? The most popular answer we receive (and have received since Kunal transformed his idea into reality) is the content we publish. Our content is the one thing we take pride in, and 2018 saw us take our high-quality content to a whole new level. We launched multiple top-quality and popular training courses, published knowledge-rich machine learning and deep learning articles and guides, and saw our blog visits cross 2.5 million […]

8 Excellent Pretrained Models to get you Started with Natural Language Processing (NLP)

Introduction: Natural Language Processing (NLP) applications have become ubiquitous these days. I regularly stumble across websites and applications that leverage NLP in one form or another. In short, this is a wonderful time to be involved in the NLP domain. This rapid increase in NLP adoption has happened largely thanks to the concept of transfer learning enabled through pretrained models. Transfer learning, in the context of NLP, is essentially the ability to train a model on one dataset […]

The Top GitHub Repositories & Reddit Threads Every Data Scientist should know (June 2018)

Introduction: Half the year has flown by, and that brings us to the June edition of our popular series: the top GitHub repositories and Reddit threads from last month. While writing these articles, I have learned so much about machine learning, either from open-source code or from invaluable discussions among the top data science brains in the world. What makes GitHub special is not just its code hosting and social collaboration features for data scientists. It […]
