Step by step guide to building sentiment analysis model using graphlab

I have been using GraphLab for quite some time now. The first Kaggle competition I used it for was Click-Through Rate (CTR) prediction, and I was amazed at the speed with which it can crunch such big data. Over the last few months, I have realised that GraphLab has much broader applications. In this article I will take up the text mining capability of GraphLab and solve one of the Kaggle problems. I will be referring to this problem with […]

A Comprehensive Guide to Understand and Implement Text Classification in Python

Improving Text Classification Models: While the above framework can be applied to a number of text classification problems, some improvements to the overall framework can help achieve good accuracy. For example, the following are some tips to improve the performance of text classification models and this framework. 1. Text cleaning: text cleaning can help reduce the noise present in text data in the form of stopwords, punctuation marks, suffix variations, etc. This article can help to understand how […]
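
The text cleaning tip above can be illustrated with a minimal sketch. It is not the code from the linked article: the stopword list, the suffix list, and the helper names `strip_suffix` and `clean_text` are all hypothetical and chosen for illustration, and the suffix stripping is a crude stand-in for a real stemmer or lemmatizer.

```python
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real project would use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "this"}

# Naive suffix list (also an assumption); checked in this order.
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(token):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_text(text):
    """Lowercase, drop punctuation, remove stopwords, and strip simple suffixes."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [strip_suffix(t) for t in tokens]


print(clean_text("The movie was amazing, and the actors played well!"))
```

Naive suffix stripping produces truncated forms such as "amaz", which is why an established stemmer or lemmatizer is usually preferred in practice.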

Top 5 Machine Learning GitHub Repositories & Reddit Discussions (October 2018)

Introduction “Should I use GitHub for my projects?” – I’m often asked this question by aspiring data scientists. There’s only one answer to this – “Absolutely!”. GitHub is an invaluable platform for data scientists looking to stand out from the crowd. It’s an online resume for displaying your code to recruiters and other fellow professionals. The fact that GitHub hosts open-source projects from the top tech behemoths like Google, Facebook, IBM, NVIDIA, etc. is what adds to the gloss of […]

The Ultimate Learning Path to Become a Data Scientist and Master Machine Learning in 2019

The Learning Path to Become a Data Scientist in 2020 is now live! Head over here to start your data science journey. Introduction Learning paths are immensely popular among our readers and with good reason! Learning paths take away the pain and confusion from the learning process. For those who don’t know what a learning path is – we take the pain of going through all the resources available on data science, machine learning and Artificial Intelligence, select the best […]

Steps for effective text data cleaning (with case study using Python)

Introduction The days when one would get data in tabulated spreadsheets are truly behind us. A moment of silence for the data residing in the spreadsheet pockets. Today, more than 80% of the data is unstructured – it is either present in data silos or scattered around the digital archives. Data is being produced as we speak – from every conversation we have on social media to every piece of content generated by news sources. In order to produce any […]

Beginners Guide to Topic Modeling in Python

Introduction The analytics industry is all about obtaining “information” from data. With the growing amount of data in recent years, most of it unstructured, it’s difficult to obtain the relevant and desired information. But technology has developed some powerful methods which can be used to mine through the data and fetch the information that we are looking for. One such technique in the field of text mining is Topic Modelling. As the name suggests, it is a process to […]

The Top GitHub Repositories & Reddit Threads Every Data Scientist should know (June 2018)

Introduction Half the year has flown by and that brings us to the June edition of our popular series – the top GitHub repositories and Reddit threads from last month. During the course of writing these articles, I have learned so much about machine learning from either open source code or invaluable discussions among the top data science brains in the world. What makes GitHub special is not just its code hosting and social collaboration features for data scientists. It […]

The 25 Best Data Science and Machine Learning GitHub Repositories from 2018

Introduction What’s the best platform for hosting your code and collaborating with team members that also acts as an online resume to showcase your coding skills? Ask any data scientist, and they’ll point you towards GitHub. It has been a truly revolutionary platform in recent years and has changed the landscape of how we host and even do coding. But that’s not all. It acts as a learning tool as well. How, you ask? I’ll give you a hint – open […]

The Ultimate Learning Path to Becoming a Data Scientist in 2018

Introduction So you’ve taken the plunge. You want to become a data scientist. But where to begin? There are far too many resources out there. How do you decide the starting point? Did you miss out on topics you should have studied? Which are the best resources to learn? Don’t worry, we have you covered! Analytics Vidhya’s learning path for 2016 saw 250,000+ views. In 2017, we went even further and saw an incredible 500,000+ views! So this year, we […]

11 Superb Data Science Videos Every Data Scientist Must Watch

Overview: Presenting 11 data science videos that will enhance and expand your current skillset. We have categorized these videos into three fields – Natural Language Processing (NLP), Generative Models, and Reinforcement Learning. Learn how the concepts in these videos work and build your own data science project! Introduction I love learning and understanding data science concepts through videos. I simply do not have the time to pore through books and pages of text to understand different ideas and topics. […]
