Apache Kafka + KSQL + TensorFlow for Data Scientists via Python + Jupyter Notebook

Why would a data scientist use Kafka, Jupyter, Python, KSQL, and TensorFlow all together in a single notebook? There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need for data ingestion, preprocessing, model deployment and monitoring at scale. It requires low latency, high throughput, zero data […]

Maximizing Sales with Market Basket Analysis

Sales data analysis can provide a wealth of insights for any business, but the data is rarely made available to the public. In 2018, however, a retail chain provided Black Friday sales data on Kaggle as part of a Kaggle competition. Although the store and product lines are anonymized, the dataset presents a great learning opportunity to find business insights! In this post, we’ll cover how to prepare data, perform basic analysis, and glean additional insights via a technique called Market […]

How Can Python Help Solve Machine Learning Challenges?

Summary: Python’s open-source and high-level nature, as well as its comprehensive libraries, make it the perfect fit to solve the numerous real-life ML challenges. The increasing popularity and accessibility of Artificial Intelligence solutions is rapidly reshaping many industries, from healthcare through finance to aviation. Although the application of the latest technologies has always been an essential consideration for companies striving to get ahead of the curve, the ubiquity of AI means that it’s becoming the core of many operations. But […]

How to make your own Wiki from Wikipedia using Python

Here is a short blog I was asked to make about making a personal Wiki from Wikipedia. It shows the basic steps in text processing so I hope it will be useful for data scientists. It also requires some knowledge of MediaWiki setup on a web server, and some (not very advanced) knowledge of the Python programming language. It takes only several days to create this Wiki with Wikipedia articles if you know Python and basic ideas of data science. […]

Visually Explained: Three Excel Core-Features Even Excel-Pros Don’t Know

Over the last few years, Excel has been redesigned from the ground up. Currently, Microsoft is making the new Excel core-features available to every user, regardless of your Office 365 license. Thanks to the Microsoft naming conventions, it is easy to confuse the new features with existing ones. That being said, Power Query and Power Pivot are not the same things as Pivot Tables, which you have likely been using for years. Power Query (M-Language): Data preparation is very time-consuming. Power […]

Visually Explained: How Can Executives Grasp What Programming Is All About?

Quite often, non-technical executives have difficulty understanding what programming, on a very fundamental level, is all about. Because of that knowledge gap, they tend to hire experienced data professionals and overburden them with tasks for which they are hopelessly overqualified, such as doing ad-hoc SQL queries on CRM data: “You’re the go-to-guy for all things data, and we need the results for the board meeting tomorrow.” That’s quite a humbling and frustrating experience for anyone who calls himself a Data […]

Python Programming Fundamentals: A Beginner’s Guide [Updated 2020]

Python is a powerful, high-level, easy-to-learn programming language with a huge number of applications. Some of its features, such as being object-oriented and open source and having numerous IDEs, make it one of the most in-demand programming languages in the present IT industry. According to the TIOBE index, as of January 2020, Python is one of the most popular programming languages. Given the popularity of this programming language, many IT professionals, beginners and experienced alike, […]

Training with historical data! Surely, you’re joking says the IoT asset that just got connected

By Priya Sharma – Sr. Data Scientist, IoT Analytics, SAS Institute Inc., and Saurabh Mishra – Product Management, IoT, SAS Institute Inc. June 12, 2020. Description: The majority of AI approaches are based on the construct of training against historical data and then running inference on new data. While this is a sound and proven approach, a lot of IoT assets coming online don’t have historical data and we don’t necessarily have the time to wait. Modern Machine Learning methods can be employed to […]

FlashText – A library faster than Regular Expressions for NLP tasks

People like me working in the field of Natural Language Processing almost always come across the task of replacing words in a text. The reasons behind replacing the words may be different. Some of them are: “would’ve” and “would have” represent the same thing, so changing all the occurrences of “would’ve” to “would have” is one such task; changing all case variations to a single form, i.e. Python, pytHon, pYthon, pythoN etc. to python; changing all the synonyms of a word to […]
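As a rough illustration of the replacement tasks mentioned above, here is a minimal sketch that uses plain `re` from the Python standard library rather than FlashText itself. The `REPLACEMENTS` table and the `normalize` function are hypothetical names chosen for this example, and the table uses a straight apostrophe, so text with typographic apostrophes would need its own entries.

```python
import re

# Hypothetical replacement table (not from the original post):
# each lowercase variant maps to its normalized form.
REPLACEMENTS = {
    "would've": "would have",
    "python": "python",  # collapses Python, pytHon, pYthon, ... to one form
}


def normalize(text, replacements=REPLACEMENTS):
    """Replace whole-word variants in text, ignoring case."""
    # Try longer variants first so overlapping keys prefer the longer match.
    variants = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        # The lookarounds keep matches from starting or ending inside a word.
        r"(?<!\w)(" + "|".join(re.escape(v) for v in variants) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Look up the matched variant by its lowercase form.
    return pattern.sub(lambda m: replacements[m.group(1).lower()], text)


print(normalize("I would've used pytHon or pYthon."))
```

This approach builds one alternation pattern from every variant, which is the regular-expression baseline the post's title compares FlashText against.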

25 Open Datasets for Deep Learning Every Data Scientist Must Work With

Introduction: The key to getting better at deep learning (or most fields in life) is practice. Practice on a variety of problems – from image processing to speech recognition. Each of these problems has its own unique nuance and approach. But where can you get this data? A lot of research papers you see these days use proprietary datasets that are usually not released to the general public. This becomes a problem if you want to learn and apply your […]
