How to Execute R and Python in SQL Server with Machine Learning Services

Introduction: Did you know that you can write R and Python code within your T-SQL statements? Machine Learning Services in SQL Server eliminates the need for data movement. Instead of transferring large and sensitive data over the network or losing accuracy with sample CSV files, you can have your R/Python code execute within your database. Easily deploy your R/Python code with SQL stored procedures, making it accessible in your ETL processes or to any application. Train and store machine learning models […]


PixieDust Support of Streaming Data

With the rise of Internet of Things (IoT) devices, being able to analyze and visualize live streams of data is becoming more and more important. For example, you could have sensors like thermometers in machines, or portable medical devices like pacemakers, continuously streaming data to a streaming service like Kafka. PixieDust makes it easier to work with live data inside Jupyter Notebooks by providing simple integration APIs to both the PixieApp and display() framework. On the visualization level, PixieDust […]


Career Transition Towards Data Science: Planning a Learning Sabbatical

At the time of writing this post, I am nine months into my learning sabbatical. You can read about my journey here: “Career Transition Towards Data Analytics & Science”. Today I will share with you how you can plan your own, unique learning sabbatical, regardless of its scope and duration – anywhere between 1 and 12 months. Let’s get started. Begin with the end in mind: If you have ever read Stephen Covey’s “7 Habits of Highly Effective People” you […]


Why Excel Users Should Learn Python

Latest update: November 16, 2018. Microsoft Excel has been around for over 30 years now, and chances are it’s not going to change in the foreseeable future. In fact, Excel is facing immense competition from challengers such as Google Spreadsheets and well-funded start-ups like Airtable, which are both going after Excel’s massive user base of approximately 500 million worldwide. Tech-savvy small and mid-sized businesses embrace innovative alternatives to Excel. However, making a dent in the large enterprise space is a […]


Should Python Become Your Official Corporate Language, Along With English?

English is becoming the official language in the global business world, being currently spoken by approximately 1.75 billion people worldwide according to Harvard Business Review. While English is the fastest spreading language in human history, a significant proportion of businesses are still resistant to giving up on their native language. Just try having a casual conversation in English with German employees at their corporate headquarters canteen (I am German, just for the record). However, pressures are piling up, not only […]


Mental Framework For A Data Driven Digital Transformation

Over the last few years, my small business has undergone a digital transformation from a marketing service company to a data literacy consultancy. What does a data literacy consultancy do? We teach business users within large enterprises to work with data, and we help them acquire the necessary skills, from state-of-the-art Excel to Python, querying structured, semi-structured, and unstructured databases, as well as math, statistics, and probability. Throughout our transition, we applied a set of techniques, principles, and […]


Starting to develop in PySpark with Jupyter installed in a Big Data Cluster

It is no secret that data science tools like Jupyter, Apache Zeppelin, or the more recently launched Cloud Datalab and JupyterLab are must-knows for day-to-day work. So how can the ease of developing models be combined with the computing capacity of a big data cluster? In this article I will share some very simple steps to start using Jupyter notebooks for PySpark on a Dataproc cluster in GCP. Final goal. Prerequisites: 1. Have a Google Cloud […]


Why I Am Writing At Data Science Central, And Why You Should, Too

My writing engagement at Data Science Central came up unexpectedly. Back in August 2018, I stumbled upon an excellent write-up on Data Science Central. The author, Bill Vorhies, shared his thoughts on career transitioning toward data science. I wrote him an email, complimenting him on his blog post, and I dropped a few lines about my own transition. Here’s his response: “Congratulations on your remarkable journey. Perhaps you’d like to write one or more articles around this theme as we […]


Your Company Needs A Spreadsheet Policy More Than Ever

Electronic spreadsheets have been around for nearly 40 years now. They were invented by Bob Frankston and Dan Bricklin, creators of VisiCalc, and I had a chance to chat with both gentlemen a couple of months ago. I highly recommend watching the TED talk with Dan Bricklin. It’s important to understand for which purpose electronic spreadsheets were built in the first place if we want to anticipate what their future might look like. In response to my question in how […]


Measuring dataset similarity using optimal transport

Is FashionMNIST, a dataset of images of clothing items labeled by category, more similar to MNIST or to USPS, both of which are classification datasets of handwritten digits? This is a pretty hard question to answer, but the solution could have an impact on various aspects of machine learning. For example, it could change how practitioners augment a particular dataset to improve the transferring of models across domains or how they select a dataset to pretrain on, especially in scenarios […]
