Why Excel Users Should Learn Python

Latest update: November 16, 2018

Microsoft Excel has been around for over 30 years now, and chances are that won’t change in the foreseeable future. That said, Excel is facing immense competition from challengers such as Google Sheets and well-funded start-ups like Airtable, both of which are going after Excel’s massive user base of approximately 500 million worldwide. Tech-savvy small and mid-sized businesses are embracing innovative alternatives to Excel. However, making a dent in the large enterprise space is a […]

Should Python Become Your Official Corporate Language, Along With English?

English is becoming the official language of global business: according to Harvard Business Review, it is currently spoken by approximately 1.75 billion people worldwide. While English is the fastest-spreading language in human history, a significant proportion of businesses are still reluctant to give up their native language. Just try having a casual conversation in English with German employees in their corporate headquarters’ canteen (I am German, just for the record). However, pressures are piling up, not only […]

Mental Framework For A Data Driven Digital Transformation

Over the last few years, my small business has undergone a digital transformation from a marketing service company to a data literacy consultancy. What does a data literacy consultancy do? We teach business users within large enterprises to work with data, and we help them acquire the necessary skills, from state-of-the-art Excel to Python, from querying structured, semi-structured, and unstructured databases to math, statistics, and probability. Throughout our transition, we applied a set of techniques, principles, and […]

Starting to Develop in PySpark with Jupyter Installed on a Big Data Cluster

It’s no secret that data science tools like Jupyter, Apache Zeppelin, or the more recently launched Cloud Datalab and JupyterLab are a must-know for day-to-day work. So how can we combine the ease of developing models in a notebook with the computing capacity of a big data cluster? In this article, I will share a few very simple steps to start using Jupyter notebooks for PySpark on a Dataproc cluster in GCP.

Final goal

Prerequisites

1. Have a Google Cloud […]

Why I Am Writing At Data Science Central, And Why You Should, Too

My writing engagement at Data Science Central came up unexpectedly. Back in August 2018, I stumbled upon an excellent write-up on Data Science Central. The author, Bill Vorhies, shared his thoughts on career transitioning toward data science. I wrote him an email, complimenting him on his blog post, and I dropped a few lines about my own transition. Here’s his response: “Congratulations on your remarkable journey. Perhaps you’d like to write one or more articles around this theme as we […]

Your Company Needs A Spreadsheet Policy More Than Ever

Electronic spreadsheets have been around for nearly 40 years now. They were invented by Bob Frankston and Dan Bricklin, the creators of VisiCalc, and I had a chance to chat with both gentlemen a couple of months ago. I highly recommend watching Dan Bricklin’s TED talk. If we want to anticipate what the future of electronic spreadsheets might look like, it’s important to understand the purpose they were built for in the first place. In response to my question on how […]

Measuring dataset similarity using optimal transport

Is FashionMNIST, a dataset of images of clothing items labeled by category, more similar to MNIST or to USPS, both of which are classification datasets of handwritten digits? This is a pretty hard question to answer, but an answer could have an impact on various aspects of machine learning. For example, it could change how practitioners augment a particular dataset to improve the transfer of models across domains, or how they select a dataset to pretrain on, especially in scenarios […]

Claraprint: a chord and melody based fingerprint for western classical music cover detection

Cover song detection has been an active field in the Music Information Retrieval (MIR) community during the past decades. Most of the research community has focused on solving it for a wide range of music genres with diverse characteristics… Western classical music, a genre heavily based on the recording of “cover songs”, or musical works, represents a large heritage, offering immediate application for an efficient fingerprint algorithm. We propose an engineering approach for retrieving a cover song from a reference database […]

Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media

In this paper, we suggest a minimally supervised approach for identifying nuanced frames in news article coverage of politically divisive topics. We suggest breaking the broad policy frames proposed by Boydstun et al. (2014) into fine-grained subframes that better capture differences in political ideology… We evaluate the suggested subframes and their embedding, learned using minimal supervision, over three topics, namely immigration, gun control, and abortion. We demonstrate the ability of the subframes to capture ideological differences and […]

Integration of Clinical Criteria into the Training of Deep Models: Application to Glucose Prediction for Diabetic People

Standard objective functions used during the training of neural-network-based predictive models do not consider clinical criteria, leading to models that are not necessarily clinically acceptable. In this study, we look at this problem from the perspective of forecasting future glucose values for diabetic people… We propose the coherent mean squared glycemic error (gcMSE) loss function. During training, it penalizes the model not only on the prediction errors, but also on the predicted variation errors […]
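The excerpt describes a loss that penalizes both prediction errors and errors on the predicted variations. As a rough illustration only, here is a simplified Python/NumPy sketch of that general idea. It assumes that "variation" means the difference between consecutive glucose values and that the two error terms are combined with a hypothetical `variation_weight`. It is not the paper's gcMSE, whose full formulation (including any clinical weighting) the excerpt does not give.

```python
import numpy as np


def variation_aware_mse(y_true, y_pred, variation_weight=1.0):
    """Simplified sketch, NOT the paper's gcMSE.

    Assumption: combine the MSE on predicted values with a weighted MSE on
    predicted variations (first differences between consecutive time steps).
    `variation_weight` is a hypothetical parameter for illustration.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Error on the predicted glucose values themselves
    value_error = np.mean((y_true - y_pred) ** 2)

    # Error on the predicted variations between consecutive time steps
    variation_error = np.mean((np.diff(y_true) - np.diff(y_pred)) ** 2)

    return value_error + variation_weight * variation_error


# A forecast that is consistently 5 units off has no variation error,
# while one that zig-zags around the true trend is penalized further.
print(variation_aware_mse([100, 110, 120], [105, 115, 125]))  # 25.0
print(variation_aware_mse([100, 110, 120], [105, 105, 125]))  # 125.0
```

Both forecasts have the same value error (25.0). Only the second one is penalized by the variation term, because its predicted trend (flat, then a jump of 20) does not match the true steady rise of 10 per step.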