Do Not Use Random Guessing As Your Baseline Classifier
Last Updated on September 25, 2019
I recently received the following question via email:
Hi Jason, quick question. A case of class imbalance: 90 cases of thumbs up, 10 cases of thumbs down. How would we calculate random guessing accuracy in this case?
We can answer this question using some basic probability (I opened Excel and typed in some numbers).
Let’s get started.
Note, for a more detailed tutorial on this topic, see:
Let’s say the split is 90% for class 0 and 10% for class 1. Let’s also say that you guess randomly using the same ratio.
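If you want to check the arithmetic empirically, here is a minimal Python sketch of that setup. It assumes "thumbs up" is class 0 and "thumbs down" is class 1, and that each guess is drawn independently of the true label with the same 90/10 ratio. The function names `random_guess_accuracy` and `majority_class_accuracy` are hypothetical helpers written for this illustration. For comparison, it also scores the simple strategy of always predicting the majority class.

```python
import random

# Labels from the question (assumed mapping): 90 "thumbs up" = class 0, 10 "thumbs down" = class 1.
labels = [0] * 90 + [1] * 10

def random_guess_accuracy(labels, p_class0=0.9, trials=2000, seed=1):
    """Average accuracy of guessing class 0 with probability p_class0, over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each guess is drawn independently of the true label, using the same 90/10 ratio.
        guesses = [0 if rng.random() < p_class0 else 1 for _ in labels]
        correct = sum(1 for y, g in zip(labels, guesses) if y == g)
        total += correct / len(labels)
    return total / trials

def majority_class_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    majority = max(set(labels), key=labels.count)
    return sum(1 for y in labels if y == majority) / len(labels)

print(f"random guessing (90/10): {random_guess_accuracy(labels):.3f}")  # close to 0.82
print(f"always predict class 0:  {majority_class_accuracy(labels):.3f}")  # 0.900
```

The simulated random-guessing accuracy settles near 0.9 × 0.9 + 0.1 × 0.1 = 0.82, which is below the 0.90 you get by always predicting the majority class.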
The theoretical accuracy of random guessing on a two-class classification problem is: