How to Calculate Correlation Between Variables in Python
Last Updated on August 20, 2020
There may be complex and unknown relationships between the variables in your dataset.
It is important to discover and quantify the degree to which the variables in your dataset depend on each other. This knowledge can help you better prepare your data to meet the expectations of machine learning algorithms, such as linear regression, whose performance can degrade in the presence of these interdependencies.
In this tutorial, you will discover that correlation is a statistical summary of the relationship between variables, and how to calculate it for different types of variables and relationships.
After completing this tutorial, you will know:
- How to calculate a covariance matrix to summarize the linear relationship between two or more variables.
- How to calculate Pearson’s correlation coefficient to summarize the linear relationship between two variables.
- How to calculate Spearman’s correlation coefficient to summarize the monotonic relationship between two variables.
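As a quick preview of these three calculations, here is a minimal sketch using NumPy and SciPy. The synthetic data, the seed, and the variable names (`x`, `y`, `cov_matrix`, `pearson_r`, `spearman_rho`) are assumptions made for illustration only, not the dataset or code used later in the tutorial.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative synthetic data (an assumption for this sketch):
# y is x plus Gaussian noise, so the two variables are positively related.
rng = np.random.default_rng(1)
x = 20 * rng.standard_normal(1000) + 100
y = x + 10 * rng.standard_normal(1000) + 50

# 2x2 covariance matrix: variances on the diagonal, covariance off the diagonal.
cov_matrix = np.cov(x, y)

# Pearson's correlation coefficient summarizes the linear relationship.
pearson_r, _ = pearsonr(x, y)

# Spearman's correlation coefficient summarizes the monotonic (rank-based) relationship.
spearman_rho, _ = spearmanr(x, y)

print(cov_matrix)
print("Pearson's r: %.3f" % pearson_r)
print("Spearman's rho: %.3f" % spearman_rho)
```

Both `pearsonr` and `spearmanr` also return a p-value, which is discarded here. Because `y` is built directly from `x`, both coefficients should come out strongly positive.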
Let’s get started.
- Update May/2018: Updated description of the sign of the covariance (thanks Fulya).