Relationship Between Data - Covariance & Correlation
Find the relation between the data.

Here we discuss how to quantify the relationship between observations. Finding this relationship helps you understand your data better. The methods covered are Covariance, the Pearson Correlation Coefficient, and Spearman Rank Correlation.
Covariance:
Covariance tells us the direction of the relationship between two observations. Covariance is positive when the value of one observation increases as the other increases, and negative when the value of one observation decreases while the value of the other increases.
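As a quick sketch of the idea, the sample covariance can be computed directly from its definition and checked against NumPy. The hours/scores data below is hypothetical, chosen only to illustrate a positive relationship:

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score (illustrative values only)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([52, 55, 61, 68, 74], dtype=float)

# Sample covariance: sum of the products of deviations, divided by n - 1
cov_manual = np.sum((hours - hours.mean()) * (scores - scores.mean())) / (len(hours) - 1)

# np.cov returns the covariance matrix; entry [0, 1] is cov(hours, scores)
cov_numpy = np.cov(hours, scores)[0, 1]
```

A positive result here indicates that scores tend to rise as hours rise; a negative result would indicate the opposite direction.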
Limitations:
Although covariance captures the direction of the relationship between two observations, it fails to capture the strength of the relationship. This is because the observations may be measured in different units, so the magnitude of the covariance depends on the scale of the values, which makes it hard to interpret on its own.
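To see the scale problem concretely, here is a small sketch (with made-up height/weight values) where merely switching units from meters to centimeters multiplies the covariance by 100, even though the underlying relationship is unchanged:

```python
import numpy as np

# Hypothetical height/weight data (illustrative values only)
heights_m = np.array([1.5, 1.6, 1.7, 1.8, 1.9])
weights_kg = np.array([55.0, 60.0, 66.0, 72.0, 80.0])

cov_m = np.cov(heights_m, weights_kg)[0, 1]           # heights in meters
cov_cm = np.cov(heights_m * 100, weights_kg)[0, 1]    # same heights, in centimeters

# cov_cm is exactly 100x cov_m: the magnitude depends on the units chosen
```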
Pearson Correlation Coefficient:
The Pearson Correlation Coefficient (PCC) is the ratio of the covariance to the product of the standard deviations of the observations. It can be thought of as a normalized covariance. The value of the PCC always lies in the range [-1, 1]. Values close to 1 indicate a strong positive correlation, values close to -1 indicate a strong negative correlation, and 0 indicates no linear relationship between the observations.
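The definition above can be sketched in a few lines and checked against NumPy's built-in `np.corrcoef` (the data is hypothetical, reused from the covariance example):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 55, 61, 68, 74], dtype=float)

# Pearson r = cov(x, y) / (std(x) * std(y)), using the sample versions (ddof=1)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# NumPy's built-in correlation matrix; entry [0, 1] is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
```

Because both covariance and the standard deviations scale with the units, the units cancel out, which is why the PCC is unit-free and always lands in [-1, 1].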
Limitations:
Correlation between two observations never implies causation, i.e., that one observation depends on the other. The PCC is good at capturing linear relationships between observations but fails to capture non-linear ones.
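A minimal sketch of that failure mode: in the contrived data below, y is completely determined by x (a perfect quadratic relationship), yet the PCC comes out as 0 because the relationship is not linear:

```python
import numpy as np

# Perfect non-linear dependence: y = x^2 on a symmetric range
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Pearson reports no relationship at all, despite y being a function of x
r = np.corrcoef(x, y)[0, 1]
```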
Spearman Rank Correlation:
Spearman Rank Correlation can be thought of as the PCC of the ranks of the observations. While the PCC looks for linearity between observations, Spearman assesses their monotonic relationship.
In Spearman Rank Correlation, we rank the values of each observation independently and then compute the PCC of those ranks. This captures any monotonic relationship between the observations.
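The procedure above can be sketched directly: rank each observation, then take the PCC of the ranks. The helper below is a simplified hypothetical implementation that assumes no tied values; the cubic data is made up to show a monotonic but non-linear relationship:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = x ** 3  # strictly increasing, so the ranks of x and y match exactly

def ranks(a):
    # Rank of each value (1 = smallest); assumes no ties for simplicity
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

rho = np.corrcoef(ranks(x), ranks(y))[0, 1]  # Spearman: 1.0 (perfectly monotonic)
r = np.corrcoef(x, y)[0, 1]                  # Pearson: below 1 (non-linear)
```

Because Spearman only looks at the ordering of the values, it scores this relationship as perfect even though Pearson does not.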
Limitations:
With observations containing thousands of values, ranking them all is a tedious task. There is also a high chance of tied values, which makes ranking more difficult (ties are usually assigned the average of their ranks).
Causation:
When the values of one observation depend on the other observation, the relationship is called causation.
In this blog, we have discussed covariance, correlation, and causation. These are used to capture the relationships between observations, which is a major factor in Machine Learning for feature selection and dimensionality reduction.