Relationship Between Data - Covariance & Correlation
Find the relation between the data.

Here we discuss how to quantify the relationship between observations. Finding this relationship helps you understand your data better. The methods covered are Covariance, the Pearson Correlation Coefficient, and Spearman Rank Correlation.
Covariance:
Covariance tells us the direction of the relationship between two observations. Covariance is positive when the value of one observation increases as the other increases, and negative when the value of one observation decreases while the value of the other increases.
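As a quick sketch of the idea, the sample covariance can be computed directly from its definition and checked against NumPy. The hours/scores data below is hypothetical, chosen only to illustrate a positive relationship:

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score (illustrative values only)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([52, 55, 61, 68, 74], dtype=float)

# Sample covariance: sum of the products of deviations, divided by n - 1
cov_manual = np.sum((hours - hours.mean()) * (scores - scores.mean())) / (len(hours) - 1)

# np.cov returns the covariance matrix; entry [0, 1] is cov(hours, scores)
cov_numpy = np.cov(hours, scores)[0, 1]
```

A positive result here indicates that scores tend to rise as hours rise; a negative result would indicate the opposite direction.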
Limitations:
Although covariance captures the direction of the relationship between two observations, it fails to capture the strength of the relationship. This is because the observations may be measured in different units, so the magnitude of the covariance depends on the scale of the values, which makes it hard to interpret on its own.
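To see the scale problem concretely, here is a small sketch (with made-up height/weight values) where merely switching units from meters to centimeters multiplies the covariance by 100, even though the underlying relationship is unchanged:

```python
import numpy as np

# Hypothetical height/weight data (illustrative values only)
heights_m = np.array([1.5, 1.6, 1.7, 1.8, 1.9])
weights_kg = np.array([55.0, 60.0, 66.0, 72.0, 80.0])

cov_m = np.cov(heights_m, weights_kg)[0, 1]           # heights in meters
cov_cm = np.cov(heights_m * 100, weights_kg)[0, 1]    # same heights, in centimeters

# cov_cm is exactly 100x cov_m: the magnitude depends on the units chosen
```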
Pearson Correlation Coefficient:
The Pearson Correlation Coefficient (PCC) is the ratio of the covariance to the product of the standard deviations of the observations. It can be thought of as a normalized covariance. The value of the PCC always lies in the range [-1, 1]. Values close to 1 indicate a strong positive correlation, values close to -1 indicate a strong negative correlation, and 0 indicates no linear relationship between the observations.
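The definition above can be sketched in a few lines and checked against NumPy's built-in `np.corrcoef` (the data is hypothetical, reused from the covariance example):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 55, 61, 68, 74], dtype=float)

# Pearson r = cov(x, y) / (std(x) * std(y)), using the sample versions (ddof=1)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# NumPy's built-in correlation matrix; entry [0, 1] is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
```

Because both covariance and the standard deviations scale with the units, the units cancel out, which is why the PCC is unit-free and always lands in [-1, 1].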
Limitations:
Correlation between two observations never implies causation, i.e., that one observation depends on the other. The PCC is good at capturing linear relationships between observations but fails to capture non-linear ones.
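A minimal sketch of that failure mode: in the contrived data below, y is completely determined by x (a perfect quadratic relationship), yet the PCC comes out as 0 because the relationship is not linear:

```python
import numpy as np

# Perfect non-linear dependence: y = x^2 on a symmetric range
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Pearson reports no relationship at all, despite y being a function of x
r = np.corrcoef(x, y)[0, 1]
```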
Spearman Rank Correlation:
Spearman Rank Correlation can be thought of as the PCC of the ranks of the observations. While the PCC looks for linearity between observations, Spearman assesses their monotonic relationship.
In Spearman Rank Correlation, we rank the values of each observation independently and then compute the PCC of those ranks. This captures any monotonic relationship between the observations.
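The procedure above can be sketched directly: rank each observation, then take the PCC of the ranks. The helper below is a simplified hypothetical implementation that assumes no tied values; the cubic data is made up to show a monotonic but non-linear relationship:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = x ** 3  # strictly increasing, so the ranks of x and y match exactly

def ranks(a):
    # Rank of each value (1 = smallest); assumes no ties for simplicity
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

rho = np.corrcoef(ranks(x), ranks(y))[0, 1]  # Spearman: 1.0 (perfectly monotonic)
r = np.corrcoef(x, y)[0, 1]                  # Pearson: below 1 (non-linear)
```

Because Spearman only looks at the ordering of the values, it scores this relationship as perfect even though Pearson does not.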
Limitations:
With observations containing thousands of values, ranking them all is a tedious task. There is also a high chance of tied values, which makes ranking more difficult (ties are usually assigned the average of their ranks).
Causation:
When the values of one observation depend on the other observation, the relationship is called causation.
In this blog, we have discussed covariance, correlation, and causation. These are used to capture the relationships between observations, which is a major factor in Machine Learning for feature selection and dimensionality reduction.