Pearson Correlation Test Between Two Variables in Python


The Pearson correlation test is a simple statistical method that measures the strength and direction of the linear relationship between two numeric variables. It tells you whether the variables are related and how strong that relationship is. In Python, the Pearson correlation can be computed with the pearsonr() function from scipy.stats.

Its value falls between -1 and 1, with -1 being a perfect negative linear correlation, 0 representing no linear relationship, and 1 representing a perfect positive linear correlation.
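To make the range of values concrete, here is a minimal sketch that computes the coefficient directly from its textbook definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The helper name pearson_r and the sample lists are assumptions made for illustration; they are not part of scipy.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its textbook definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (used in the denominator)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])     # y rises with x
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])     # y falls as x rises
r_zero = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend
print(r_pos, r_neg, r_zero)
```

The three calls illustrate the two extremes and the midpoint of the range. This sketch does not handle a constant input list, where the denominator would be zero.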

Syntax

This syntax is used in all the following examples. The function returns the correlation coefficient together with a p-value.

pearsonr(variable1, variable2)
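The examples in this article keep only the coefficient, but pearsonr() also returns a p-value, which is what makes it a test rather than just a measure. The sketch below unpacks both values. The sample data and the 0.05 threshold named alpha are assumptions chosen for illustration.

```python
from scipy.stats import pearsonr

# Illustrative sample data
x = [2.2, 4.6, 6.8, 7.8]
y = [1.3, 3.2, 5.6, 9.7]

# pearsonr returns two values: the coefficient and the two-sided p-value
r, p_value = pearsonr(x, y)

alpha = 0.05  # assumed significance threshold, not prescribed by the article
print("Pearson correlation:", r)
print("p-value:", p_value)
if p_value < alpha:
    print("The correlation is statistically significant at this threshold.")
else:
    print("Not enough evidence of a linear relationship at this threshold.")
```

With very small samples such as this one, even a large coefficient may not reach significance, so it is worth looking at both numbers.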

Algorithm

Step 1 - Import the module and libraries.
Step 2 - Define the variables or datasets.

var1 = [ ]
var2 = [ ]

Or, if you want to work with a CSV file:

df = pd.read_csv("file_name.csv")

Step 3 - Apply the pearsonr() function to calculate the correlation.
Step 4 - Print the result.

Method 1: Using Variables to Find the Correlation

Example 1

Finding the Pearson correlation between two lists of positive integers.

from scipy.stats import pearsonr

var1 = [2, 4, 6, 8]   #1st variable
var2 = [1, 3, 5, 7]   #2nd variable

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(var1, var2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: 1.0

In this code, the 'pearsonr' function is imported from 'scipy.stats'. Two lists named var1 and var2 are created and passed to 'pearsonr()', which calculates the Pearson correlation between them. The coefficient is stored in correlation, the p-value that the function also returns is discarded with _, and the coefficient is then printed. The result is 1.0 because var2 is exactly var1 minus 1, which is a perfect positive linear relationship.

Example 2

Finding the Pearson correlation between two lists of decimal values.

from scipy.stats import pearsonr

var1 = [2.2, 4.6, 6.8, 7.8]   #1st variable
var2 = [1.3, 3.2, 5.6, 9.7]   #2nd variable

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(var1, var2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: 0.9385130127002226

In this code, the 'pearsonr' function is imported from 'scipy.stats'. Two lists of decimal values named var1 and var2 are created and passed to 'pearsonr()', which calculates the Pearson correlation between them. The coefficient is stored in correlation and then printed. A value of about 0.94 indicates a strong positive relationship.

Example 3

Finding the Pearson correlation between two lists of negative integers.

from scipy.stats import pearsonr

var1 = [-2, -5, -1, -7]   #1st variable
var2 = [-8, -3, -6, -9]   #2nd variable

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(var1, var2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: 0.11437725271791938

In this code, the 'pearsonr' function is imported from 'scipy.stats'. Two lists with negative elements (var1 and var2) are created and passed to 'pearsonr()', which calculates the Pearson correlation between them. The coefficient is stored in correlation and then printed. A value of about 0.11 indicates a very weak positive relationship; the sign of the values themselves does not determine the sign of the correlation.

Example 4

Finding the Pearson correlation between two lists that mix positive and negative integers.

from scipy.stats import pearsonr

var1 = [-2, 5, -1, -7]   #1st variable
var2 = [-4, -3, -6, 2]   #2nd variable

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(var1, var2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: -0.5717997297136825

Method 2: Using Datasets to Find the Correlation

Example 1

Finding the Pearson correlation between two columns of a given dataset.

You can download the csv file from here - student_data

import pandas as pd
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv("student_clustering.csv")

# Select the two numeric columns as Series
column1 = df['cgpa']
column2 = df['iq']

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(column1, column2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: 0.5353007092636304

This value indicates a moderate positive relationship between the variables.

In this code, we first load the dataset (student_clustering.csv) from its path. We then select two numeric columns of the same length, apply pearsonr() to them, and print the resulting correlation value.

Example 2

Finding the Pearson correlation between two columns of a given dataset.

You can download the csv file from here - cardata

import pandas as pd
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv("cardata.csv")

# Select the two numeric columns as Series
column1 = df['Selling_Price']
column2 = df['Present_Price']

# find Pearson correlation (the p-value is discarded with _)
correlation, _ = pearsonr(column1, column2)

print('Pearson correlation:', correlation)

Output

Pearson correlation: 0.8252819190808663

This value indicates a strong positive relationship between the variables because it is close to 1.

In this code, we first load the dataset (cardata.csv) from its path. We then select two numeric columns of the same length, apply pearsonr() to them, and print the resulting correlation value.
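The dataset examples depend on CSV files that may not be at hand, so here is a self-contained sketch of the same workflow that reads CSV text from memory instead. The column names hours and score and all the values are hypothetical. It also drops rows with missing values before the calculation so that they do not propagate into the result, and compares the outcome with pandas' own Series.corr(), which uses the Pearson method by default but does not return a p-value.

```python
import io

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical inline data standing in for a CSV file on disk;
# the fourth row has a missing score on purpose
csv_text = """hours,score
1,52
2,55
3,61
4,
5,70
6,74
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only rows where both columns have a value
clean = df[["hours", "score"]].dropna()

# Coefficient and p-value from scipy
r, p_value = pearsonr(clean["hours"], clean["score"])

# The same coefficient computed by pandas
r_pandas = clean["hours"].corr(clean["score"])

print("Rows used:", len(clean))
print("Pearson correlation (scipy):", r)
print("Pearson correlation (pandas):", r_pandas)
```

Dropping incomplete rows from both columns together keeps the pairs aligned, which matters because the coefficient is computed pair by pair.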

Conclusion

In conclusion, the Pearson correlation test is a useful tool for anyone working with data who wants to understand how two variables move together. Using Python and the scipy library, you can easily run this test and learn the direction and strength of the linear relationship between two variables.

Vishal Gupta

Updated on: 2023-09-29T14:54:46+05:30
