
Pearson Correlation Test Between Two Variables in Python
The Pearson correlation test is a statistical method that measures the strength and direction of the linear relationship between two numeric variables. It summarizes that relationship as a single coefficient, so you can tell whether the variables are related and how strongly. In Python, the coefficient can be computed with the pearsonr() function from scipy.stats.
The coefficient falls between -1 and 1: -1 is a perfect negative linear correlation, 0 means no linear relationship, and 1 is a perfect positive linear correlation.
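To make the range above concrete, here is a minimal sketch that computes the coefficient straight from its textbook definition: the sum of the products of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. The helper name pearson_r is an assumption made for illustration and is not part of scipy.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from its definition (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from the means
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sum of squared deviations for each variable
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)

print(pearson_r([2, 4, 6, 8], [1, 3, 5, 7]))  # 1.0, perfect positive
print(pearson_r([2, 4, 6, 8], [7, 5, 3, 1]))  # -1.0, perfect negative
```

This sketch divides by zero if either list is constant, since a variable with no spread has no defined correlation.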
Syntax
This syntax is used in all the following examples. The call returns the correlation coefficient together with a p-value.
pearsonr(variable1, variable2)
Algorithm
Step 1 - Import the required modules and libraries.
Step 2 - Define the variables or load the dataset.
var1 = [ ] and var2 = [ ], or, to work with a CSV file, df = pd.read_csv("file_name.csv")
Step 3 - Apply the pearsonr() function to calculate the correlation.
Step 4 - Print the result.
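The four steps can also be wrapped in a small reusable function. The sketch below is one possible arrangement, not a prescribed one: the helper name pearson_from_lists, its error messages, and the sample values for hours and scores are all assumptions made up for illustration.

```python
# Step 1: import the function
from scipy.stats import pearsonr

def pearson_from_lists(x, y):
    """Return Pearson's r for two paired sequences (hypothetical helper)."""
    # Pearson correlation compares paired observations, so the lengths must match
    if len(x) != len(y):
        raise ValueError('both variables need the same number of observations')
    if len(x) < 2:
        raise ValueError('at least two observations are required')
    # Step 3: apply pearsonr(); the p-value is ignored here
    correlation, _ = pearsonr(x, y)
    return float(correlation)

# Step 2: define the variables (made-up sample data)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 74]

# Step 4: print the result
print('Pearson correlation:', pearson_from_lists(hours, scores))
```

Checking the lengths up front gives a clearer error than letting a mismatched pair reach the library call.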
Method 1: Finding the correlation between two lists
Example 1
Finding the Pearson correlation between two lists of integers.
from scipy.stats import pearsonr

var1 = [2, 4, 6, 8]  # 1st variable
var2 = [1, 3, 5, 7]  # 2nd variable

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(var1, var2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: 1.0
In this code, the pearsonr function is imported from scipy.stats. Two lists named var1 and var2 are created and passed to pearsonr(), which calculates the Pearson correlation between them. The coefficient is stored in correlation, while the underscore discards the second return value, the p-value. Finally, the correlation is printed.
Example 2
Finding the Pearson correlation between two lists of decimal values.
from scipy.stats import pearsonr

var1 = [2.2, 4.6, 6.8, 7.8]  # 1st variable
var2 = [1.3, 3.2, 5.6, 9.7]  # 2nd variable

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(var1, var2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: 0.9385130127002226
In this code, the pearsonr function is imported from scipy.stats. Two lists of decimal values named var1 and var2 are created and passed to pearsonr(), which calculates the Pearson correlation between them. The coefficient is stored in correlation and then printed.
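The examples keep only the first value returned by pearsonr(). As a short sketch, the second value, the p-value of a test whose null hypothesis is that the two variables are uncorrelated, can be kept as well. The variable name p_value is a choice made here for illustration.

```python
from scipy.stats import pearsonr

var1 = [2.2, 4.6, 6.8, 7.8]  # same data as Example 2
var2 = [1.3, 3.2, 5.6, 9.7]

# Keep both return values instead of discarding the second one
correlation, p_value = pearsonr(var1, var2)

print('Pearson correlation:', correlation)
print('p-value:', p_value)
```

With only four observations, even a high coefficient can come with a fairly large p-value, so the two numbers are best read together.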
Example 3
Finding the Pearson correlation between two lists of negative values.
from scipy.stats import pearsonr

var1 = [-2, -5, -1, -7]  # 1st variable
var2 = [-8, -3, -6, -9]  # 2nd variable

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(var1, var2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: 0.11437725271791938
In this code, the pearsonr function is imported from scipy.stats. Two lists containing only negative values, var1 and var2, are created and passed to pearsonr(), which calculates the Pearson correlation between them. The coefficient is stored in correlation and then printed.
Example 4
Finding the Pearson correlation between two lists that mix positive and negative values.
from scipy.stats import pearsonr

var1 = [-2, 5, -1, -7]  # 1st variable
var2 = [-4, -3, -6, 2]  # 2nd variable

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(var1, var2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: -0.5717997297136825
Method 2: Finding the correlation between columns of a dataset
Example 1
Finding the Pearson correlation between two columns of a given dataset.
This example uses the student_data CSV file, saved locally as student_clustering.csv.
import pandas as pd
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv("student_clustering.csv")

# Select the two columns as Series
column1 = df['cgpa']
column2 = df['iq']

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(column1, column2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: 0.5353007092636304
This value indicates a moderate positive relationship between the variables.
In this code, the dataset (student_clustering.csv) is first loaded from its file path. Two numeric columns of equal length, cgpa and iq, are then selected from it, and pearsonr() is applied to them to find the correlation value.
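Real datasets often contain missing values, and the two columns must stay paired row by row. The sketch below shows one way to handle that, using a small in-memory DataFrame as a hypothetical stand-in for the CSV file: the column names come from the example above, but every value is made up. It also compares the result with pandas' own Series.corr(), which uses the Pearson method by default.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical stand-in for student_clustering.csv; all values are made up
df = pd.DataFrame({
    'cgpa': [5.1, 6.4, 7.2, None, 8.8, 9.0],
    'iq': [88, 95, 101, 110, 115, None],
})

# Keep only the rows where both columns are present, so the pairs stay aligned
clean = df[['cgpa', 'iq']].dropna()

correlation, _ = pearsonr(clean['cgpa'], clean['iq'])
pandas_correlation = clean['cgpa'].corr(clean['iq'])  # Pearson is the default method

print('Rows used:', len(clean))
print('scipy Pearson correlation:', correlation)
print('pandas Pearson correlation:', pandas_correlation)
```

Dropping incomplete rows from both columns at once, rather than from each column separately, is what keeps the two inputs the same length.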
Example 2
Finding the Pearson correlation between two columns of a given dataset.
This example uses the cardata CSV file, saved locally as cardata.csv.
import pandas as pd
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv("cardata.csv")

# Select the two columns as Series
column1 = df['Selling_Price']
column2 = df['Present_Price']

# Find the Pearson correlation; the second return value (the p-value) is discarded
correlation, _ = pearsonr(column1, column2)
print('Pearson correlation:', correlation)
Output
Pearson correlation: 0.8252819190808663
This value indicates a strong positive relationship between the variables because it is close to 1.
In this code, the dataset (cardata.csv) is first loaded from its file path. Two numeric columns of equal length, Selling_Price and Present_Price, are then selected from it, and pearsonr() is applied to them to find the correlation value.
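The verbal labels used for the outputs above can be produced by a small helper. This is only a sketch: the function name describe_correlation and the cut-offs of 0.3 and 0.7 are assumptions chosen for illustration, since there is no single agreed set of thresholds.

```python
def describe_correlation(r, weak=0.3, strong=0.7):
    """Turn a coefficient into a rough verbal label (hypothetical thresholds)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError('a Pearson coefficient must lie between -1 and 1')
    if r == 0:
        return 'no linear relationship'
    magnitude = abs(r)
    if magnitude < weak:
        strength = 'weak'
    elif magnitude < strong:
        strength = 'moderate'
    else:
        strength = 'strong'
    direction = 'positive' if r > 0 else 'negative'
    return f'{strength} {direction} relationship'

# Coefficients taken from the examples in this article
for r in [1.0, 0.11437725271791938, -0.5717997297136825,
          0.5353007092636304, 0.8252819190808663]:
    print(r, '->', describe_correlation(r))
```

The thresholds are exposed as parameters so they can be adjusted to whatever convention a given field uses.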
Conclusion
In conclusion, the Pearson correlation test is a useful tool for anyone working with data who wants to understand how two variables move together. Using Python and the scipy library, you can run the test in a few lines and learn both the direction and the strength of the linear relationship between two variables.