Pandas DataFrame corr() Method

The DataFrame.corr() method in Pandas calculates the pairwise correlation between the numeric columns of a DataFrame. Correlation measures how strongly two columns are related. The result is returned as a new DataFrame called a correlation matrix, where each value ranges from -1 to 1.

1: perfect positive correlation
-1: perfect negative correlation
0: no linear correlation

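Non-numeric columns are included only when numeric_only=True is passed. In pandas 2.0 and later the default is numeric_only=False, so a column that cannot be converted to numbers (for example, text) raises an error instead of being dropped silently, as older versions did.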
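
As a minimal sketch of how non-numeric columns can be handled explicitly, the snippet below passes numeric_only=True so that a text column is left out of the calculation. The column names and values are made up for illustration, and the code assumes pandas 1.5 or later, where the numeric_only parameter is available.

```python
import pandas as pd

# Hypothetical data: one text column alongside two numeric columns.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "A": [10, 20, 30],
    "B": [15, 25, 35],
})

# Ask corr() to consider numeric columns only, so "Name" is skipped.
r = df.corr(numeric_only=True)
print(r)
```

The resulting matrix contains only A and B. Selecting the numeric columns first, for example with df.select_dtypes(include="number"), is another way to get the same effect.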

Example: This code creates a simple DataFrame and finds the correlation between its columns.

Python

import pandas as pd

d = {"A": [10, 20, 30], "B": [15, 25, 35]}
df = pd.DataFrame(d)

r = df.corr()
print(r)

Output

     A    B
A  1.0  1.0
B  1.0  1.0

Explanation:

df.corr() calculates correlation between all numeric columns.
Each value in column B equals the matching value in column A plus 5. This is a perfect linear relationship with a positive slope, so the correlation is 1.

Syntax

DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)

Parameters:

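method: Correlation method to use: 'pearson' (the default), 'spearman' or 'kendall'. A callable that takes two 1-D arrays and returns a float is also accepted.
min_periods: Minimum number of observations required per pair of columns to produce a valid result; pairs with fewer observations give NaN. Currently supported only for Pearson and Spearman correlation.
numeric_only: If True, only float, int and boolean columns are included. The default is False in pandas 2.0 and later.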
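
To illustrate min_periods, here is a small sketch with made-up data in which only two rows have values in both columns. With the default min_periods=1 a correlation is still computed from those two rows, while min_periods=3 marks the pair as NaN because there are too few overlapping observations.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns A and B overlap in only two rows.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, np.nan, np.nan, np.nan],
})

loose = df.corr()                # default min_periods=1
strict = df.corr(min_periods=3)  # require at least 3 paired values

print(loose)
print(strict)
```

Raising min_periods is a simple guard against reporting a correlation that rests on very few paired values.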

Examples

Example 1: This code finds the correlation between the Height and Weight columns, which have a strong positive relationship.

Python

import pandas as pd
d = {"Height": [150, 160, 170, 180], "Weight": [50, 65, 68, 80]}
df = pd.DataFrame(d)
print(df.corr())

Output

          Height    Weight
Height  1.000000  0.973035
Weight  0.973035  1.000000

Explanation:

The correlation is about 0.97, which is close to +1. Height and Weight increase together and have a strong positive relationship, although the points do not fall exactly on a straight line.
Pearson correlation divides the covariance of the two columns by the product of their standard deviations. Because Height and Weight rise together, the covariance is large and positive relative to that product, giving a value close to +1.
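
The Pearson calculation can be reproduced by hand as a rough check. The sketch below reuses the Height and Weight data from this example, divides the sample covariance by the product of the sample standard deviations, and compares the result with df.corr(). The variable names manual and builtin are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Height": [150, 160, 170, 180], "Weight": [50, 65, 68, 80]})

# Pearson r = cov(x, y) / (std(x) * std(y)), using sample statistics.
cov = df["Height"].cov(df["Weight"])
manual = cov / (df["Height"].std() * df["Weight"].std())

# Value reported by corr() for the same pair of columns.
builtin = df.corr().loc["Height", "Weight"]

print(round(manual, 6), round(builtin, 6))
```

Both numbers should agree with the 0.973035 shown in the output above.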

Example 2: This code shows a negative correlation between two columns using the Kendall rank method.

Python

import pandas as pd
d = {"StudyHours": [2, 4, 6, 8], "StressLevel": [8, 7, 5, 6]}
df = pd.DataFrame(d)
print(df.corr(method="kendall"))

Output

             StudyHours  StressLevel
StudyHours     1.000000    -0.666667
StressLevel   -0.666667     1.000000

Explanation:

The correlation is about -0.67, which shows a fairly strong negative relationship: as StudyHours increases, StressLevel generally decreases, but not always.
Kendall correlation counts concordant and discordant pairs of rows. Of the 6 possible pairs here, 5 are discordant (opposite order) and 1 is concordant (same order), so the result is (1 - 5) / 6 ≈ -0.67 rather than -1.
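
The pair counting behind this value can be sketched in plain Python. The snippet below uses the same StudyHours and StressLevel values and the simple formula (concordant - discordant) / number of pairs. That formula assumes there are no tied values, which holds for this data; the variable names are illustrative.

```python
from itertools import combinations

study = [2, 4, 6, 8]
stress = [8, 7, 5, 6]

concordant = 0
discordant = 0
for i, j in combinations(range(len(study)), 2):
    # A positive product means both columns move in the same direction.
    direction = (study[i] - study[j]) * (stress[i] - stress[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

n_pairs = len(study) * (len(study) - 1) // 2
tau = (concordant - discordant) / n_pairs

print(concordant, discordant, round(tau, 6))  # 1 5 -0.666667
```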

Example 3: This code calculates correlation using the Spearman rank-based method.

Python

import pandas as pd
d = {"MathMarks": [95, 70, 85, 60, 80], "SportsScore": [60, 75, 65, 80, 70]}
df = pd.DataFrame(d)
print(df.corr(method="spearman"))

Output

             MathMarks  SportsScore
MathMarks          1.0         -1.0
SportsScore       -1.0          1.0

Explanation:

The correlation is -1.0, which shows a perfect negative monotonic relationship between the two columns.
Spearman correlation is computed on the ranks of the values rather than on the values themselves. Here the rank order of SportsScore is exactly the reverse of the rank order of MathMarks, so the result is -1.
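
One way to see where the Spearman value comes from is to rank the columns and work from the ranks. The sketch below reuses the data from this example and compares three routes: Pearson correlation of the ranks, the textbook rank-difference formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which assumes no ties, and the built-in method. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "MathMarks": [95, 70, 85, 60, 80],
    "SportsScore": [60, 75, 65, 80, 70],
})

# Replace each value by its rank within its column.
ranks = df.rank()

# Route 1: Pearson correlation applied to the ranks.
via_ranks = ranks.corr(method="pearson").loc["MathMarks", "SportsScore"]

# Route 2: rank-difference formula (valid when there are no ties).
d = ranks["MathMarks"] - ranks["SportsScore"]
n = len(df)
via_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Route 3: built-in Spearman method.
builtin = df.corr(method="spearman").loc["MathMarks", "SportsScore"]

print(via_ranks, via_formula, builtin)
```

All three values should come out as -1 (up to floating-point rounding) for this data.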
