The DataFrame.corr() method in pandas calculates the pairwise correlation between the numeric columns of a DataFrame. Correlation measures how strongly two columns are related. The result is returned as a new DataFrame called a correlation matrix, where each value ranges from -1 to 1.
- 1: perfect positive correlation
- -1: perfect negative correlation
- 0: no linear correlation (for the rank-based methods, no monotonic relationship)
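Non-numeric columns are excluded only when numeric_only=True is passed. With the default numeric_only=False (pandas 2.0 and later), a column that cannot be converted to numbers raises an error instead of being skipped.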
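As a minimal sketch, the following passes numeric_only=True so that a text column is left out of the matrix explicitly. The City column and its values are made up for illustration, and the example assumes pandas 1.5 or later, where corr() accepts this parameter.

```python
import pandas as pd

# Hypothetical mixed-type data: "City" holds strings, "A" and "B" hold numbers.
df_mixed = pd.DataFrame({
    "City": ["Pune", "Delhi", "Agra"],
    "A": [10, 20, 30],
    "B": [15, 25, 35],
})

# Ask corr() to use only the numeric columns.
r_numeric = df_mixed.corr(numeric_only=True)
print(r_numeric)
```

The resulting matrix contains only A and B; City does not appear in its rows or columns.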
Example: This code creates a simple DataFrame and finds the correlation between its columns.
import pandas as pd
d = {"A": [10, 20, 30], "B": [15, 25, 35]}
df = pd.DataFrame(d)
r = df.corr()
print(r)
Output
     A    B
A  1.0  1.0
B  1.0  1.0
Explanation:
- df.corr() calculates correlation between all numeric columns.
- Every value in column B equals the matching value in column A plus 5. This is a perfect positive linear relationship, so the correlation is 1.
Syntax
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Parameters:
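- method: Correlation method to use: 'pearson' (the default), 'spearman' or 'kendall'. A callable that takes two 1-D arrays and returns a float is also accepted.
- min_periods: Minimum number of observations required per pair of columns to produce a valid result. Pairs with fewer observations give NaN.
- numeric_only: If True, only float, int and boolean columns are included. The default is False.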
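To illustrate min_periods, here is a small sketch with missing values. The column names X, Y and Z and their data are hypothetical, chosen so that different column pairs share a different number of complete observations.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (NaN) in Y and Z.
df_gaps = pd.DataFrame({
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [2.0, 4.0, np.nan, 8.0, 10.0],       # 4 complete pairs with X
    "Z": [5.0, np.nan, np.nan, np.nan, 1.0],  # 2 complete pairs with X
})

# Default min_periods=1: any pair with at least one shared observation is attempted.
r_default = df_gaps.corr()

# Require at least 3 shared observations per pair of columns.
r_strict = df_gaps.corr(min_periods=3)

print(r_default)
print(r_strict)
```

With min_periods=3, the X and Z pair has only two shared observations, so its entry becomes NaN, while the X and Y pair is still computed.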
Examples
Example 1: This code finds the correlation between the Height and Weight columns, which have a strong positive relationship.
import pandas as pd
d = {"Height": [150, 160, 170, 180], "Weight": [50, 65, 68, 80]}
df = pd.DataFrame(d)
print(df.corr())
Output
          Height    Weight
Height  1.000000  0.973035
Weight  0.973035  1.000000
Explanation:
- Correlation is 0.97, which is close to +1. Height and Weight increase together and have a strong positive relationship, but the relationship is not perfectly linear.
- Pearson correlation divides the covariance of the two columns by the product of their standard deviations. Since Height and Weight values rise together, the covariance is positive and large relative to the spread of each column, giving a value close to +1.
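The Pearson value above can be reproduced by hand from the covariance and standard deviations. This is only a sketch of the textbook formula using the same Height and Weight data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Height": [150, 160, 170, 180], "Weight": [50, 65, 68, 80]})

# Sample covariance between the two columns.
cov_hw = df["Height"].cov(df["Weight"])

# Pearson r = covariance / (std of Height * std of Weight).
r_manual = cov_hw / (df["Height"].std() * df["Weight"].std())

# Value taken from the correlation matrix for comparison.
r_pandas = df.corr().loc["Height", "Weight"]

print(round(r_manual, 6), round(r_pandas, 6))
```

Both numbers round to 0.973035, the off-diagonal value in the output above.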
Example 2: This code shows a negative correlation between two columns using the Kendall rank method.
import pandas as pd
d = {"StudyHours": [2, 4, 6, 8], "StressLevel": [8, 7, 5, 6]}
df = pd.DataFrame(d)
print(df.corr(method="kendall"))
Output
             StudyHours  StressLevel
StudyHours     1.000000    -0.666667
StressLevel   -0.666667     1.000000
Explanation:
- Correlation is about -0.67, which shows a moderate negative relationship: as StudyHours increases, StressLevel generally decreases, but not always.
- Kendall correlation counts concordant and discordant rank pairs. Of the 6 possible pairs here, 5 are in opposite order and 1 is in the same order, so the result is (1 - 5) / 6 = -0.67 rather than -1.
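The pair counting can be sketched in plain Python. This is a simplified version of Kendall's tau that assumes there are no tied values (true for this data); it is not how pandas implements the method internally.

```python
from itertools import combinations

study_hours = [2, 4, 6, 8]
stress_level = [8, 7, 5, 6]

concordant = 0
discordant = 0

# Compare every pair of rows once.
for (x1, y1), (x2, y2) in combinations(zip(study_hours, stress_level), 2):
    direction = (x2 - x1) * (y2 - y1)
    if direction > 0:
        concordant += 1   # both columns move in the same direction
    elif direction < 0:
        discordant += 1   # the columns move in opposite directions

# Tau without tie correction: (C - D) / total number of pairs.
tau_manual = (concordant - discordant) / (concordant + discordant)
print(concordant, discordant, round(tau_manual, 6))
```

This gives 1 concordant pair, 5 discordant pairs and a tau of -0.666667, matching the value in the output.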
Example 3: This code calculates correlation using the Spearman rank-based method.
import pandas as pd
d = {"MathMarks": [95, 70, 85, 60, 80], "SportsScore": [60, 75, 65, 80, 70]}
df = pd.DataFrame(d)
print(df.corr(method="spearman"))
Output
             MathMarks  SportsScore
MathMarks          1.0         -1.0
SportsScore       -1.0          1.0
Explanation:
- The correlation is -1.0, which shows a perfect negative relationship between the two columns.
- Spearman correlation is computed on the ranks of the values rather than the values themselves. Here the rank order of SportsScore is exactly the reverse of the rank order of MathMarks, so the result is a perfect negative correlation (-1).
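One way to see the rank-based calculation is to rank each column first and then apply the ordinary Pearson method to the ranks. The sketch below uses the same data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "MathMarks": [95, 70, 85, 60, 80],
    "SportsScore": [60, 75, 65, 80, 70],
})

# Replace each value by its rank within its column (1 = smallest).
ranks = df.rank()
print(ranks)

# Pearson correlation of the ranks.
rho_from_ranks = ranks.corr().loc["MathMarks", "SportsScore"]

# Spearman correlation computed directly by pandas.
rho_pandas = df.corr(method="spearman").loc["MathMarks", "SportsScore"]

print(rho_from_ranks, rho_pandas)
```

The ranks of MathMarks are 5, 2, 4, 1, 3 and those of SportsScore are 1, 4, 2, 5, 3, an exact reversal, so both approaches give -1.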