Correlation
(Association between Variables)
Correlation is a type of bivariate statistic, a measure of association that describes both the type and the degree of association between two variables, in the sense that changes in the values of one variable are associated with changes in the values of the other variable. The correlation between two variables is summarized in a single number called the correlation coefficient. This coefficient is based either on the association between the rankings of the two variables or on the association between the values themselves, and it gives a concise idea of the nature of the association between the variables.
❖ Types of Correlation:
• Positive and negative correlation
Correlation is positive when the values of the two variables move in the same direction, so that increases (decreases) in one variable are associated with increases (decreases) in the other variable.
Correlation is negative when the values of the two variables move in opposite directions, so that increases (decreases) in one variable are associated with decreases (increases) in the other variable.
• Linear and non-linear correlation
Correlation is linear when the amount of change in one variable bears a constant ratio to the amount of change in the other, so that the points of a scatter diagram cluster around a straight line; otherwise the correlation is non-linear (curvilinear).
• Simple and Multiple Correlation
Simple Correlation - When we consider only two variables and check the correlation between
them it is said to be Simple Correlation.
Multiple Correlation - When we consider three or more variables for correlation simultaneously, it is termed Multiple Correlation. For example, if we study the relationship between poverty, infant mortality and education, it is a problem of multiple correlation.
❖ Correlation coefficient
The correlation coefficient (𝑟) is a summary measure that describes the extent (degree/strength) of the association between two ordinal, interval or ratio level variables. The correlation coefficient is scaled so that it always lies between −1 and +1. A correlation coefficient close to +1 indicates a positive relationship, and one close to −1 indicates a negative relationship between the two variables. A correlation coefficient close to 0, whether positive or negative, implies little or no relationship between the two variables.
For interval or ratio level scales, the most commonly used correlation coefficient is Pearson’s 𝑟.
For ordinal scales, the correlation coefficient which is usually calculated is Spearman’s rho.
❖ Methods of studying Correlation
• Scatter Diagram
A scatter diagram, which plots paired values of two variables, is the simplest method of studying the correlation between them. The diagram shows the values of the two variables X and Y, along with the way in which the two variables relate to each other. However, this method cannot measure the exact degree of correlation between the variables.
For purposes of drawing a scatter diagram, and determining the correlation coefficient, it does
not matter which of the two variables is the X variable, and which is Y. Correlation methods are
symmetric with respect to the two variables, with no indication of causation or direction of
influence.
Example: Consider the salary data stored in Salary_Data.csv, containing ‘Years of experience’ (say variable X) and ‘Salary’ (say variable Y). Looking at the following scatter diagram for this data, the relationship between X and Y can be seen at a glance. The diagram indicates a generally positive correlation between X and Y: larger (smaller) values of X are associated with larger (smaller) values of Y. That is, as the number of years of experience increases (decreases), the salary generally also increases (decreases).
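As a sketch, such a scatter diagram can be drawn with matplotlib. The small data set below is purely illustrative (it stands in for the contents of Salary_Data.csv, whose exact values and column layout are not shown here):

```python
# Sketch of a scatter diagram for experience-vs-salary data.
# NOTE: the (years, salary) pairs below are illustrative assumptions,
# not the actual contents of Salary_Data.csv.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

years = [1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3]      # X: years of experience
salary = [39000, 43000, 54000, 61000, 81000, 91000, 109000, 122000]  # Y: salary

fig, ax = plt.subplots()
ax.scatter(years, salary)                # one point per (X, Y) pair
ax.set_xlabel("Years of experience (X)")
ax.set_ylabel("Salary (Y)")
ax.set_title("Scatter diagram: positive association")
fig.savefig("scatter.png")
```

With real data, the two lists would instead be read from the CSV file; the upward drift of the points is what signals the positive correlation.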
• Karl Pearson’s co-efficient of correlation (Pearson's 𝒓)
The Pearson product-moment correlation coefficient, known as Pearson’s 𝒓, is a widely used correlation coefficient. Pearson's 𝑟 summarizes the relationship between two variables that have a straight-line (linear) relationship with each other.
Suppose that there are two variables X and Y, with $n$ paired observations $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$. Let the mean of X be $\bar{x}$ and the mean of Y be $\bar{y}$. Pearson's $r$ is defined as follows:

$$r = \mathrm{Cor}(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

In population terms,

$$\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{s.d.}(X)\,\mathrm{s.d.}(Y)} = \frac{E[(X - m_X)(Y - m_Y)]}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - (E(X))^2}\,\sqrt{E(Y^2) - (E(Y))^2}}$$
Or equivalently,
$$r = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n \sum x_i^2 - (\sum x_i)^2}\,\sqrt{n \sum y_i^2 - (\sum y_i)^2}}$$
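As a minimal sketch, the defining (deviation-from-mean) formula and the equivalent computational (raw-sums) formula can be checked against each other in plain Python; the small data set is purely illustrative:

```python
import math

def pearson_r(x, y):
    """Pearson's r from the deviation-from-mean definition."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = (math.sqrt(sum((xi - xbar) ** 2 for xi in x))
           * math.sqrt(sum((yi - ybar) ** 2 for yi in y)))
    return num / den

def pearson_r_raw(x, y):
    """Equivalent computational formula using raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    syy = sum(yi ** 2 for yi in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

x = [1, 2, 3, 4, 5]   # illustrative data, not the salary file
y = [2, 4, 5, 4, 5]
r1, r2 = pearson_r(x, y), pearson_r_raw(x, y)
```

Both formulas give the same value (here about 0.775), which is why the raw-sums form is convenient for hand computation.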
Interpretation of 𝒓:
The value of the correlation coefficient always lies between −1 and +1, i.e., −1 ≤ 𝑟 ≤ 1. Here 𝑟 = +1 means a perfect positive correlation between the variables X and Y, whereas 𝑟 = −1 means a perfect negative correlation between the variables. When 𝑟 = 0, there is no linear relationship between the two variables.
A value of 𝑟 close to +1 indicates strong positive correlation (the values of X and Y are strongly positively associated), 𝑟 close to −1 indicates that the values of X and Y are strongly negatively associated, and 𝑟 close to 0, on either the positive or the negative side, means there is little association between X and Y.
(Note: While calculating 𝑟, the influence of outliers should be kept to a minimum, or the outliers removed entirely.)
For example, for the salary data stored in Salary_Data.csv, the correlation coefficient between ‘Years of experience’ (X) and ‘Salary’ (Y) is 𝑟 = 0.978. So the variables X and Y have a strong positive correlation.
Test of Significance for 𝒓:
The sample data are used to compute 𝑟, the correlation coefficient for the sample. The reliability of the linear model also depends on how many observed data points are in the sample. If we had data for the entire population, we could find the population correlation coefficient; since we usually do not, the sample correlation coefficient (𝒓) is our estimate of the unknown population correlation
coefficient.
Let us denote the true correlation coefficient that would be observed if all population values were
obtained by 𝝆 (rho).
We need to perform a hypothesis test of the "significance of the correlation coefficient" to decide
whether the linear relationship in the sample data is strong enough to use to model the
relationship in the population.
The hypothesis test lets us decide whether the value of the population correlation coefficient 𝜌 is
“close to zero” or “significantly different from zero” based on the sample correlation coefficient
𝑟 and the sample size 𝑛.
The null hypothesis is
𝐻0 : 𝜌 = 0
The alternative hypothesis could be any one of three forms: 𝐻𝑎 : 𝜌 ≠ 0, 𝐻𝑎 : 𝜌 > 0 or 𝐻𝑎 : 𝜌 < 0.
The Null Hypothesis is that the population correlation coefficient IS NOT significantly different from zero, i.e. there is no significant linear relationship (correlation) between the variables X and Y. The Alternate Hypothesis is that the population correlation coefficient IS significantly different from zero, i.e. there is a significant linear relationship (positive or negative) between the variables.
Test statistic: As various samples, each of size 𝑛, are drawn, the value of 𝑟 varies from sample to sample. The sampling distribution of 𝑟 is approximated by a 𝑡 distribution with 𝑛 − 2 degrees of freedom. The standard deviation of 𝑟 can be shown to be approximated by

$$\sqrt{\frac{1 - r^2}{n - 2}}$$

where 𝑟 is the sample (observed) correlation coefficient.
Then for the null hypothesis 𝐻0 : 𝜌 = 0, the standardized 𝑡 statistic can be written as
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = r\sqrt{\frac{n - 2}{1 - r^2}}$$
Based on the value of the test statistic 𝑡, we can calculate the 𝑝-value, or equivalently we can check whether this 𝑡 value falls in the critical region. Note that the test statistic 𝑡 has the same sign as the correlation coefficient 𝑟.
For example, for the salary data Salary_Data.csv, the observed correlation coefficient between ‘Years of experience’ (X) and ‘Salary’ (Y) is 𝑟 = 0.978 and there are 𝑛 = 30 observations. We want to test the significance of the correlation between these variables. In line with the observed correlation coefficient, we take the research (alternative) hypothesis to be that the two variables are positively related (𝐻𝑎: 𝜌 > 0), i.e. the true correlation coefficient 𝜌 is significantly greater than zero, against the null hypothesis that there is no relationship between the two variables (𝐻0: 𝜌 = 0).
The value of standardized 𝑡 statistic is obtained as
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 24.8$$
For a one-tailed test with 𝑛 − 2 = 28 degrees of freedom, the critical value is 𝑡 = 2.4671 at the 0.01 level of significance. The observed 𝑡 value falls in the region of rejection of 𝐻0 (since 24.8 > 2.4671), and hence the null hypothesis is rejected. Thus the alternative hypothesis, that the two variables are positively correlated, is accepted.
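The arithmetic of this test can be sketched in a few lines of Python; the critical value 2.4671 is the tabled value quoted above, and a statistics library could supply the exact 𝑝-value instead:

```python
import math

r, n = 0.978, 30   # observed correlation coefficient and sample size

# Standardized t statistic: t = r * sqrt((n - 2) / (1 - r^2))
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

t_crit = 2.4671    # one-tailed critical value, df = 28, alpha = 0.01 (from tables)
reject_h0 = t_stat > t_crit
```

With these numbers t comes out at about 24.8, far beyond the critical value, so the null hypothesis of no correlation is rejected.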
• Spearman’s rank correlation coefficient (Spearman’s rho)
The Pearson’s correlation coefficient can be used for interval or ratio level scales. When a
variable is measured at the ordinal level, then we need to use a correlation coefficient designed
for an ordinal level scale. If a scale is ordinal, it is possible to rank the different values of the
variable.
In order to compute correlation coefficient for two such variables whose values have been
ranked, Spearman considered the numerical differences in the respective ranks. The correlation
coefficient so obtained is called rank correlation coefficient.
For instance, suppose 10 different universities are ranked in the ‘Management’ and ‘Engineering’ categories, and we wish to know the correlation between the two rankings; then Spearman’s rank correlation is the appropriate method. If actual data are given instead of ranks, we must first rank them. Ranks can be assigned by ordering the values from low to high, or from high to low.
Suppose that there are two variables X and Y. For each observed case, the rank for each of the
variables X and Y is determined. For each case i, the difference in the rank on variables X and on
variable Y is determined, and given the symbol 𝐷𝑖 . If there are 𝑛 cases, the Spearman rank
correlation between X and Y is defined as
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
The true Spearman correlation coefficient is denoted by 𝜌𝑠 .
Let’s consider the following data and calculate Spearman’s rank correlation coefficient. Score
out of 100 in Management category (X) and Engineering category (Y) for 10 universities are
given.
Universities   X    Y    Rank on X   Rank on Y   Dᵢ    Dᵢ²
U1 92 78 2 5 -3 9
U2 95 80 1 4 -3 9
U3 82 83 4 3 1 1
U4 67 72 9 7 2 4
U5 88 88 3 1 2 4
U6 65 70 10 8 2 4
U7 78 85 5 2 3 9
U8 70 75 8 6 2 4
U9 72 69 7 9 -2 4
U10 75 66 6 10 -4 16
We have
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 64}{10(100 - 1)} = 0.61$$
So the result indicates that there is a moderate positive correlation between the two rankings.
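As a sketch, the table's computation can be reproduced in Python; the ranking helper below assumes no tied scores, which holds for the data above:

```python
def ranks_desc(values):
    """Rank values from high (rank 1) to low; assumes no ties."""
    return [1 + sum(v > x for v in values) for x in values]

x = [92, 95, 82, 67, 88, 65, 78, 70, 72, 75]   # Management scores (X)
y = [78, 80, 83, 72, 88, 70, 85, 75, 69, 66]   # Engineering scores (Y)

rx, ry = ranks_desc(x), ranks_desc(y)
d2_sum = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
n = len(x)
rs = 1 - 6 * d2_sum / (n * (n ** 2 - 1))             # Spearman's rank correlation
```

Here the sum of squared differences is 64 and rs comes out at about 0.61, matching the hand calculation in the table.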