CORRELATION & REGRESSION
Dr.S.Porchelvan
Prof. of Biostatistics
06/04/25 Dept. of Biostatistics 1
Saveetha Medical College
CORRELATION
• A measure of association between two variables observed from the
subjects.
• Scatter diagram
• Pearson’s Coefficient of Correlation – Strength of association
• The value of the correlation coefficient can vary from -1.0 to +1.0
$$ r = \frac{\frac{1}{N}\sum XY - \bar{X}\,\bar{Y}}{S_X\,S_Y} $$
r value        Degree of association
0.0            No association
0.01 to 0.2    Negligible
0.2 to 0.4     Weak
0.4 to 0.7     Moderate
0.7 to 1.0     Strong
1.0            Perfect
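The grading table can be expressed as a small lookup. This is only a sketch: the function name `describe_r` is hypothetical, and two assumptions are made that the table does not state, namely that the grade is based on the magnitude of r (the sign only gives the direction) and that a value on a shared boundary (0.2, 0.4, 0.7) falls into the higher grade.

```python
def describe_r(r: float) -> str:
    """Return the verbal grade for a correlation coefficient.

    Assumptions (not stated in the table): grading uses |r|, and a value
    on a shared boundary (0.2, 0.4, 0.7) goes to the higher grade.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    size = abs(r)
    if size == 0.0:
        return "No association"
    if size == 1.0:
        return "Perfect"
    if size < 0.2:
        return "Negligible"
    if size < 0.4:
        return "Weak"
    if size < 0.7:
        return "Moderate"
    return "Strong"


print(describe_r(0.84))   # Strong
print(describe_r(-0.30))  # Weak
```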
Example :
The amount of blood loss (ml) and mean systolic Blood pressure
(mmHg ) during surgery were recorded for 15 patients. Prepare a
scatter diagram and compute “r”.
Mean SBP ( mmHg )
93,88,123,103,108,103,88,88,78,108,88,103,88,138,108
Blood Loss (ml)
112,98,150,115,129,148,96,93,85,116,96,112,93,156,112
To see how the values are dispersed, we first plot the scatter
(correlation) diagram with mean SBP on the X-axis and blood loss
on the Y-axis.
Fig. Scatter plot of blood loss (ml) during surgery against mean SBP (mmHg), with blood loss on the Y-axis and mean SBP on the X-axis
$$ r = \frac{\frac{1}{N}\sum XY - \bar{X}\,\bar{Y}}{S_X\,S_Y} $$
No      Mean SBP (X)   Blood loss (Y)   X*Y       (X - X̄)²    (Y - Ȳ)²
1       93             112              10416     53.73       4.28
2       88             98               8624      152.03      258.24
3       123            150              18450     513.93      1290.96
4       103            115              11845     7.13        0.86
5       108            129              13932     58.83       222.90
6       103            148              15244     7.13        1151.24
7       88             96               8448      152.03      326.52
8       88             93               8184      152.03      443.94
9       78             85               6630      498.63      845.06
10      108            116              12528     58.83       3.72
11      88             96               8448      152.03      326.52
12      103            112              11536     7.13        4.28
13      88             93               8184      152.03      443.94
14      138            156              21528     1419.03     1758.12
15      108            112              12096     58.83       4.28
Total   1505           1711             176093    3443.35     7084.86
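$$ \bar{X} = \frac{\sum X_i}{N} = \frac{1505}{15} = 100.33 $$
$$ \bar{Y} = \frac{\sum Y_i}{N} = \frac{1711}{15} = 114.07 $$
$$ S_X = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = 15.68 $$
$$ S_Y = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}} = 22.49 $$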
$$ r = \frac{\frac{1}{15}(176093) - (100.33)(114.07)}{(15.68)(22.49)} = +0.84 $$
The value of r shows a strong positive association between mean SBP
during surgery and blood loss, with a magnitude of 0.84.
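As a cross-check, the calculation can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative and not part of the lecture. Evaluating the formula exactly as written (1/N in the numerator, n - 1 inside S_X and S_Y) reproduces 0.84, while using the same divisor in numerator and denominator, as most statistical software does, gives about 0.90 for these data.

```python
import math

# Data from the example: mean SBP (mmHg) and blood loss (ml) for 15 patients.
sbp = [93, 88, 123, 103, 108, 103, 88, 88, 78, 108, 88, 103, 88, 138, 108]
loss = [112, 98, 150, 115, 129, 148, 96, 93, 85, 116, 96, 112, 93, 156, 112]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 divisor, as used for S_X and S_Y."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def r_slide_formula(x, y):
    """r = [(1/N) * sum(XY) - Xbar * Ybar] / (S_X * S_Y), as written above."""
    n = len(x)
    cov_n = sum(a * b for a, b in zip(x, y)) / n - mean(x) * mean(y)
    return cov_n / (sample_sd(x) * sample_sd(y))


def r_consistent(x, y):
    """Pearson r with the same divisor in the numerator and denominator."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(f"{r_slide_formula(sbp, loss):.3f}")  # 0.836
print(f"{r_consistent(sbp, loss):.3f}")     # 0.895
```

Both versions lead to the same grade, a strong positive association.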
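Calculation of 95% CI
The 95% CI for r is r ± t × SE(r)
$$ r \pm t_{0.05}\sqrt{\frac{1 - r^2}{n - 2}} $$
0.84 ± 2.16 × √(0.2944 / 13)
0.84 ± 2.16 (0.1505)
0.84 ± 0.3251
0.5149 to 1.1651, reported as 0.51 to 1.00 because r cannot exceed 1.0
The correlation between systolic blood pressure during
surgery and blood loss (0.84) is statistically significant
(has not occurred by chance / sampling error), since the 95% CI
does not include zero.
With only 15 patients, the 95% CI is wide.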
(II) Student's t-test
$$ t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.84}{\sqrt{(1 - 0.84^2)/(15 - 2)}} = \frac{0.84}{0.1505} = 5.58 $$
From the table, t0.001 (13) = 4.221
The calculated value 5.58 is larger than the table
value. The positive association between mean
systolic blood pressure during surgery
and blood loss is statistically significant at p < 0.001 (has not
occurred by chance / sampling error).
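The standard error, confidence interval and t statistic for r can be sketched together as below, with the square root applied in SE(r). The helper name `correlation_inference` is hypothetical, the critical value 2.16 is the tabulated t0.05 for 13 degrees of freedom, and clipping the limits to the range -1 to +1 is an assumption added here because this simple interval can overshoot when r is large and n is small.

```python
import math


def correlation_inference(r, n, t_crit):
    """Return SE(r), the clipped confidence interval and the t statistic."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    lower = max(-1.0, r - t_crit * se)  # assumption: clip to the valid range
    upper = min(1.0, r + t_crit * se)
    t_stat = r / se
    return se, (lower, upper), t_stat


se, ci, t_stat = correlation_inference(r=0.84, n=15, t_crit=2.16)
print(f"SE(r)  = {se:.4f}")                        # 0.1505
print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")      # 0.51 to 1.00
print(f"t      = {t_stat:.2f}")                    # 5.58
```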
REGRESSION
A variable depends on one or more other variables.
e.g. diastolic blood pressure depends on age
Regression coefficients are used to measure
this association.
A regression coefficient measures the mean change to be expected in
the dependent variable (Y) for a unit change in
the value of the independent variable (X).
The regression line of Y on X is given by
$$ Y - \bar{Y} = r\,\frac{S_Y}{S_X}\,(X - \bar{X}) $$
where Y is regressed on X and the line passes through the point
(X̄, Ȳ). Rearranging, we arrive at the equation of a straight line of the
form
$$ Y = \alpha + \beta X $$
where α is the constant (intercept) and
$$ \beta = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} $$
is the regression coefficient.
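The regression of Y on X can be sketched in Python as follows. The function name `fit_line` is hypothetical, and the blood loss data from the correlation example are reused purely as an illustration (the lecture does not fit a regression line to them). The slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept follows from the fact that the line passes through the point of means.

```python
def fit_line(x, y):
    """Least-squares line of Y on X: returns (alpha, beta)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    beta = sxy / sxx                # regression coefficient (slope)
    alpha = y_bar - beta * x_bar    # line passes through (x_bar, y_bar)
    return alpha, beta


# Illustration only: mean SBP (X) and blood loss (Y) from the earlier example.
sbp = [93, 88, 123, 103, 108, 103, 88, 88, 78, 108, 88, 103, 88, 138, 108]
loss = [112, 98, 150, 115, 129, 148, 96, 93, 85, 116, 96, 112, 93, 156, 112]

alpha, beta = fit_line(sbp, loss)
print(f"blood loss = {alpha:.2f} + {beta:.2f} * SBP")
```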
Example :
The gestational age (weeks) and the abdominal circumference (cm)
were recorded for 54 antenatal mothers. Prepare a scatter diagram,
compute 'r', fit a regression line and test the regression coefficient.
The 54 observations are listed in four column pairs: gestational age in weeks (GA, X) and abdominal circumference in cm (AC, Y).
S. No   GA (X)   AC (Y)    GA (X)   AC (Y)    GA (X)   AC (Y)    GA (X)   AC (Y)
1       12.22    56.00     18.28    72.00     25.89    96.00     32.78    112.00
2       12.28    59.00     19.21    80.00     26.78    99.00     33.44    111.00
3       12.42    54.00     19.56    81.00     26.45    93.00     33.58    123.00
4       13.52    60.00     20.29    80.00     27.58    95.00     34.85    104.00
5       13.45    58.00     20.45    81.00     27.55    99.00     35.71    99.00
6       14.71    62.00     21.36    82.00     28.66    92.00     35.57    122.00
7       14.58    62.00     21.56    81.00     28.65    99.00     36.42    120.00
8       15.85    64.00     22.34    84.00     29.32    86.00     36.14    128.00
9       15.36    65.00     22.85    86.00     29.58    89.00     37.56    131.00
10      16.27    65.00     23.65    88.00     30.78    102.00    37.98    135.00
11      16.65    66.00     23.29    88.00     30.88    98.00     38.29    138.00
12      17.16    68.00     24.54    88.00     31.45    106.00    38.44    142.00
13      17.24    70.00     24.36    89.00     31.54    110.00
14      18.81    67.00     25.89    86.00     32.85    107.00
Gestational age in weeks = number of days of gestation / 7
For this example,
$$ \bar{X} = 25.12 \text{ wks}, \quad S_X = 8.01, \quad \sum (X - \bar{X})^2 = 3464.6 $$
$$ \bar{Y} = 90.33 \text{ cm}, \quad S_Y = 22.81, \quad \sum (Y - \bar{Y})^2 = 28095.98 $$
Fig. Abdominal circumference (cm) against gestational age (weeks); r = 0.961
We shall frame the regression line of Y on X as follows:
$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{S_Y}{S_X}\,(X - \bar{X}) \\
Y - 90.33 &= (0.961)\,\frac{22.81}{8.01}\,(X - 25.12) \\
Y - 90.33 &= 2.7366\,(X - 25.12) \\
Y - 90.33 &= 2.7366X - 68.7433 \\
Y &= 2.7366X - 68.7433 + 90.33 \\
Y &= 2.74X + 21.5858 \\
Y &= 21.59 + 2.74X
\end{aligned}
$$
This is of the form of a straight line Y = α + βX,
where α = 21.59 and β = 2.74.
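And the standard error of the regression coefficient β is
$$
\begin{aligned}
SE(\beta) &= \sqrt{\frac{1}{n - 2}\left\{\frac{\sum (Y - \bar{Y})^2}{\sum (X - \bar{X})^2} - \beta^2\right\}} \\
&= \sqrt{\frac{1}{54 - 2}\left\{\frac{28095.98}{3464.64} - (2.74)^2\right\}} \\
&= \sqrt{\frac{1}{52}\,(8.1093 - 7.5076)} \\
&= \sqrt{\frac{1}{52}\,(0.6017)} \\
&= \sqrt{0.01157} \approx 0.1075
\end{aligned}
$$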
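The same arithmetic can be sketched from the summary statistics alone. The variable names are illustrative, and the slope is rounded to two decimals before the standard error is computed, which is an assumption made here to follow the hand calculation. Small differences in the last decimal place are rounding effects.

```python
import math

# Summary statistics quoted for the gestational age example.
n = 54
x_bar, y_bar = 25.12, 90.33         # means of GA (weeks) and AC (cm)
s_x, s_y = 8.01, 22.81              # standard deviations
r = 0.961
sum_sq_x, sum_sq_y = 3464.64, 28095.98

beta = r * s_y / s_x                # slope of Y on X
alpha = y_bar - beta * x_bar        # intercept
beta_2dp = round(beta, 2)           # assumption: round as in the hand calculation

se_beta = math.sqrt((sum_sq_y / sum_sq_x - beta_2dp ** 2) / (n - 2))

print(f"beta  = {beta:.4f}")        # 2.7366
print(f"alpha = {alpha:.2f}")       # 21.59
print(f"SE(beta) = {se_beta:.3f}")  # 0.108
```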
Fig. Abdominal circumference (cm) against gestational age (weeks), with the fitted regression line
Calculation of 95% CI
The 95% CI for β is β ± t × SE(β)
2.74 ± t0.05 (0.1075)
2.74 ± 2.00 (0.1075)
2.74 ± 0.2151
(2.74 - 0.2151) to (2.74 + 0.2151)
i.e. 95% CI for β is 2.5249 to 2.9551
$$ t = \frac{\beta}{SE(\beta)} = \frac{2.74}{0.1075} = 25.48 $$
From the table, t0.05 (52) = 2.00, t0.01 (52) = 2.65
β is significant at p < 0.00001
Abdominal Circumference = 21.59 + 2.74 × Gestational Age
For GA = 14 wks, AC = 21.59 + (2.74)(14) = 59.95 cm
For GA = 15 wks, AC = 21.59 + (2.74)(15) = 62.69 cm
For GA = 16 wks, AC = 21.59 + (2.74)(16) = 65.43 cm
Inference
The 95% confidence interval does not include zero (0), so the regression
coefficient is statistically significant (has not occurred by chance / sampling error),
i.e. for every one-week increase in gestational age, the abdominal
circumference increases by 2.74 cm on average.
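The interval, the t statistic and the predicted values can be reproduced with a short sketch. The function name `predict_ac` is hypothetical, and the inputs are the rounded figures quoted in this example (β = 2.74, α = 21.59, SE(β) = 0.1075, tabulated t0.05 = 2.00), so results may differ from the hand calculation in the last decimal place.

```python
# Rounded values quoted in the example.
alpha, beta = 21.59, 2.74
se_beta = 0.1075
t_crit = 2.00                       # tabulated t0.05 for 52 degrees of freedom

margin = t_crit * se_beta
ci = (beta - margin, beta + margin)
t_stat = beta / se_beta


def predict_ac(ga_weeks):
    """Predicted abdominal circumference (cm) for a gestational age in weeks."""
    return alpha + beta * ga_weeks


print(f"95% CI for beta: {ci[0]:.3f} to {ci[1]:.3f}")   # 2.525 to 2.955
print(f"t = {t_stat:.1f}")                              # about 25.5
for ga in (14, 15, 16):
    print(f"GA = {ga} wks -> AC = {predict_ac(ga):.2f} cm")
```

Because the lower limit of the interval is above zero, the same conclusion follows from either the interval or the t statistic.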
Models for Regression
• Multiple Regression
• Logistic Regression
What lifestyle characteristics are risk
factors for coronary heart disease (CHD)?
Given a sample of patients measured on
smoking status, diet, exercise, alcohol
use, and CHD status, you could build a
model using the four lifestyle variables
to predict the presence or absence of CHD
in a sample of patients. The model can
then be used to derive estimates of the odds
ratios for each factor to tell you, for example,
how much more likely smokers are to
develop CHD than nonsmokers
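To illustrate how an odds ratio relates to a logistic model, here is a deliberately reduced sketch with a single binary factor (smoking) instead of the four lifestyle variables. The counts are hypothetical, invented for illustration only, and are not data from the lecture. With one binary predictor, the fitted logistic coefficient equals the log of the odds ratio from the 2 x 2 table, so no fitting routine is needed.

```python
import math

# HYPOTHETICAL counts, invented for illustration only (not from the lecture).
chd_smokers, no_chd_smokers = 30, 70
chd_nonsmokers, no_chd_nonsmokers = 15, 85

odds_smokers = chd_smokers / no_chd_smokers
odds_nonsmokers = chd_nonsmokers / no_chd_nonsmokers
odds_ratio = odds_smokers / odds_nonsmokers

# Logistic model with one binary predictor: logit(p) = intercept + beta * smoker.
# Its maximum-likelihood coefficients reproduce the table exactly.
intercept = math.log(odds_nonsmokers)
beta_smoking = math.log(odds_ratio)


def predicted_probability(smoker):
    """Probability of CHD for smoker = 1 or non-smoker = 0."""
    z = intercept + beta_smoking * smoker
    return 1 / (1 + math.exp(-z))


print(f"odds ratio = {odds_ratio:.2f}")                   # 2.43
print(f"P(CHD | smoker)     = {predicted_probability(1):.2f}")  # 0.30
print(f"P(CHD | non-smoker) = {predicted_probability(0):.2f}")  # 0.15
```

With several factors in the model, each coefficient is exponentiated in the same way to give an odds ratio adjusted for the other factors.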
Dr.S.Porchelvan
[email protected]