Multiple Linear Regression: BIOST 515 January 15, 2004
Outline
• Hypothesis testing
• Scientific question
• Gain precision
Scientific Question
• Predictors of interest
  - The scientific factor under investigation can be modeled by multiple predictors (e.g., dummy variables, polynomials)
• Effect modifiers
  - The scientific question may relate to detection of effect modification
• Confounders
  - The scientific question may have been stated in terms of adjusting for known (or suspected) confounders
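The three roles above map onto three kinds of model formula. The following is a minimal sketch on simulated data, not the course data set; the variable names (bp, age, sex, smoker) and all numeric values are assumptions made up for illustration.

```r
# Simulated data; every name and number here is hypothetical.
set.seed(515)
n      <- 200
age    <- runif(n, 65, 90)
sex    <- factor(sample(c("female", "male"), n, replace = TRUE))
smoker <- rbinom(n, 1, 0.2)
bp     <- 70 + 0.1 * age - 2 * smoker + rnorm(n, sd = 10)

# Predictor of interest modeled by several terms: a quadratic in age.
fit_poly <- lm(bp ~ age + I(age^2))

# Effect modification: an interaction lets the smoker effect differ by sex.
fit_mod <- lm(bp ~ smoker * sex)

# Confounding: the smoker effect adjusted for age.
fit_adj <- lm(bp ~ smoker + age)

names(coef(fit_poly))
names(coef(fit_mod))
names(coef(fit_adj))
```

In each case the scientific question determines which coefficients are of interest: both age terms together in the first fit, the interaction term in the second, and the smoker term in the third.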
Confounding
Precision
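\[
y = X\beta + \epsilon
\]
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\]
\[
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
\]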
\[
X'X\hat{\beta} = X'y
\]
\[
\hat{\beta} = (X'X)^{-1}X'y
\]
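The least squares solution can be checked numerically. This is a small sketch on simulated data (the variables x1, x2 and the true coefficients are made up), comparing a direct solve of X'Xb = X'y with the coefficients returned by lm().

```r
# Simulated data; coefficients 1, 2, -0.5 are arbitrary choices.
set.seed(515)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of 1s for the intercept, then the predictors.
X <- cbind(1, x1, x2)

# Solve X'X b = X'y directly; solve(A, b) avoids forming the inverse explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() uses a QR decomposition internally but should agree numerically.
fit <- lm(y ~ x1 + x2)
all.equal(as.vector(beta_hat), unname(coef(fit)))
```

In practice lm() is preferable because the QR approach is more stable when predictors are nearly collinear; the direct solve is shown only to mirror the formula.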
e = y − X β̂ = y − Hy = (I − H)y.
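The residual identity can be verified in R. The sketch below assumes the usual definition of the hat matrix, H = X(X'X)^{-1}X', which the line above uses without spelling out; the data are simulated and the names are hypothetical.

```r
# Simulated data for illustration only.
set.seed(515)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)

# Hat matrix (assumed definition): H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

# Residuals as (I - H) y
e <- (diag(n) - H) %*% y

fit <- lm(y ~ x1 + x2)
all.equal(as.vector(e), unname(resid(fit)))  # same residuals as lm()
all.equal(H %*% H, H)                        # H is idempotent
sum(diag(H))                                 # trace equals the number of columns of X
```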
β̂ is an unbiased estimator of β
Estimation of $\sigma^2$
\[
\hat{\sigma}^2 = MSE = \frac{SSE}{n - p - 1}
\]
As in simple linear regression, $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
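As a quick check of the formula, the sketch below computes SSE/(n - p - 1) by hand on simulated data (names and values are made up) and compares it with the squared residual standard error that summary() reports.

```r
# Simulated data with p = 2 predictors.
set.seed(515)
n  <- 40
p  <- 2
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 3)

fit <- lm(y ~ x1 + x2)

SSE <- sum(resid(fit)^2)     # sum of squared residuals
MSE <- SSE / (n - p - 1)     # divide by the residual degrees of freedom

all.equal(MSE, summary(fit)$sigma^2)
df.residual(fit) == n - p - 1
```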
Example
Scientific question
Scatterplot matrix
pairs(cbind(chs$DIABP,chs$HEIGHT,chs$WEIGHT),
labels=c("Blood Pressure", "Height", "Weight"))
[Figure: scatterplot matrix of Blood Pressure, Height, and Weight]
3d scatterplot
library(scatterplot3d)
scatterplot3d(chs$HEIGHT,chs$WEIGHT,chs$DIABP, xlab="height", ylab="weight",
zlab="Blood pressure")
[Figure: 3D scatterplot of blood pressure against height and weight]
Regression models we may be interested in:
\[
E[BP_i] = \beta_0 + \beta_1\,\text{height}_i
\]
\[
E[BP_i] = \beta_0 + \beta_1\,\text{weight}_i
\]
\[
E[BP_i] = \beta_0 + \beta_1\,\text{height}_i + \beta_2\,\text{weight}_i
\]
We’ve fit models similar to the first 2, but not the 3rd.
\[
BP_i = \beta_0 + \beta_1\,\text{height}_i + \epsilon_i
\]
> lmht <- lm(DIABP~HEIGHT,data=chs)
> summary(lmht)
Residuals:
Min 1Q Median 3Q Max
-32.8206 -7.6509 -0.0482 7.4237 40.0141
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.56051 8.79011 5.866 8.18e-09 ***
HEIGHT 0.12353 0.05343 2.312 0.0212 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
\[
BP_i = \beta_0 + \beta_1\,\text{weight}_i + \epsilon_i
\]
lmwt <- lm(DIABP~WEIGHT,data=chs)
summary(lmwt)
Residuals:
Min 1Q Median 3Q Max
-30.7561 -7.4422 -0.1446 7.3281 40.1285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.99253 2.50881 25.507 < 2e-16 ***
WEIGHT 0.04905 0.01534 3.198 0.00147 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
\[
BP_i = \beta_0 + \beta_1\,\text{height}_i + \beta_2\,\text{weight}_i + \epsilon_i
\]
Residuals:
Min 1Q Median 3Q Max
-31.3833 -7.2260 -0.2881 7.7002 39.6144
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.65777 8.91267 6.245 9.14e-10 ***
HEIGHT 0.05820 0.05972 0.975 0.3302
WEIGHT 0.04140 0.01723 2.403 0.0166 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
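In the fits above, the HEIGHT estimate drops from 0.12353 to 0.05820 and its standard error rises from 0.05343 to 0.05972 once WEIGHT enters the model. The sketch below reproduces that pattern in a deliberately stylized simulation; it does not use the CHS data, and every number in it is an assumption chosen so that blood pressure depends on weight only while weight is correlated with height.

```r
# Simulated data only; these coefficients are made up, not CHS estimates.
set.seed(515)
n      <- 2000
height <- rnorm(n, mean = 165, sd = 9)
weight <- -100 + 1.6 * height + rnorm(n, sd = 25)  # weight rises with height
bp     <- 40 + 0.3 * weight + rnorm(n, sd = 10)    # bp depends on weight only

# Height coefficient without and with adjustment for weight.
b_unadj <- coef(lm(bp ~ height))["height"]
b_adj   <- coef(lm(bp ~ height + weight))["height"]

c(unadjusted = unname(b_unadj), adjusted = unname(b_adj))
```

Under this setup the unadjusted slope should land near 0.3 × 1.6 = 0.48, because height acts as a stand-in for weight, while the adjusted slope should be near zero.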
• Race
• Sex (Male/Female)
\[
y_i = \beta_0 + \beta_1\,\text{sex}_i + \epsilon_i
\]
and
E(yi) = β0 + β1, if female.
What does β1 represent?
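One way to explore the question numerically: with a single 0/1 predictor, the fitted slope equals the difference between the two group means. The sketch below uses simulated data and assumes the coding 1 = female, 0 = male, which matches the expression for E(y_i) above but is otherwise an assumption.

```r
# Simulated data; the coding 1 = female, 0 = male is assumed.
set.seed(515)
n      <- 100
female <- rbinom(n, 1, 0.5)
y      <- 70 + 3 * female + rnorm(n, sd = 8)

fit <- lm(y ~ female)

mean_male   <- mean(y[female == 0])
mean_female <- mean(y[female == 1])

# Intercept is the mean in the reference group (female == 0).
all.equal(unname(coef(fit)["(Intercept)"]), mean_male)
# Slope is the difference in means, female minus male.
all.equal(unname(coef(fit)["female"]), mean_female - mean_male)
```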
How is being a current smoker related to blood pressure?
\[
BP_i = \beta_0 + \beta_1\,\text{smoker}_i + \epsilon_i
\]
> lmsmk <- lm(chs$DIABP~smoker)
> summary(lmsmk)
Residuals:
Min 1Q Median 3Q Max
-32.1307 -8.0207 -0.1307 7.8693 41.3093
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 72.1307 0.5339 135.103 <2e-16 ***
smoker -2.8344 1.7020 -1.665 0.0965 .
Call:
lm(formula = chs$DIABP ~ smoker * chs$WEIGHT)
Residuals:
Min 1Q Median 3Q Max
-3.092e+01 -7.195e+00 4.102e-04 7.586e+00 3.986e+01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.87963 2.67958 23.839 < 2e-16 ***
smoker 4.53954 7.94883 0.571 0.56819
chs$WEIGHT 0.05108 0.01626 3.141 0.00178 **
smoker:chs$WEIGHT -0.04517 0.05187 -0.871 0.38431
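The interaction fit implies a separate line for each group. The arithmetic below uses the estimates printed above and assumes smoker is coded 1 for current smokers and 0 otherwise; that coding is not shown in the output, so treat it as an assumption.

```r
# Estimates copied from the coefficient table above.
b <- c(intercept = 63.87963, smoker = 4.53954,
       weight = 0.05108, interaction = -0.04517)

# Non-smokers (smoker = 0): intercept and weight slope as printed.
nonsmoker_line <- c(intercept = b[["intercept"]],
                    slope     = b[["weight"]])

# Smokers (smoker = 1): add the smoker term to the intercept
# and the interaction term to the slope.
smoker_line <- c(intercept = b[["intercept"]] + b[["smoker"]],
                 slope     = b[["weight"]] + b[["interaction"]])

rbind(nonsmoker_line, smoker_line)
```

The smoker line has intercept 68.41917 and slope 0.00591, so the estimated weight slope is much flatter among smokers, although the interaction term is far from significant (p = 0.38).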
[Figure: scatterplot of blood pressure against weight; legend: Smokers, Non-smokers]
Option 1:
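\[
\text{smoke}_i = \begin{cases}
1, & \text{never smoked} \\
2, & \text{former smoker} \\
3, & \text{current smoker}
\end{cases}
\]
\[
BP_i = \beta_0 + \beta_1\,\text{smoke}_i + \epsilon_i
\]
How do we interpret $\beta_1$?
$\hat{\beta}_1 = -1.08$ and $\widehat{se}(\hat{\beta}_1) = 0.7637$.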
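Entering the 1/2/3 code as a single numeric predictor forces the fitted group means to be equally spaced: the never-to-former difference and the former-to-current difference are both equal to the slope. The sketch below shows this on simulated data; the data-generating values are made up.

```r
# Simulated data; smoke is coded 1 = never, 2 = former, 3 = current.
set.seed(515)
n     <- 300
smoke <- sample(1:3, n, replace = TRUE)
bp    <- 72 - 1 * smoke + rnorm(n, sd = 10)

fit <- lm(bp ~ smoke)

# Fitted mean blood pressure in each smoking group.
fitted_means <- predict(fit, newdata = data.frame(smoke = 1:3))

# Both successive differences equal the single slope estimate.
diff(fitted_means)
coef(fit)["smoke"]
```

With the estimate reported above, each step up in smoking category would change the fitted mean by -1.08, so never versus current smokers would differ by 2 × (-1.08) = -2.16 under this coding.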
Option 2:
Create two dummy variables
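\[
\text{smoke1}_i = \begin{cases}
1, & \text{never smoked} \\
0, & \text{otherwise}
\end{cases}
\]
and
\[
\text{smoke2}_i = \begin{cases}
1, & \text{former smoker} \\
0, & \text{otherwise}
\end{cases}
\]
\[
BP_i = \beta_0 + \beta_1\,\text{smoke1}_i + \beta_2\,\text{smoke2}_i + \epsilon_i
\]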
How do we interpret β0, β1 and β2?
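A sketch of how such dummy variables might be built in R, using simulated data. The three-level variable smoke and its coding (1 = never, 2 = former, 3 = current) follow Option 1; the logical construction smoke == 1 is an assumption, and the data values are made up. The same fit can be obtained by treating smoke as a factor with current smokers as the reference level.

```r
# Simulated data; 1 = never, 2 = former, 3 = current smoker.
set.seed(515)
n     <- 300
smoke <- sample(1:3, n, replace = TRUE)
bp    <- 70 + 3 * (smoke == 1) + 2.5 * (smoke == 2) + rnorm(n, sd = 10)

# Two logical dummy variables; current smokers are the omitted group.
smoke1 <- smoke == 1
smoke2 <- smoke == 2
fit_dummy <- lm(bp ~ smoke1 + smoke2)

# Equivalent fit: a factor with level "3" (current) as the reference.
smoke_f    <- relevel(factor(smoke), ref = "3")
fit_factor <- lm(bp ~ smoke_f)

all.equal(unname(coef(fit_dummy)), unname(coef(fit_factor)))

# The intercept is the sample mean in the omitted (current smoker) group.
all.equal(unname(coef(fit_dummy)["(Intercept)"]), mean(bp[smoke == 3]))
```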
> lmsmk2 <- lm(chs$DIABP~smoke1+smoke2)
> summary(lmsmk2)
Call:
lm(formula = chs$DIABP ~ smoke1 + smoke2)
Residuals:
Min 1Q Median 3Q Max
-32.2824 -7.9008 -0.2824 7.7176 41.5198
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.296 1.618 42.839 <2e-16 ***
smoke1TRUE 2.986 1.763 1.694 0.091 .
smoke2TRUE 2.624 1.816 1.445 0.149
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
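With current smokers as the omitted group, the estimates above translate into group means by simple addition. The arithmetic below uses only the printed coefficients; the variable names are made up for illustration.

```r
# Estimates copied from the coefficient table above.
b0 <- 69.296   # (Intercept): current smokers, the omitted group
b1 <- 2.986    # smoke1TRUE: never smoked versus current
b2 <- 2.624    # smoke2TRUE: former smoker versus current

group_means <- c(current = b0,
                 never   = b0 + b1,
                 former  = b0 + b2)
group_means

# Never versus former smokers: the difference of the two coefficients.
never_vs_former <- b1 - b2
never_vs_former
```

The estimated means are 69.296 for current, 72.282 for never, and 71.920 for former smokers. This appears consistent with the earlier two-group fit, where 72.1307 - 2.8344 = 69.2963 for current smokers, assuming both fits use the same sample.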
Summary