Multiple Linear Regression: BIOST 515 January 15, 2004

The document summarizes multiple linear regression. It outlines the motivation for using multiple linear regression over simple linear regression, including to account for scientific questions, confounding variables, and increased precision. It then presents the multiple linear regression model in matrix notation and describes estimating model parameters using least squares and maximum likelihood. Hypothesis testing in multiple linear regression is also mentioned. An example using data from the Cardiovascular Health Study is provided to illustrate the use of multiple linear regression to examine relationships between blood pressure, height, and weight.


Lecture 4

Multiple linear regression


BIOST 515

January 15, 2004



Outline

• Motivation for the multiple regression model

• Multiple regression in matrix notation

• Least squares estimation of model parameters

• Maximum likelihood estimation of model parameters

• Hypothesis testing

Multiple linear regression

We are now considering the model

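$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \qquad (1)$$

$i = 1, \ldots, n$, with $E[\epsilon_i] = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$ and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$.

We will also require that $p < n$. Additional assumption:

• The predictors ($x_1, \ldots, x_p$) are fixed and measured without error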

Why multiple linear regression?

Previously we’ve examined the case with one predictor and one outcome (simple linear regression). There are a variety of reasons we may want to include additional predictors in the model.

• Scientific question

• Adjustment for confounding

• Gain precision

Scientific Question

May dictate inclusion of particular predictors

• Predictors of interest
  - The scientific factor under investigation can be modeled by multiple predictors (e.g., dummy variables, polynomials, etc.)
• Effect modifiers
  - The scientific question may relate to detection of effect modification
• Confounders
  - The scientific question may have been stated in terms of adjusting for known (or suspected) confounders

Confounding

Sometimes the scientific question of greatest interest is confounded by associations in the data.

From KKMN, pg. 187:


In general, confounding exists if meaningfully different
interpretations of the relationship of interest result when an
extraneous variable is ignored or included in the analysis.

Precision

Adjusting for an additional covariate changes the standard error of the slope estimate

• Standard error is decreased by having smaller within-group variance

• Standard error is increased by having correlations between the predictor of interest and other covariates in the model
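The two effects listed above can be read off a standard textbook expression for the variance of a single slope estimate. It is quoted here as a sketch rather than derived; the symbol $R_j^2$ does not appear in the lecture and is introduced only for illustration. It denotes the $R^2$ obtained by regressing predictor $x_j$ on the other predictors in the model.

$$\mathrm{var}(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\sum_{i=1}^{n}(x_{ij} - \bar x_j)^2}$$

A useful added covariate lowers the residual variance $\sigma^2$ (the numerator), while correlation between $x_j$ and the other covariates raises $R_j^2$ and so shrinks the denominator.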

General comments on multiple regression

• Can be difficult to choose the “best” model, since many reasonable candidates may exist

• More difficult to visualize the fitted model

• More difficult to interpret the fitted model



The model in matrix notation

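$$y = X\beta + \epsilon$$

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
$$

$$
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
$$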

Least Squares Estimates

$$S(\beta) = (y - X\beta)'(y - X\beta)$$

Least squares estimates are obtained by solving

$$\frac{\partial S(\beta)}{\partial \beta} = -2X'y + 2X'X\beta = 0$$

for $\beta$. This gives us the least squares normal equations:

$$X'X\hat\beta = X'y.$$

To get the least squares estimator of $\beta$, multiply each side by $(X'X)^{-1}$, which gives

$$\hat\beta = (X'X)^{-1}X'y$$

provided $(X'X)^{-1}$ exists (which it will if the regressors are linearly independent).
The vector of fitted y-values, $\hat y$, corresponding to the observed y-values $y$ is

$$\hat y = X\hat\beta = X(X'X)^{-1}X'y = Hy.$$

The $n \times n$ matrix $H$ is often referred to as the hat matrix.

The residuals,

$$e = y - \hat y,$$

can also be rewritten as

$$e = y - X\hat\beta = y - Hy = (I - H)y.$$
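The matrix formulas above can be checked numerically. The following is a minimal sketch on simulated data (not the CHS data); the variable names and the true coefficients are arbitrary choices made for illustration.

```r
# Simulated data; all values here are hypothetical
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Solve the normal equations X'X b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Hat matrix, fitted values, and residuals e = (I - H) y
H     <- X %*% solve(crossprod(X)) %*% t(X)
y_hat <- H %*% y
e     <- (diag(n) - H) %*% y

# lm() should give the same estimates, fitted values, and residuals
fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
all.equal(as.numeric(y_hat), unname(fitted(fit)))
all.equal(as.numeric(e), unname(resid(fit)))
```

In practice `lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable; the explicit formulas are shown only to mirror the derivation.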

Properties of least squares estimates

$\hat\beta$ is an unbiased estimator of $\beta$:

$$E(\hat\beta) = E[(X'X)^{-1}X'y] = (X'X)^{-1}X'X\beta = \beta$$

The variance of $\hat\beta$ is expressed by the covariance matrix

$$\mathrm{cov}(\hat\beta) = E\{[\hat\beta - E(\hat\beta)][\hat\beta - E(\hat\beta)]'\} = \sigma^2(X'X)^{-1}$$

If we let $C = (X'X)^{-1}$, the variance of $\hat\beta_j$ is $\sigma^2 C_{jj}$ and the covariance between $\hat\beta_i$ and $\hat\beta_j$ is $\sigma^2 C_{ij}$.
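The step from the definition of the covariance matrix to $\sigma^2(X'X)^{-1}$ can be filled in as follows. This sketch uses only the model $y = X\beta + \epsilon$ and the assumptions on $\epsilon$ stated earlier, which together give $E(\epsilon\epsilon') = \sigma^2 I$.

$$\hat\beta - \beta = (X'X)^{-1}X'(X\beta + \epsilon) - \beta = (X'X)^{-1}X'\epsilon$$

$$\mathrm{cov}(\hat\beta) = E[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}] = (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$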

Estimation of $\sigma^2$

As in simple linear regression, we develop an estimator of $\sigma^2$ from SSE (residual sum of squares).

$$SSE = (y - X\hat\beta)'(y - X\hat\beta)$$

$$\hat\sigma^2 = MSE = \frac{SSE}{n - p - 1}$$

As in simple linear regression, $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.
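As a sketch of how these quantities connect to standard R output, the code below computes $\hat\sigma^2$ and the standard errors $\sqrt{\hat\sigma^2 C_{jj}}$ by hand and compares them with `summary()`. The data are simulated and the coefficient values are arbitrary assumptions.

```r
# Simulated data; all values here are hypothetical
set.seed(2)
n  <- 40
p  <- 2                      # number of predictors (excluding the intercept)
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + x1 + 0.5 * x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)     # n x (p + 1) design matrix, including the intercept column

# Residual sum of squares and MSE = SSE / (n - p - 1)
SSE        <- sum(resid(fit)^2)
sigma2_hat <- SSE / (n - p - 1)

# Estimated standard errors: sqrt(sigma2_hat * C_jj), with C = (X'X)^{-1}
C  <- solve(crossprod(X))
se <- sqrt(sigma2_hat * diag(C))

# Both should agree with what summary() reports
all.equal(sigma2_hat, summary(fit)$sigma^2)
all.equal(unname(se), unname(summary(fit)$coefficients[, "Std. Error"]))
```

The "Residual standard error" line in `summary()` output is $\hat\sigma$, the square root of the MSE, and its degrees of freedom are $n - p - 1$.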

Example

Cardiovascular Health Study (CHS)

A population-based, longitudinal study of coronary heart disease in people over 65.

Primary aim: To identify risk factors related to the onset and course of coronary heart disease and stroke.

Secondary aim: Describe prevalence and distributions of risk factors.

Scientific question

How is a person’s weight related to blood pressure?

We might expect people who are more overweight to have higher blood pressure. However, a simple linear regression model with weight as a predictor might be misleading. Why?

We will examine a subset of 500 of these subjects to see how height and weight are related to blood pressure.

Scatterplot matrix

pairs(cbind(chs$DIABP,chs$HEIGHT,chs$WEIGHT),
labels=c("Blood Pressure", "Height", "Weight"))

[Figure: scatterplot matrix of Blood Pressure, Height, and Weight]

3d scatterplot

library(scatterplot3d)  # scatterplot3d() is provided by the scatterplot3d package
scatterplot3d(chs$HEIGHT, chs$WEIGHT, chs$DIABP, xlab="height", ylab="weight",
              zlab="Blood pressure")

[Figure: 3d scatterplot of blood pressure against height and weight]
Regression models we may be interested in:

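$$E[\mathrm{BP}_i] = \beta_0 + \beta_1 \mathrm{height}_i$$

$$E[\mathrm{BP}_i] = \beta_0 + \beta_1 \mathrm{weight}_i$$

$$E[\mathrm{BP}_i] = \beta_0 + \beta_1 \mathrm{height}_i + \beta_2 \mathrm{weight}_i$$

We’ve fit models similar to the first two, but not the third.

Note: 2 observations will be omitted from the analysis because the subjects’ weights are missing.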

$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{height}_i + \epsilon_i$
>lmht <- lm(DIABP~HEIGHT,data=chs)
>summary(lmht)

Residuals:
Min 1Q Median 3Q Max
-32.8206 -7.6509 -0.0482 7.4237 40.0141

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.56051 8.79011 5.866 8.18e-09 ***
HEIGHT 0.12353 0.05343 2.312 0.0212 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.28 on 496 degrees of freedom


Multiple R-Squared: 0.01066, Adjusted R-squared: 0.00867
F-statistic: 5.347 on 1 and 496 DF, p-value: 0.02117

$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{weight}_i + \epsilon_i$
lmwt <- lm(DIABP~WEIGHT,data=chs)
summary(lmwt)

Residuals:
Min 1Q Median 3Q Max
-30.7561 -7.4422 -0.1446 7.3281 40.1285

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.99253 2.50881 25.507 < 2e-16 ***
WEIGHT 0.04905 0.01534 3.198 0.00147 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.23 on 496 degrees of freedom


Multiple R-Squared: 0.0202, Adjusted R-squared: 0.01822
F-statistic: 10.23 on 1 and 496 DF, p-value: 0.001474

$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{height}_i + \beta_2 \mathrm{weight}_i + \epsilon_i$


lmhtwt <- lm(DIABP~HEIGHT+WEIGHT,data=chs)
summary(lmhtwt)

Residuals:
Min 1Q Median 3Q Max
-31.3833 -7.2260 -0.2881 7.7002 39.6144

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.65777 8.91267 6.245 9.14e-10 ***
HEIGHT 0.05820 0.05972 0.975 0.3302
WEIGHT 0.04140 0.01723 2.403 0.0166 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.23 on 495 degrees of freedom


Multiple R-Squared: 0.02208, Adjusted R-squared: 0.01812
F-statistic: 5.587 on 2 and 495 DF, p-value: 0.003987
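In the three fits above, the height coefficient drops from 0.12353 (height alone) to 0.05820 once weight is in the model. The sketch below reproduces that kind of shift on simulated data; it is not a reanalysis of the CHS data. The assumptions, chosen only for illustration, are that height and weight are correlated and that blood pressure depends on weight but not on height. The effect size is deliberately exaggerated so the pattern is easy to see.

```r
# Simulated data; every number below is an arbitrary assumption
set.seed(3)
n      <- 2000
height <- rnorm(n, mean = 165, sd = 9)
weight <- height + rnorm(n, sd = 25)           # weight is correlated with height
bp     <- 40 + 0.5 * weight + rnorm(n, sd = 11) # bp depends on weight only

# Height coefficient without and with adjustment for weight
b_unadj <- unname(coef(lm(bp ~ height))["height"])
b_adj   <- unname(coef(lm(bp ~ height + weight))["height"])

c(unadjusted = b_unadj, adjusted = b_adj)
```

The unadjusted height slope picks up the weight effect through the height-weight correlation; adjusting for weight moves it toward zero.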

Dummy (indicator) variables

So far we have only dealt with predictors that are continuous. However, we will often have variables that can take on only a small number of levels. Examples:

• Smoking status (current/former/never)

• Race

• Sex (Male/Female)

• Treatment group in a clinical trial


To model the association between these predictors and a response, we assign some sort of numerical scale to them. For example, we could code sex as (0, 1), (−1, 1), or (1, 2). Then if

$$y_i = \beta_0 + \beta_1 \mathrm{sex}_i + \epsilon_i$$

and sex = 1 if female and 0 if male,

$$E(y_i) = \beta_0, \text{ if male}$$

and

$$E(y_i) = \beta_0 + \beta_1, \text{ if female.}$$

What does $\beta_1$ represent?
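Subtracting the two expectations gives $\beta_1 = E(y_i \mid \text{female}) - E(y_i \mid \text{male})$, the difference in mean response between the groups. A minimal sketch on simulated data (the group sizes and means are arbitrary assumptions) shows that the least squares estimates match the sample means exactly:

```r
# Simulated data; all values here are hypothetical
set.seed(4)
sex <- rep(c(0, 1), each = 30)            # 0 = male, 1 = female
y   <- 70 + 5 * sex + rnorm(60, sd = 3)

fit <- lm(y ~ sex)
b   <- coef(fit)

mean_male   <- mean(y[sex == 0])
mean_female <- mean(y[sex == 1])

# Intercept = male mean; slope = female mean - male mean
all.equal(unname(b["(Intercept)"]), mean_male)
all.equal(unname(b["sex"]), mean_female - mean_male)
```

With a different coding such as (−1, 1) or (1, 2), the fitted group means stay the same but the intercept and slope take on different meanings.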
How is being a current smoker related to blood pressure?
$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{smoker}_i + \epsilon_i$
>lmsmk <- lm(chs$DIABP~smoker)
>summary(lmsmk)

Residuals:
Min 1Q Median 3Q Max
-32.1307 -8.0207 -0.1307 7.8693 41.3093

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 72.1307 0.5339 135.103 <2e-16 ***
smoker -2.8344 1.7020 -1.665 0.0965 .

Residual standard error: 11.31 on 496 degrees of freedom


Multiple R-Squared: 0.00556, Adjusted R-squared: 0.003555
F-statistic: 2.773 on 1 and 496 DF, p-value: 0.09649
We might also want to know whether current smoking status is an effect modifier for weight. In other words, is the relationship between blood pressure and weight different for people who smoke than for those who don’t smoke?

$$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{weight}_i + \beta_2 \mathrm{smoker}_i + \beta_3 \mathrm{weight}_i \times \mathrm{smoker}_i + \epsilon_i$$

How do we interpret $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$?
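One way to approach the interpretation is to write the model separately for each group. The sketch below assumes smoker is coded 1 for current smokers and 0 otherwise; the slides use the variable this way but do not state the coding explicitly.

$$\mathrm{smoker}_i = 0: \quad E[\mathrm{BP}_i] = \beta_0 + \beta_1 \mathrm{weight}_i$$

$$\mathrm{smoker}_i = 1: \quad E[\mathrm{BP}_i] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\mathrm{weight}_i$$

Under that coding, $\beta_2$ is the difference in intercepts and $\beta_3$ the difference in weight slopes between smokers and non-smokers.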
> summary(lm(chs$DIABP~smoker*chs$WEIGHT))

Call:
lm(formula = chs$DIABP ~ smoker * chs$WEIGHT)

Residuals:
Min 1Q Median 3Q Max
-3.092e+01 -7.195e+00 4.102e-04 7.586e+00 3.986e+01

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.87963 2.67958 23.839 < 2e-16 ***
smoker 4.53954 7.94883 0.571 0.56819
chs$WEIGHT 0.05108 0.01626 3.141 0.00178 **
smoker:chs$WEIGHT -0.04517 0.05187 -0.871 0.38431

Residual standard error: 11.22 on 494 degrees of freedom


Multiple R-Squared: 0.02506, Adjusted R-squared: 0.01914
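The fitted line for each group follows from the coefficient estimates reported above. The sketch below simply does that arithmetic, assuming smoker is coded 1 for current smokers and 0 otherwise (the coding is not stated explicitly in the slides); the vector names are made up for illustration.

```r
# Estimates copied from the summary() output above
b <- c(intercept = 63.87963, smoker = 4.53954,
       weight = 0.05108, interaction = -0.04517)

# Non-smokers (smoker = 0): intercept b0, slope b1
nonsmoker_line <- c(intercept = b[["intercept"]],
                    slope     = b[["weight"]])

# Smokers (smoker = 1): intercept b0 + b2, slope b1 + b3
smoker_line <- c(intercept = b[["intercept"]] + b[["smoker"]],
                 slope     = b[["weight"]] + b[["interaction"]])

nonsmoker_line  # intercept 63.87963, slope 0.05108
smoker_line     # intercept 68.41917, slope 0.00591
```

The estimated slopes differ, but the interaction term has a p-value of 0.38, so these data give little evidence that the weight slope differs between the two groups.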

[Figure: scatterplot of blood pressure against weight, with a legend distinguishing Smokers and Non-smokers]


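We know if a subject is a current/former/never smoker. How should we model this relationship with blood pressure?

Option 1:

$$\mathrm{smoke}_i = \begin{cases} 1, & \text{never smoked} \\ 2, & \text{former smoker} \\ 3, & \text{current smoker} \end{cases}$$

$$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{smoke}_i + \epsilon_i,$$

how do we interpret $\beta_1$?

$\hat\beta_1 = -1.08$ and $\widehat{\mathrm{se}}(\hat\beta_1) = 0.7637$.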
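Writing out the group means under Option 1 makes the built-in constraint visible. This follows directly from the coding 1/2/3 and the model above:

$$E[\mathrm{BP} \mid \text{former}] - E[\mathrm{BP} \mid \text{never}] = \beta_1, \qquad E[\mathrm{BP} \mid \text{current}] - E[\mathrm{BP} \mid \text{former}] = \beta_1,$$

$$E[\mathrm{BP} \mid \text{current}] - E[\mathrm{BP} \mid \text{never}] = 2\beta_1.$$

The single slope forces the never-to-former and former-to-current differences to be equal, which may not be a reasonable assumption for an unordered or unevenly spaced factor.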
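Option 2:
Create two dummy variables

$$\mathrm{smoke1}_i = \begin{cases} 1, & \text{never smoked} \\ 0, & \text{otherwise} \end{cases}$$

and

$$\mathrm{smoke2}_i = \begin{cases} 1, & \text{former smoker} \\ 0, & \text{otherwise.} \end{cases}$$

$$\mathrm{BP}_i = \beta_0 + \beta_1 \mathrm{smoke1}_i + \beta_2 \mathrm{smoke2}_i + \epsilon_i$$

How do we interpret $\beta_0$, $\beta_1$ and $\beta_2$?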
> lmsmk2 <- lm(chs$DIABP~smoke1+smoke2)
> summary(lmsmk2)

Call:
lm(formula = chs$DIABP ~ smoke1 + smoke2)

Residuals:
Min 1Q Median 3Q Max
-32.2824 -7.9008 -0.2824 7.7176 41.5198

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.296 1.618 42.839 <2e-16 ***
smoke1TRUE 2.986 1.763 1.694 0.091 .
smoke2TRUE 2.624 1.816 1.445 0.149
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.32 on 495 degrees of freedom
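With smoke1 and smoke2 defined as above, current smokers are the reference group. Reading the output on that basis, the intercept 69.296 estimates the mean for current smokers, 69.296 + 2.986 = 72.282 the mean for never smokers, and 69.296 + 2.624 = 71.920 the mean for former smokers. The sketch below, on simulated data with arbitrary assumed group effects, shows that hand-built dummy variables and an R factor with the same reference level give the same fit.

```r
# Simulated data; all values here are hypothetical
set.seed(5)
status <- factor(sample(c("never", "former", "current"), 200, replace = TRUE),
                 levels = c("current", "never", "former"))  # "current" = reference
bp <- 70 + 3 * (status == "never") + 2.5 * (status == "former") +
  rnorm(200, sd = 11)

# Option 2 by hand: two logical dummy variables
smoke1 <- status == "never"
smoke2 <- status == "former"
fit_manual <- lm(bp ~ smoke1 + smoke2)

# Same model using a factor; lm() builds the dummy variables itself
fit_factor <- lm(bp ~ status)

# Coefficients agree, and they reproduce the three group means
all.equal(unname(coef(fit_manual)), unname(coef(fit_factor)))
group_means <- tapply(bp, status, mean)
group_means
```

Choosing a different reference level (for example with `relevel()`) changes what each coefficient compares but not the fitted group means.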



Summary

So far, we’ve discussed

• The motivation for multiple linear regression models

• Matrix notation for multiple linear regression

• Least squares estimates for multiple linear regression

• Recoding factors into dummy variables

• Interpretation of linear models


Next, we’ll discuss hypothesis testing in multiple linear regression

• Overall tests: does the entire set of independent predictors contribute significantly to the prediction of y?

• Test for addition of a single variable: does the addition of one variable significantly improve the prediction of y over other independent predictors already present in the model?

• Test for addition of a group of variables: does the addition of some group of variables improve the prediction of y over other independent predictors already present in the model?
