Multiple Linear Regression: BIOST 515 January 15, 2004

The document summarizes multiple linear regression. It outlines the motivation for using multiple linear regression over simple linear regression, including to account for scientific questions, confounding variables, and increased precision. It then presents the multiple linear regression model in matrix notation and describes estimating model parameters using least squares and maximum likelihood. Hypothesis testing in multiple linear regression is also mentioned. An example using data from the Cardiovascular Health Study is provided to illustrate the use of multiple linear regression to examine relationships between blood pressure, height, and weight.

Uploaded by

HazemIbrahim


Lecture 4

Multiple linear regression

BIOST 515

January 15, 2004

Outline

• Motivation for the multiple regression model

• Multiple regression in matrix notation

• Least squares estimation of model parameters

• Maximum likelihood estimation of model parameters

• Hypothesis testing

Multiple linear regression

We are now considering the model

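$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \tag{1}$$

$i = 1, \ldots, n$, $E[\epsilon_i] = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$ and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$.

We will also require that p < n. Additional assumption:

• The predictors (x1, ..., xp) are fixed and measured without error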

Why multiple linear regression?

Previously we’ve examined the case with one predictor and

one outcome (simple linear regression). There are a variety of
reasons we may want to include additional predictors in the
model.

• Scientific question

• Adjustment for confounding

• Gain precision

Scientific Question

May dictate inclusion of particular predictors

• Predictors of interest
  - The scientific factor under investigation can be modeled by multiple predictors (e.g., dummy variables, polynomials, etc.)
• Effect modifiers
  - The scientific question may relate to detection of effect modification
• Confounders
  - The scientific question may have been stated in terms of adjusting for known (or suspected) confounders

Confounding

Sometimes the scientific question of greatest interest is

confounded by associations in the data.

From KKMN, pg. 187:

In general, confounding exists if meaningfully different
interpretations of the relationship of interest result when an
extraneous variable is ignored or included in the analysis.

Precision

Adjusting for an additional covariate changes the standard

error of the slope estimate

• Standard error is decreased by having smaller within group

variance

• Standard error is increased by having correlations between

the predictor of interest and other covariates in the model
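
Both effects on the standard error can be seen in a small simulation. The sketch below is only an illustration under assumed settings: the variables `x`, `z`, `w` and every coefficient are made up for the example and are not CHS data. In case 1 the added covariate is independent of the predictor of interest but explains much of the outcome's variation; in case 2 the added covariate is highly correlated with the predictor of interest and adds nothing else.

```r
set.seed(4)
n <- 500
x <- rnorm(n)                         # predictor of interest (simulated)

# Case 1: z is independent of x but explains much of the variation in y1.
z  <- rnorm(n)
y1 <- 1 + 0.5 * x + 2 * z + rnorm(n)
se1_unadj <- summary(lm(y1 ~ x))$coefficients["x", "Std. Error"]
se1_adj   <- summary(lm(y1 ~ x + z))$coefficients["x", "Std. Error"]

# Case 2: w is highly correlated with x but unrelated to y2 given x.
w  <- x + rnorm(n, sd = 0.3)
y2 <- 1 + 0.5 * x + rnorm(n)
se2_unadj <- summary(lm(y2 ~ x))$coefficients["x", "Std. Error"]
se2_adj   <- summary(lm(y2 ~ x + w))$coefficients["x", "Std. Error"]

c(unadjusted = se1_unadj, adjusted = se1_adj)   # case 1: smaller residual variance
c(unadjusted = se2_unadj, adjusted = se2_adj)   # case 2: correlated covariate
```

In case 1 the adjusted standard error for `x` should be smaller, and in case 2 it should be larger, matching the two bullets above.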

General comments on multiple regression

• Can be difficult to choose the “best” model, since many

reasonable candidates may exist

• More difficult to visualize the fitted model

• More difficult to interpret the fitted model

The model in matrix notation

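$$y = X\beta + \epsilon$$

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},$$

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$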
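
In R, the design matrix X can be inspected directly with `model.matrix()`. The following is a minimal sketch on simulated stand-in data; the data frame `dat`, its column names, and the coefficients used to generate `bp` are assumptions for illustration, not the CHS data.

```r
# Simulated stand-in data; variable names and coefficients are hypothetical.
set.seed(1)
n <- 6
dat <- data.frame(
  height = rnorm(n, mean = 165, sd = 10),
  weight = rnorm(n, mean = 160, sd = 30)
)
dat$bp <- 55 + 0.05 * dat$height + 0.04 * dat$weight + rnorm(n, sd = 11)

# model.matrix() builds X: a leading column of ones, then one column per predictor.
X <- model.matrix(bp ~ height + weight, data = dat)
y <- dat$bp

dim(X)       # n rows and p + 1 columns
colnames(X)  # "(Intercept)" "height" "weight"
```

The column of ones corresponds to the intercept β0, so X has p + 1 columns for p predictors.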

Least Squares Estimates

$$S(\beta) = (y - X\beta)'(y - X\beta)$$

Least squares estimates are obtained by solving

$$\frac{\partial S(\beta)}{\partial \beta} = -2X'y + 2X'X\beta = 0$$

for β. This gives us the least squares normal equations:

$$X'X\hat{\beta} = X'y.$$

To get the least squares estimator of β, multiply each side by $(X'X)^{-1}$, which gives

$$\hat{\beta} = (X'X)^{-1}X'y,$$

provided $(X'X)^{-1}$ exists (which it will if the regressors are linearly independent).

The vector of fitted y-values, $\hat{y}$, corresponding to the observed y-values y is

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy.$$

The n × n matrix H is often referred to as the hat matrix.

The residuals,

$$e = y - \hat{y},$$

can also be rewritten as

$$e = y - X\hat{\beta} = y - Hy = (I - H)y.$$
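
The normal equations, hat matrix, and residuals can be computed by hand and checked against `lm()`. This is a sketch on simulated data, with `x1`, `x2`, and the generating coefficients chosen arbitrarily for illustration.

```r
set.seed(2)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                    # design matrix with an intercept column

# Solve the normal equations X'X b = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Hat matrix, fitted values, and residuals e = y - y_hat = (I - H) y.
H     <- X %*% solve(crossprod(X)) %*% t(X)
y_hat <- H %*% y
e     <- (diag(n) - H) %*% y

# Compare with R's built-in least squares fit.
fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
all.equal(as.numeric(e), unname(residuals(fit)))
```

Passing the right-hand side to `solve()` avoids forming the inverse just to obtain β̂. The explicit inverse is used for H here only to mirror the formula; `lm()` itself relies on a QR decomposition rather than on the normal equations.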

Properties of least squares estimates

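$\hat{\beta}$ is an unbiased estimator of β:

$$E(\hat{\beta}) = E[(X'X)^{-1}X'y] = (X'X)^{-1}X'X\beta = \beta$$

The variance of $\hat{\beta}$ is expressed by the covariance matrix

$$\mathrm{cov}(\hat{\beta}) = E\{[\hat{\beta} - E(\hat{\beta})][\hat{\beta} - E(\hat{\beta})]'\} = \sigma^2 (X'X)^{-1}$$

If we let $C = (X'X)^{-1}$, the variance of $\hat{\beta}_j$ is $\sigma^2 C_{jj}$ and the covariance between $\hat{\beta}_i$ and $\hat{\beta}_j$ is $\sigma^2 C_{ij}$.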
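
The step from the definition of the covariance matrix to σ²(X'X)⁻¹ is not spelled out on the slide. One way to fill it in, assuming the model y = Xβ + ε with E[ε] = 0 and cov(ε) = σ²I as stated earlier, and using the symmetry of (X'X)⁻¹, is:

```latex
\hat{\beta} - \beta = (X'X)^{-1}X'(X\beta + \epsilon) - \beta = (X'X)^{-1}X'\epsilon

\mathrm{cov}(\hat{\beta})
  = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right]
  = (X'X)^{-1}X'\,E[\epsilon\epsilon']\,X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1}
```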

Estimation of $\sigma^2$

As in simple linear regression, we develop an estimator of $\sigma^2$ from SSE (residual sum of squares).

$$SSE = (y - X\hat{\beta})'(y - X\hat{\beta})$$

$$\hat{\sigma}^2 = MSE = \frac{SSE}{n - p - 1}$$

As in simple linear regression, $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
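
Plugging MSE into σ²(X'X)⁻¹ gives the estimated covariance matrix of β̂, whose diagonal yields the standard errors reported by `summary()`. The sketch below checks this on simulated data; the variables and generating coefficients are arbitrary choices for illustration.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 2)

X <- cbind(1, x1, x2)
p <- ncol(X) - 1                          # number of predictors, excluding the intercept

beta_hat <- solve(crossprod(X), crossprod(X, y))
sse <- sum((y - X %*% beta_hat)^2)        # residual sum of squares
mse <- sse / (n - p - 1)                  # estimate of sigma^2

cov_beta <- mse * solve(crossprod(X))     # estimated covariance matrix of beta_hat
se_beta  <- sqrt(diag(cov_beta))          # standard errors of the coefficients

# Compare with the quantities reported by lm().
fit <- lm(y ~ x1 + x2)
all.equal(sqrt(mse), summary(fit)$sigma)                     # residual standard error
all.equal(unname(se_beta), unname(sqrt(diag(vcov(fit)))))    # coefficient standard errors
```

The denominator n − p − 1 is the residual degrees of freedom that R prints next to the "Residual standard error".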

Example

Cardiovascular Health Study (CHS)

A population-based, longitudinal study of coronary heart

disease in people over 65.

Primary aim: To identify risk factors related to the onset and

course of coronary heart disease and stroke.
Secondary aim: Describe prevalence and distributions of risk
factors.

Scientific question

How is a person’s weight related to blood pressure?

We might expect people who are more overweight to have

higher blood pressure. However, a simple linear regression
model with weight as a predictor might be misleading. Why?

We will examine a subset of 500 of these subjects to see how

height and weight are related to blood pressure.

Scatterplot matrix

pairs(cbind(chs$DIABP,chs$HEIGHT,chs$WEIGHT),
labels=c("Blood Pressure", "Height", "Weight"))

[Figure: scatterplot matrix with panels Blood Pressure, Height, and Weight]

3d scatterplot

library(scatterplot3d)
scatterplot3d(chs$HEIGHT, chs$WEIGHT, chs$DIABP, xlab="height", ylab="weight",
zlab="Blood pressure")

[Figure: 3d scatterplot with axes height, weight, and Blood pressure]

Regression models we may be interested in:

$$E[BP_i] = \beta_0 + \beta_1 \mathrm{height}_i$$

$$E[BP_i] = \beta_0 + \beta_1 \mathrm{weight}_i$$

$$E[BP_i] = \beta_0 + \beta_1 \mathrm{height}_i + \beta_2 \mathrm{weight}_i$$

We’ve fit models similar to the first 2, but not the 3rd.

Note: 2 observations will be omitted from the analysis

because the subjects’ weights are missing.

$$BP_i = \beta_0 + \beta_1 \mathrm{height}_i + \epsilon_i$$
>lmht <- lm(DIABP~HEIGHT,data=chs)
>summary(lmht)

Residuals:
Min 1Q Median 3Q Max
-32.8206 -7.6509 -0.0482 7.4237 40.0141

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.56051 8.79011 5.866 8.18e-09 ***
HEIGHT 0.12353 0.05343 2.312 0.0212 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.28 on 496 degrees of freedom

Multiple R-Squared: 0.01066, Adjusted R-squared: 0.00867
F-statistic: 5.347 on 1 and 496 DF, p-value: 0.02117

$$BP_i = \beta_0 + \beta_1 \mathrm{weight}_i + \epsilon_i$$
lmwt <- lm(DIABP~WEIGHT,data=chs)
summary(lmwt)

Residuals:
Min 1Q Median 3Q Max
-30.7561 -7.4422 -0.1446 7.3281 40.1285

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.99253 2.50881 25.507 < 2e-16 ***
WEIGHT 0.04905 0.01534 3.198 0.00147 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.23 on 496 degrees of freedom

Multiple R-Squared: 0.0202, Adjusted R-squared: 0.01822
F-statistic: 10.23 on 1 and 496 DF, p-value: 0.001474

$$BP_i = \beta_0 + \beta_1 \mathrm{height}_i + \beta_2 \mathrm{weight}_i + \epsilon_i$$

lmhtwt <- lm(DIABP~HEIGHT+WEIGHT,data=chs)
summary(lmhtwt)

Residuals:
Min 1Q Median 3Q Max
-31.3833 -7.2260 -0.2881 7.7002 39.6144

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.65777 8.91267 6.245 9.14e-10 ***
HEIGHT 0.05820 0.05972 0.975 0.3302
WEIGHT 0.04140 0.01723 2.403 0.0166 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.23 on 495 degrees of freedom

Multiple R-Squared: 0.02208, Adjusted R-squared: 0.01812
F-statistic: 5.587 on 2 and 495 DF, p-value: 0.003987
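
Comparing the outputs above, the HEIGHT coefficient drops from 0.12353 in the height-only model to 0.05820 once WEIGHT is included. One possible mechanism for a change like this is confounding: height and weight are correlated, so a height-only model can pick up part of the weight association. The simulation below is a deliberately exaggerated sketch of that mechanism; all numbers are made up, and blood pressure is generated to depend on weight only.

```r
set.seed(8)
n <- 2000
height <- rnorm(n, mean = 165, sd = 10)
weight <- 160 + 2 * (height - 165) + rnorm(n, sd = 25)   # taller people tend to weigh more
bp     <- 60 + 0.1 * weight + rnorm(n, sd = 5)           # bp depends on weight only

# Height coefficient without and with adjustment for weight.
b_unadj <- coef(lm(bp ~ height))[["height"]]
b_adj   <- coef(lm(bp ~ height + weight))[["height"]]
c(unadjusted = b_unadj, adjusted = b_adj)
```

In this setup the unadjusted height slope reflects the weight effect passed through the height and weight correlation, while the adjusted slope should be close to zero.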

Dummy (indicator) variables

So far we have only dealt with predictors that are continuous.

However, we will often have variables that can take on only a
small number of levels. Examples:

• Smoking status (current/former/never)

• Race

• Sex (Male/Female)

• Treatment group in a clinical trial

To model the association between these predictors and a
response, we assign some sort of numerical scale to them. For
example, we could define sex as (0/1), (-1,1), (1,2). Then if

$$y_i = \beta_0 + \beta_1 \mathrm{sex}_i + \epsilon_i$$

and sex = 1 if female and 0 if male,

E(yi) = β0, if male

and
E(yi) = β0 + β1, if female.
What does β1 represent?
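
Subtracting the two expectations gives β1 = E(y | female) − E(y | male), the difference in mean response between the two groups. A quick numerical check on simulated data, where the variable `female` and the generating values are assumptions for illustration:

```r
set.seed(5)
n <- 200
female <- rbinom(n, size = 1, prob = 0.5)     # 1 = female, 0 = male
y <- 70 + 3 * female + rnorm(n, sd = 10)

fit <- lm(y ~ female)
mean_male   <- mean(y[female == 0])
mean_female <- mean(y[female == 1])

# The intercept is the male sample mean; the slope is the difference in sample means.
all.equal(coef(fit)[["(Intercept)"]], mean_male)
all.equal(coef(fit)[["female"]], mean_female - mean_male)
```

With a different coding such as (-1, 1) or (1, 2) the fitted group means are the same, but the intercept and slope take on different meanings.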
How is being a current smoker related to blood pressure?
$$BP_i = \beta_0 + \beta_1 \mathrm{smoker}_i + \epsilon_i$$
>lmsmk <- lm(chs$DIABP~smoker)
>summary(lmsmk)

Residuals:
Min 1Q Median 3Q Max
-32.1307 -8.0207 -0.1307 7.8693 41.3093

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 72.1307 0.5339 135.103 <2e-16 ***
smoker -2.8344 1.7020 -1.665 0.0965 .

Residual standard error: 11.31 on 496 degrees of freedom

Multiple R-Squared: 0.00556, Adjusted R-squared: 0.003555
F-statistic: 2.773 on 1 and 496 DF, p-value: 0.09649
We might also want to know whether current smoking status is an effect modifier for weight. In other words, is the relationship between blood pressure and weight different for people who smoke than for those who don’t smoke?

$$BP_i = \beta_0 + \beta_1 \mathrm{weight}_i + \beta_2 \mathrm{smoker}_i + \beta_3 \mathrm{weight}_i \times \mathrm{smoker}_i + \epsilon_i$$

How do we interpret β0, β1, β2, and β3?
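
One way to approach the interpretation: setting smoker = 0 leaves intercept β0 and slope β1 for non-smokers, while setting smoker = 1 gives intercept β0 + β2 and slope β1 + β3 for smokers. The sketch below checks this on simulated data by comparing the interaction model with separate regressions in each group. All variables and generating values are hypothetical.

```r
set.seed(6)
n <- 300
weight <- rnorm(n, mean = 160, sd = 30)
smoker <- rbinom(n, size = 1, prob = 0.3)     # 1 = current smoker, 0 = not
bp <- 64 + 0.05 * weight + 4 * smoker - 0.04 * weight * smoker + rnorm(n, sd = 11)

fit <- lm(bp ~ weight * smoker)               # expands to weight + smoker + weight:smoker
b <- coef(fit)

# Separate simple regressions within each smoking group.
fit0 <- lm(bp ~ weight, subset = smoker == 0)
fit1 <- lm(bp ~ weight, subset = smoker == 1)

# Non-smokers: intercept b0 and slope b1.
all.equal(unname(b[c("(Intercept)", "weight")]), unname(coef(fit0)))
# Smokers: intercept b0 + b2 and slope b1 + b3.
all.equal(unname(c(b["(Intercept)"] + b["smoker"], b["weight"] + b["weight:smoker"])),
          unname(coef(fit1)))
```

So β2 is the difference in intercepts and β3 the difference in slopes between smokers and non-smokers; a test of β3 = 0 asks whether smoking modifies the weight association.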

> summary(lm(chs$DIABP~smoker*chs$WEIGHT))

Call:
lm(formula = chs$DIABP ~ smoker * chs$WEIGHT)

Residuals:
Min 1Q Median 3Q Max
-3.092e+01 -7.195e+00 4.102e-04 7.586e+00 3.986e+01

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.87963 2.67958 23.839 < 2e-16 ***
smoker 4.53954 7.94883 0.571 0.56819
chs$WEIGHT 0.05108 0.01626 3.141 0.00178 **
smoker:chs$WEIGHT -0.04517 0.05187 -0.871 0.38431

Residual standard error: 11.22 on 494 degrees of freedom

Multiple R-Squared: 0.02506, Adjusted R-squared: 0.01914
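
As a worked reading of this output (arithmetic only, using the estimates printed above), the two fitted lines are:

```latex
\text{Non-smokers } (\mathrm{smoker} = 0):\quad
\widehat{BP} = 63.87963 + 0.05108\,\mathrm{weight}

\text{Smokers } (\mathrm{smoker} = 1):\quad
\widehat{BP} = (63.87963 + 4.53954) + (0.05108 - 0.04517)\,\mathrm{weight}
             = 68.41917 + 0.00591\,\mathrm{weight}
```

The estimated slope for smokers is much flatter, but the interaction term has a large standard error (p = 0.384), so this sample gives little evidence that the slopes differ.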
[Figure: scatterplot of blood pressure against weight; legend: Smokers, Non−smokers]


We know if a subject is a current/former/never smoker. How

should we model this relationship with blood pressure?

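Option 1:

$$\mathrm{smoke}_i = \begin{cases} 1, & \text{never smoked} \\ 2, & \text{former smoker} \\ 3, & \text{current smoker} \end{cases}$$

$$BP_i = \beta_0 + \beta_1 \mathrm{smoke}_i + \epsilon_i,$$

how do we interpret $\beta_1$?

$\hat{\beta}_1 = -1.08$ and $\widehat{se}(\hat{\beta}_1) = 0.7637$.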
Option 2:
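Create two dummy variables

$$\mathrm{smoke1}_i = \begin{cases} 1, & \text{never smoked} \\ 0, & \text{otherwise} \end{cases}$$

and

$$\mathrm{smoke2}_i = \begin{cases} 1, & \text{former smoker} \\ 0, & \text{otherwise.} \end{cases}$$

$$BP_i = \beta_0 + \beta_1 \mathrm{smoke1}_i + \beta_2 \mathrm{smoke2}_i + \epsilon_i$$

How do we interpret β0, β1 and β2?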
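
With this coding, current smokers are the reference group: β0 is their mean, and β1 and β2 are the differences in mean between never smokers and current smokers and between former smokers and current smokers. The sketch below illustrates this on simulated data. The `status` variable, group proportions, and group means are assumptions, not CHS values, and the factor version shows how R can build the same indicators automatically.

```r
set.seed(7)
n <- 300
status <- sample(c("never", "former", "current"), size = n, replace = TRUE,
                 prob = c(0.45, 0.40, 0.15))
true_means <- c(never = 72, former = 72, current = 69)
bp <- unname(true_means[status]) + rnorm(n, sd = 11)

# Option 2 by hand: indicators for never and former; current smokers are the reference.
smoke1 <- as.numeric(status == "never")
smoke2 <- as.numeric(status == "former")
fit_dummy <- lm(bp ~ smoke1 + smoke2)

# Equivalent fit: a factor with "current" as the reference level.
status_f   <- relevel(factor(status), ref = "current")
fit_factor <- lm(bp ~ status_f)

# Coefficients are the reference-group mean and differences from it.
group_means <- sapply(split(bp, status), mean)
b <- coef(fit_dummy)
all.equal(b[["(Intercept)"]], group_means[["current"]])
all.equal(b[["smoke1"]], group_means[["never"]]  - group_means[["current"]])
all.equal(b[["smoke2"]], group_means[["former"]] - group_means[["current"]])
```

Unlike Option 1, this coding does not force the never to former and former to current differences to be equal.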
> lmsmk2 <- lm(chs$DIABP~smoke1+smoke2)
> summary(lmsmk2)

Call:
lm(formula = chs$DIABP ~ smoke1 + smoke2)

Residuals:
Min 1Q Median 3Q Max
-32.2824 -7.9008 -0.2824 7.7176 41.5198

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.296 1.618 42.839 <2e-16 ***
smoke1TRUE 2.986 1.763 1.694 0.091 .
smoke2TRUE 2.624 1.816 1.445 0.149
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.32 on 495 degrees of freedom

Summary

So far, we’ve discussed

• The motivation for multiple linear regression models

• Matrix notation for multiple linear regression

• Least squares estimates for multiple linear regression

• Recoding factors into dummy variables

• Interpretation of linear models

Next, we’ll discuss hypothesis testing in multiple linear regression

• Overall tests: does the entire set of independent predictors contribute significantly to the prediction of y?

• Test for addition of a single variable: does the addition of one variable significantly improve the prediction of y over other independent predictors already present in the model?

• Test for addition of a group of variables: does the addition of some group of variables improve the prediction of y over other independent predictors already present in the model?
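
As a preview, each of these three questions can be posed in R as a comparison of nested models with `anova()`. This is a rough sketch on simulated data; the predictors `x1`, `x2`, `x3` and the generating model are assumptions, and the details of the tests are left to the next lecture.

```r
set.seed(9)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

fit_null <- lm(y ~ 1)                 # intercept only
fit_x1   <- lm(y ~ x1)
fit_x12  <- lm(y ~ x1 + x2)
fit_full <- lm(y ~ x1 + x2 + x3)

overall <- anova(fit_null, fit_full)  # overall test: all predictors jointly
single  <- anova(fit_x12, fit_full)   # addition of a single variable (x3)
group   <- anova(fit_x1, fit_full)    # addition of a group of variables (x2, x3)

overall
single
group
```

The overall F statistic matches the one printed at the bottom of `summary(fit_full)`, and the single-variable F statistic equals the square of the t statistic for `x3` in the full model.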
