Multivariate Linear Regression Models
What is Regression Analysis?
➢ Regression analysis is the statistical methodology for predicting values of
one or more response (dependent) variables from a collection of predictor
(independent) variable values.
➢ It can also be used for assessing the effects of the predictor variables on the
response variable/variables.
➢ We first discuss the multiple regression model for the prediction of a single
response/dependent variable.
➢ This model is then generalized to handle the prediction of several
dependent variables.
➢ As the name implies, multivariate regression is a technique that estimates
a single regression model with more than one outcome/dependent
variable.
➢ When there is more than one predictor variable in a multivariate
regression model, the model is a multivariate multiple regression.
The Classical Linear Regression Model
➢ Let 𝑧1 , 𝑧2 , … … … … … . . , 𝑧𝑟 be predictor variables thought to be related to a
response variable Y. For example, with r = 4, we might have
Y = current market value of home
𝑧1 = square feet of living area
𝑧2 = location
𝑧3 = assessed value last year
𝑧4 = quality of construction (price per square foot)
➢ The classical linear regression model states that
✓ Y is composed of a mean, which depends in a continuous manner
on the 𝒛𝒊 ′𝒔, and
✓ a random error 𝜺, which
o accounts for measurement error and
o the effects of other variables not explicitly considered in the
model.
➢ The values of the predictor variables recorded from the experiment or set
by the investigator are treated as fixed.
➢ The error (and hence the response) is viewed as a random variable whose
behavior is characterized by a set of distributional assumptions.
➢ Specifically, the linear regression model with a single response takes the
form
   Y = β0 + β1z1 + β2z2 + ⋯ + βr zr + ε

   [Response] = [mean (dependent on z1, z2, …, zr)] + [error]

➢ The term "linear" refers to the fact that the mean is a linear function of
   the unknown parameters β0, β1, …, βr.
➢ With n independent observations on Y and the associated values of 𝑧𝑖 ,
✓ the complete model becomes
   Y1 = β0 + β1z11 + β2z12 + ⋯ + βr z1r + ε1
   Y2 = β0 + β1z21 + β2z22 + ⋯ + βr z2r + ε2                                (1)
    ⋮        ⋮          ⋮                ⋮
   Yn = β0 + β1zn1 + β2zn2 + ⋯ + βr znr + εn
where the error terms are assumed to have the following properties:
   i.   E(εj) = 0;
   ii.  Var(εj) = σ² (constant); and                                        (2)
   iii. Cov(εj, εk) = 0,  j ≠ k
➢ Although the error-term assumptions in (2) are very modest,
✓ we shall later need to add the assumption of joint normality for
making confidence statements and testing hypotheses.
➢ In matrix notation, (1) becomes

       [Y1]   [1  z11  z12  ⋯  z1r] [β0]   [ε1]
       [Y2] = [1  z21  z22  ⋯  z2r] [β1] + [ε2]
       [ ⋮ ]   [⋮    ⋮    ⋮   ⋱   ⋮ ] [ ⋮ ]   [ ⋮ ]
       [Yn]   [1  zn1  zn2  ⋯  znr] [βr]   [εn]

   (observed response vector)   (design matrix)

   or        Y       =        Z           β        +        ε               (3)
          (n × 1)      (n × (r + 1))   ((r + 1) × 1)      (n × 1)
and the specifications in (2) become

   1. E(ε) = 0; and
   2. Cov(ε) = E(εε′) = σ²I
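To make the matrix form and the error assumptions concrete, here is a minimal simulation sketch (assuming Python with numpy; the coefficient values and σ are invented purely for illustration) that generates responses from Y = Zβ + ε with zero-mean, constant-variance, uncorrelated errors.

```python
import numpy as np

rng = np.random.default_rng(0)

n, r = 50, 2                       # n observations, r predictor variables
beta = np.array([1.0, 2.0, -0.5])  # hypothetical (beta0, beta1, beta2)
sigma = 1.5                        # hypothetical error standard deviation

z = rng.uniform(0, 10, size=(n, r))      # predictor values (treated as fixed)
Z = np.column_stack([np.ones(n), z])     # design matrix: first column is all ones
eps = rng.normal(0.0, sigma, size=n)     # errors: E(eps) = 0, Var(eps) = sigma^2, uncorrelated
Y = Z @ beta + eps                       # responses generated by the linear model
```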
Example: Determine the linear regression model for fitting a straight line
Mean response = 𝐸(𝑌) = 𝛽0 + 𝛽1 𝑧1
to the data
𝑧1 0 1 2 3 4
y 1 4 3 8 9
➢ Before the responses Y′ = [Y1, Y2, …, Y5] are observed, the errors
   ε′ = [ε1, ε2, …, ε5] are random, and we can write

   Y = Zβ + ε

   where

        [Y1]         [1  z11]                     [ε1]
        [Y2]         [1  z21]          [β0]       [ε2]
   Y =  [ ⋮ ] ,  Z =  [ ⋮   ⋮ ] ,  β =  [β1] ,  ε =  [ ⋮ ]
        [Y5]         [1  z51]                     [ε5]
➢ The data for this model are contained in the observed response vector y and
the design matrix Z, where
        [1]         [1  0]
        [4]         [1  1]
   y =  [3] ,   Z = [1  2]
        [8]         [1  3]
        [9]         [1  4]
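For this small example, the response vector and design matrix can also be set up directly in code; a minimal sketch, assuming Python with numpy:

```python
import numpy as np

z1 = np.array([0, 1, 2, 3, 4])   # predictor values
y  = np.array([1, 4, 3, 8, 9])   # observed responses

# Design matrix: a column of ones (for beta0) next to the z1 column
Z = np.column_stack([np.ones_like(z1, dtype=float), z1])
print(Z)   # 5 x 2 matrix, matching the Z displayed above
```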
Least Squares Estimation
➢ One of the objectives of regression analysis is to
✓ develop an equation that will allow the investigator to predict the
response for given values of the predictor variables.
➢ Thus, it is necessary to
✓ "fit" the model in (3) to the observed 𝒚𝒋 corresponding to the known
values 𝟏, 𝒛𝒋𝟏 , 𝒛𝒋𝟐 , … … … … . , 𝒛𝒋𝒓 .
➢ That is, we must determine the values for regression coefficients 𝜷 and the
error variance 𝝈𝟐 consistent with the available data.
➢ Let b be trial values for 𝛽.
➢ Consider the difference yj − b0 − b1zj1 − ⋯ − br zjr between the observed
   response yj and the value b0 + b1zj1 + ⋯ + br zjr that would be expected if b
   were the "true" parameter vector.
➢ Typically, the differences yj − b0 − b1zj1 − ⋯ − br zjr will not be zero,
   because the response fluctuates about its expected value.
➢ The method of least squares selects b so as to minimize the sum of the
squares of the differences:
            n
   S(b) =   Σ  (yj − b0 − b1zj1 − ⋯ − br zjr)²  =  (y − Zb)′(y − Zb)         (4)
           j=1
➢ The coefficients b chosen by the least squares criterion are called least
squares estimates of the regression parameters 𝛽.
➢ They will henceforth be denoted by β̂ to emphasize their role as estimates of β.
➢ The coefficients β̂ are
   ✓ consistent with the data in the sense that they produce estimated
     (fitted) mean responses, β̂0 + β̂1zj1 + ⋯ + β̂r zjr,
   ✓ the sum of whose squared differences from the observed yj is as small
     as possible.
➢ The deviations
   ε̂j = yj − β̂0 − β̂1zj1 − ⋯ − β̂r zjr ,    j = 1, 2, …, n                    (5)
are called residuals.
➢ The vector of residuals 𝜀̂ = 𝑦 − 𝑍𝛽̂ contains the information about the
remaining unknown parameter 𝝈𝟐 .
➢ Let Z have full rank 𝑟 + 1 ≤ 𝑛.
A matrix is said to have full rank if its rank is either equal to
its number of columns or to its number of rows (or to both).
➢ The least squares estimate of β in (3) is given by

   β̂ = (Z′Z)⁻¹Z′y
➢ Let 𝑦̂ = 𝑍𝛽̂ = 𝐻𝑦 denote the fitted values of y,
   ✓ where H = Z(Z′Z)⁻¹Z′ is called the "hat" matrix.
➢ Then the residuals

   ε̂ = y − ŷ = y − Z(Z′Z)⁻¹Z′y = [I − Z(Z′Z)⁻¹Z′]y = (I − H)y

   satisfy Z′ε̂ = 0 and ŷ′ε̂ = 0.
   Also, the

                                 n
   residual sum of squares =     Σ  (yj − β̂0 − β̂1zj1 − ⋯ − β̂r zjr)²  =  ε̂′ε̂  =  y′y − y′Zβ̂
                                j=1
Example: Calculate the least squares estimates β̂, the residuals ε̂, and the residual
sum of squares for a linear model
𝑌𝑗 = 𝛽0 + 𝛽1 𝑧𝑗1 + 𝜀𝑗
𝑓𝑖𝑡 𝑡𝑜 𝑡ℎ𝑒 𝑑𝑎𝑡𝑎
𝑧1 0 1 2 3 4
y 1 4 3 8 9
Solution: We have

        [1  0]
        [1  1]          [1  1  1  1  1]
   Z =  [1  2] ,   Z′ = [0  1  2  3  4] ,   y′ = [1  4  3  8  9],
        [1  3]
        [1  4]

   Z′Z = [ 5  10] ,   (Z′Z)⁻¹ = [ 0.6  −0.2] ,   Z′y = [25]
         [10  30]               [−0.2   0.1]           [70]
Calculation of (Z′Z)⁻¹

   Z′Z = [ 5  10]
         [10  30]

   |Z′Z| = (5)(30) − (10)(10) = 150 − 100 = 50

   Cofactor of 5  = (−1)²(30) = 30
   Cofactor of 10 = (−1)³(10) = −10
   Cofactor of 10 = (−1)³(10) = −10
   Cofactor of 30 = (−1)⁴(5)  = 5

   Cofactor matrix of Z′Z = [ 30  −10]
                            [−10    5]

   Adj Z′Z = transpose of the cofactor matrix of Z′Z = [ 30  −10]
                                                       [−10    5]

   (Z′Z)⁻¹ = (1/|Z′Z|) Adj Z′Z = (1/50) [ 30  −10]  =  [ 0.6  −0.2]
                                        [−10    5]     [−0.2   0.1]
Consequently,
   β̂ = [β̂0] = (Z′Z)⁻¹Z′y = [ 0.6  −0.2] [25]  =  [1]
        [β̂1]               [−0.2   0.1] [70]     [2]
and the fitted equation is
𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑧 = 1 + 2𝑧
The vector of fitted (predicted) values is

              [1  0]         [1]
              [1  1]         [3]
   ŷ = Zβ̂ =  [1  2] [1]  =  [5]
              [1  3] [2]     [7]
              [1  4]         [9]
                    [1]   [1]   [ 0]
                    [4]   [3]   [ 1]
   so  ε̂ = y − ŷ =  [3] − [5] = [−2]
                    [8]   [7]   [ 1]
                    [9]   [9]   [ 0]
The residual sum of squares is
                           [ 0]
                           [ 1]
   ε̂′ε̂ = [0  1  −2  1  0]  [−2]  =  0² + 1² + (−2)² + 1² + 0²  =  6
                           [ 1]
                           [ 0]
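The whole hand calculation can be checked numerically; a minimal sketch (assuming numpy) that reproduces β̂ = (1, 2)′, the fitted values, the residuals, the residual sum of squares of 6, and the orthogonality conditions Z′ε̂ = 0 and ŷ′ε̂ = 0:

```python
import numpy as np

z1 = np.array([0., 1., 2., 3., 4.])
y  = np.array([1., 4., 3., 8., 9.])
Z  = np.column_stack([np.ones(5), z1])

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)   # (Z'Z)^{-1} Z'y  -> [1., 2.]
y_hat    = Z @ beta_hat                        # fitted values   -> [1, 3, 5, 7, 9]
resid    = y - y_hat                           # residuals       -> [0, 1, -2, 1, 0]
rss      = resid @ resid                       # residual sum of squares -> 6.0

# The residuals are orthogonal to the columns of Z and to the fitted values
print(beta_hat, resid, rss)
print(np.allclose(Z.T @ resid, 0), np.allclose(y_hat @ resid, 0))
```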
Assessing the Quality of the Model
➢ The quality of the model's fit can be measured by the coefficient of
   determination

   R² = 1 − [ Σⱼ ε̂j² / Σⱼ (yj − ȳ)² ]  =  Σⱼ (ŷj − ȳ)² / Σⱼ (yj − ȳ)² ,

   where the sums run over j = 1, …, n.
➢ The quantity 𝑅2 gives the proportion of the total variation in the 𝑦𝑗 ’s
"explained" by the predictor variables 𝑧1 , 𝑧2 , … … … , 𝑧𝑟 .
➢ Here 𝑅2 (or the multiple correlation coefficient 𝑅 = +√𝑅 2 ) equals 1 if
✓ the fitted equation passes through all the data points; so that 𝑒̂𝑗 = 0 for
all j.
➢ At the other extreme, 𝑅2 is 0 if
✓ 𝛽̂0 = 𝑦̅ and 𝛽̂1 = 𝛽̂2 = ⋯ … . = 𝛽̂𝑟 = 0.
✓ In this case, the predictor variables 𝑧1 , 𝑧2 , … … … , 𝑧𝑟 have no influence
on the response.
Example: Find 𝑅2 for the above problem.
   y     z     ŷ = 1 + 2z     (ŷ − ȳ)²     (y − ȳ)²
   1     0         1             16           16
   4     1         3              4            1
   3     2         5              0            4
   8     3         7              4            9
   9     4         9             16           16
   -------------------------------------------------
   Σy = 25                  Σ(ŷj − ȳ)² = 40   Σ(yj − ȳ)² = 46

   ȳ = 25/5 = 5

   R² = Σⱼ(ŷj − ȳ)² / Σⱼ(yj − ȳ)² = 40/46 = 0.87
So, 87% of the variation of y is explained by the predictor variable z.
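The same R² can be computed directly from the fitted values; a minimal sketch, assuming numpy and the fit ŷ = 1 + 2z obtained above:

```python
import numpy as np

y     = np.array([1., 4., 3., 8., 9.])
y_hat = np.array([1., 3., 5., 7., 9.])     # fitted values 1 + 2*z

ss_res = np.sum((y - y_hat) ** 2)          # 6
ss_tot = np.sum((y - y.mean()) ** 2)       # 46
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # 40

r2 = 1 - ss_res / ss_tot                   # = ss_reg / ss_tot = 40/46, about 0.87
print(round(r2, 2))
```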
Adjusted 𝑹𝟐
➢ Incidentally, 𝑅2 is biased upward, particularly in small samples. Therefore,
adjusted 𝑅2 is sometimes used. The formula is

   Adjusted R² = 1 − [ (N − 1)(1 − R²) / (N − K − 1) ]
where
✓ N represents the number of data points in our dataset
✓ K represents the number of independent variables, and
✓ R2 represents the coefficient of determination
➢ Thus, adjusted 𝑅2 for our example (N = 5, K = 1) is

   Adjusted R² = 1 − [ (N − 1)(1 − R²) / (N − K − 1) ]
               = 1 − [ (5 − 1)(1 − 0.87) / (5 − 1 − 1) ]
               = 1 − (4 × 0.13)/3 = 1 − 0.174 ≈ 0.83
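A minimal sketch of the adjusted R² computation (assuming Python; N = 5 observations and K = 1 predictor, as in the straight-line example):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (N - 1)(1 - R^2) / (N - K - 1)."""
    return 1 - (n - 1) * (1 - r2) / (n - k - 1)

print(round(adjusted_r2(40 / 46, 5, 1), 2))   # approximately 0.83
```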
➢ Note that, unlike 𝑅2 , adjusted 𝑅 2 can actually get smaller as additional
variables are added to the model.
➢ One of the claimed benefits for adjusted 𝑅2 is that it “punishes” you for
including extraneous variables in the model.
✓ Extraneous means irrelevant or unrelated.
✓ These variables can influence the dependent variable but are beyond
the researchers' control, and sometimes even their awareness.
✓ They make it difficult to determine the actual impact of the
independent (intentionally manipulated) variable.
✓ If left uncontrolled, extraneous variables can lead to inaccurate
conclusions about the relationship between independent and
dependent variables.
➢ Also note that, as N gets larger, the difference between 𝑅2 and adjusted 𝑅2
gets smaller and smaller.
What is the difference between R square and adjusted R square?
➢ R square and adjusted R square are both used to judge how well a linear
regression model fits the data.
   ✓ R square gives the proportion of variation in the dependent variable
     explained by all the independent variables together; it never decreases
     when another independent variable is added, whether or not that
     variable is useful.
   ✓ Adjusted R square corrects for the number of independent variables: it
     increases only when an added variable improves the fit by more than
     would be expected by chance, and it can decrease when an unhelpful
     variable is added.
MULTIPLE REGRESSION ANALYSIS: THE PROBLEM OF INFERENCE
➢ If our sole objective is point estimation of the parameters of the regression
models,
✓ the method of ordinary least squares (OLS), which does not make
any assumption about the probability distribution of the disturbances
𝜀𝑖 , will be sufficient.
➢ The OLS estimators of the regression coefficients are best linear unbiased
estimators (BLUE).
➢ But if our objective is estimation as well as hypothesis testing,
✓ then we need to assume that the 𝜺𝒊 follow some probability
distribution.
➢ For this purpose, we assume that the 𝜺𝒊 follow the normal distribution with
zero mean and constant variance 𝝈𝟐 .
➢ Moreover, the estimators 𝛽̂0 , 𝛽̂1 , 𝑎𝑛𝑑 𝛽̂2 are themselves normally distributed
with means equal to true 𝛽0 , 𝛽1 , 𝑎𝑛𝑑 𝛽2 and with the variances
   var(β̂0) = [ 1/n + ( X̄1² Σx2i² + X̄2² Σx1i² − 2 X̄1 X̄2 Σ x1i x2i ) / ( Σx1i² Σx2i² − (Σ x1i x2i)² ) ] σ²

   se(β̂0) = +√var(β̂0)

   var(β̂1) = [ Σx2i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ²

   se(β̂1) = +√var(β̂1)

   var(β̂2) = [ Σx1i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ²

   se(β̂2) = +√var(β̂2)
Example
• Husbands’ hours of housework per week (Y)
• Number of children (X1)
• Husbands’ years of education (X2)
Table 1

   yᵢ     x1ᵢ     x2ᵢ     x1ᵢ²     x2ᵢ²     x1ᵢx2ᵢ
   1       1      12       1       144       12
   2       1      14       1       196       14
   3       1      16       1       256       16
   5       1      16       1       256       16
   3       2      18       4       324       36
   1       2      16       4       256       32
   5       3      12       9       144       36
   0       3      12       9       144       36
   6       4      10      16       100       40
   3       4      12      16       144       48
   7       5      12      25       144       60
   4       5      16      25       256       80
   ----------------------------------------------------
   Σyᵢ = 40   Σx1ᵢ = 32   Σx2ᵢ = 166   Σx1ᵢ² = 112   Σx2ᵢ² = 2108   Σx1ᵢx2ᵢ = 426
SPSS Results

Model Summary
   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
   .499a     .249           .082                  2.05649                   .249           1.490      2     9       .276

ANOVA a
   Model            Sum of Squares    df    Mean Square      F       Sig.
   1  Regression        12.604         2       6.302       1.490    .276b
      Residual          38.062         9       4.229
      Total             50.667        11
   a. Dependent Variable: Husbands’ hours of housework per week
   b. Predictors: (Constant), Husbands’ years of education, Number of children

Coefficients a
   Model                             Unstandardized B    Std. Error    Standardized Beta      t       Sig.
   1  (Constant)                          1.466             4.385                            .334     .746
      Number of children                   .689              .433            .500           1.591     .146
      Husbands’ years of education         .002              .272            .003            .008     .994
   a. Dependent Variable: Husbands’ hours of housework per week
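The SPSS coefficient estimates can be cross-checked with ordinary least squares in a few lines; a minimal sketch (assuming numpy) fitting Y on a constant, X1, and X2 from Table 1:

```python
import numpy as np

y  = np.array([1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4], dtype=float)              # housework hours
x1 = np.array([1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)              # number of children
x2 = np.array([12, 14, 16, 16, 18, 16, 12, 12, 10, 12, 12, 16], dtype=float)  # years of education

X = np.column_stack([np.ones_like(y), x1, x2])
b, ss_res, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(b, 3))                      # approximately [1.466, 0.689, 0.002], as in the SPSS B column
ss_tot = np.sum((y - y.mean()) ** 2)
print(round(1 - ss_res[0] / ss_tot, 3))    # R^2 approximately 0.249
```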
   X̄1 = 32/12 = 2.67 ,     X̄2 = 166/12 = 13.83
From the ANOVA table above:

   σ̂² = mean square of the residual (error) = 4.229
   var(β̂0) = [ 1/n + ( X̄1² Σx2i² + X̄2² Σx1i² − 2 X̄1 X̄2 Σ x1i x2i ) / ( Σx1i² Σx2i² − (Σ x1i x2i)² ) ] σ̂²

            = [ 1/12 + ( (2.67)²(2108) + (13.83)²(112) − 2(2.67)(13.83)(426) ) / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 0.083 + ( 15030.04 + 21422.24 − 31461.04 ) / ( 236096 − 181476 ) ] (4.229)

            = [ 0.083 + 4991.24/54620 ] (4.229)

            = [ 0.083 + 0.091 ] (4.229)

            = 0.74

   se(β̂0) = +√var(β̂0) = +√0.74 = 0.86
   var(β̂1) = [ Σx2i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ̂²

            = [ 2108 / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 2108 / ( 236096 − 181476 ) ] (4.229)

            = [ 2108 / 54620 ] (4.229) = 0.0386 × 4.229 = 0.163

   se(β̂1) = +√var(β̂1) = +√0.163 = 0.404
   var(β̂2) = [ Σx1i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ̂²

            = [ 112 / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 112 / ( 236096 − 181476 ) ] (4.229)

            = [ 112 / 54620 ] (4.229) = 0.0087

   se(β̂2) = +√var(β̂2) = +√0.0087 = 0.093
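The three variance formulas can also be evaluated in code; a minimal sketch (assuming numpy) using the sums from Table 1 and σ̂² = 4.229:

```python
import numpy as np

n = 12
x1bar, x2bar = 32 / 12, 166 / 12        # means of X1 and X2
s11, s22, s12 = 112.0, 2108.0, 426.0    # sum x1^2, sum x2^2, sum x1*x2 (as tabulated above)
sigma2 = 4.229                          # mean square error from the ANOVA table

den = s11 * s22 - s12 ** 2              # common denominator: 54620

var_b1 = (s22 / den) * sigma2           # approximately 0.163
var_b2 = (s11 / den) * sigma2           # approximately 0.0087
var_b0 = (1 / n + (x1bar**2 * s22 + x2bar**2 * s11 - 2 * x1bar * x2bar * s12) / den) * sigma2

for v in (var_b0, var_b1, var_b2):
    print(round(v, 4), round(np.sqrt(v), 3))   # variance and standard error
```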
Significance Testing
➢ Significance testing involves testing the significance of the overall
regression equation as well as specific regression coefficients.
➢ The null hypothesis for the overall test is that the coefficient of multiple
determination in the population, R²pop, is zero:

   H0: R²pop = 0
   HA: R²pop ≠ 0

➢ This is equivalent to the following null hypothesis:

   H0: β1 = β2 = ⋯ = βK = 0
   HA: at least one β ≠ 0
➢ One of the formulae for F, which is mainly useful when the original data are
not available, is

   F = R²(N − K − 1) / [ (1 − R²)K ]

   which follows the F(K, N − K − 1) distribution.
➢ For the data of the above example,

   F = R²(N − K − 1) / [ (1 − R²)K ] = 0.249(12 − 2 − 1) / [ (1 − 0.249)(2) ] = 2.241/1.502 = 1.49

➢ The critical value of F with K = 2 and N − K − 1 = 12 − 2 − 1 = 9 df is 4.26
   (from the F-distribution table).
Comment: The calculated value F = 1.49 is less than the critical value F = 4.26
at the 5% level of significance, so we fail to reject the null hypothesis.
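The overall F test can be reproduced from R² alone; a minimal sketch, assuming scipy is available for the critical value:

```python
from scipy.stats import f

r2, n, k = 0.249, 12, 2
F = (r2 * (n - k - 1)) / ((1 - r2) * k)   # approximately 1.49
F_crit = f.ppf(0.95, k, n - k - 1)        # approximately 4.26 for (2, 9) df

print(round(F, 2), round(F_crit, 2))
print("reject H0" if F > F_crit else "fail to reject H0")
```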
If the overall null hypothesis is rejected, one or more population regression
coefficients have a value different from 0. To determine which specific coefficients
(𝛽𝑖 ′𝑠) are nonzero, additional tests are necessary. The test statistic for this hypothesis
is
   t = β̂i / se(β̂i)

   which has a t distribution with n − k − 1 degrees of freedom.
Example: Do a t-test to determine whether 𝛽1 is significantly different from 0
Solution:
Step 1: 𝐻0 : 𝛽1 = 0
𝐻1 : 𝛽1 ≠ 0
Step 2: The appropriate test statistic is

   t = β̂1 / se(β̂1)

   with n − k − 1 degrees of freedom.

➢ In this case n = 12, k = 2, β̂1 = 0.636, se(β̂1) = 0.404, df = 9, α = 0.05,
   tα/2 = 2.262.
Step 3: For β̂1, the computed value of the test statistic is

   t = β̂1 / se(β̂1) = 0.636/0.404 = 1.57
Step 4: Since |t| = 1.57 < 2.262, do not reject 𝐻0 .
Example: Do a t-test to determine whether 𝛽2 is significantly different from 0
Step 1: 𝐻0 : 𝛽2 = 0
𝐻1 : 𝛽2 ≠ 0
Step 2: The appropriate test statistic is

   t = β̂2 / se(β̂2)

   with n − k − 1 degrees of freedom.

➢ In this case n = 12, k = 2, β̂2 = −0.065, se(β̂2) = 0.094, df = 9, α = 0.05,
   tα/2 = 2.262.
Step 3: For β̂2, the computed value of the test statistic is

   t = β̂2 / se(β̂2) = −0.065/0.094 = −0.6915
Step 4: Since |t| = 0.69 < 2.262, do not reject 𝐻0 .
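Both coefficient t tests follow the same pattern; a minimal sketch (assuming scipy, and using the hand-computed estimates and standard errors quoted above):

```python
from scipy.stats import t

n, k = 12, 2
df = n - k - 1                    # 9
t_crit = t.ppf(0.975, df)         # two-sided 5% critical value, approximately 2.262

for name, b, se in [("beta1", 0.636, 0.404), ("beta2", -0.065, 0.094)]:
    t_stat = b / se
    decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    print(name, round(t_stat, 2), decision)
```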