Multivariate Linear Regression Models
What is Regression Analysis?
➢ Regression analysis is the statistical methodology for predicting values of
one or more response (dependent) variables from a collection of predictor
(independent) variable values.
➢ It can also be used for assessing the effects of the predictor variables on the
response variable/variables.
➢ We first discuss the multiple regression model for the prediction of a single
response/dependent variable.
➢ This model is then generalized to handle the prediction of several
dependent variables.
➢ As the name implies, multivariate regression is a technique that estimates
a single regression model with more than one outcome/dependent
variable.
➢ When there is more than one predictor variable in a multivariate
regression model, the model is a multivariate multiple regression.
The Classical Linear Regression Model
➢ Let 𝑧1 , 𝑧2 , … … … … … . . , 𝑧𝑟 be predictor variables thought to be related to a
response variable Y. For example, with r = 4, we might have
Y = current market value of home
𝑧1 = square feet of living area
𝑧2 = location
𝑧3 = assessed value last year
𝑧4 = quality of construction (price per square foot)
➢ The classical linear regression model states that
✓ Y is composed of a mean, which depends in a continuous manner
on the 𝒛𝒊 ′𝒔, and
✓ a random error 𝜺, which
o accounts for measurement error and
o the effects of other variables not explicitly considered in the
model.
➢ The values of the predictor variables recorded from the experiment or set
by the investigator are treated as fixed.
➢ The error (and hence the response) is viewed as a random variable whose
behavior is characterized by a set of distributional assumptions.
➢ Specifically, the linear regression model with a single response takes the
form
   Y = β0 + β1z1 + β2z2 + ⋯ + βr zr + ε

   [Response] = [mean (dependent on z1, z2, …, zr)] + [error]

➢ The term "linear" refers to the fact that the mean is a linear function of
   the unknown parameters β0, β1, …, βr.
➢ With n independent observations on Y and the associated values of 𝑧𝑖 ,
✓ the complete model becomes
   Y1 = β0 + β1z11 + β2z12 + ⋯ + βr z1r + ε1
   Y2 = β0 + β1z21 + β2z22 + ⋯ + βr z2r + ε2                                (1)
    ⋮        ⋮          ⋮                ⋮
   Yn = β0 + β1zn1 + β2zn2 + ⋯ + βr znr + εn
where the error terms are assumed to have the following properties:
   i.   E(εj) = 0;
   ii.  Var(εj) = σ² (constant); and                                        (2)
   iii. Cov(εj, εk) = 0,  j ≠ k
➢ Although the error-term assumptions in (2) are very modest,
✓ we shall later need to add the assumption of joint normality for
making confidence statements and testing hypotheses.
➢ In matrix notation, (1) becomes

       [Y1]   [1  z11  z12  ⋯  z1r] [β0]   [ε1]
       [Y2] = [1  z21  z22  ⋯  z2r] [β1] + [ε2]
       [ ⋮ ]   [⋮    ⋮    ⋮   ⋱   ⋮ ] [ ⋮ ]   [ ⋮ ]
       [Yn]   [1  zn1  zn2  ⋯  znr] [βr]   [εn]

   (observed response vector)   (design matrix)

   or        Y       =        Z           β        +        ε               (3)
          (n × 1)      (n × (r + 1))   ((r + 1) × 1)      (n × 1)
and the specifications in (2) become

   1. E(ε) = 0; and
   2. Cov(ε) = E(εε′) = σ²I
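To make the matrix form and the error assumptions concrete, here is a minimal simulation sketch (assuming Python with numpy; the coefficient values and σ are invented purely for illustration) that generates responses from Y = Zβ + ε with zero-mean, constant-variance, uncorrelated errors.

```python
import numpy as np

rng = np.random.default_rng(0)

n, r = 50, 2                       # n observations, r predictor variables
beta = np.array([1.0, 2.0, -0.5])  # hypothetical (beta0, beta1, beta2)
sigma = 1.5                        # hypothetical error standard deviation

z = rng.uniform(0, 10, size=(n, r))      # predictor values (treated as fixed)
Z = np.column_stack([np.ones(n), z])     # design matrix: first column is all ones
eps = rng.normal(0.0, sigma, size=n)     # errors: E(eps) = 0, Var(eps) = sigma^2, uncorrelated
Y = Z @ beta + eps                       # responses generated by the linear model
```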
Example: Determine the linear regression model for fitting a straight line
Mean response = 𝐸(𝑌) = 𝛽0 + 𝛽1 𝑧1
to the data
𝑧1 0 1 2 3 4
y 1 4 3 8 9
➢ Before the responses Y′ = [Y1, Y2, …, Y5] are observed, the errors
   ε′ = [ε1, ε2, …, ε5] are random, and we can write

   Y = Zβ + ε

   where

        [Y1]         [1  z11]                     [ε1]
        [Y2]         [1  z21]          [β0]       [ε2]
   Y =  [ ⋮ ] ,  Z =  [ ⋮   ⋮ ] ,  β =  [β1] ,  ε =  [ ⋮ ]
        [Y5]         [1  z51]                     [ε5]
➢ The data for this model are contained in the observed response vector y and
the design matrix Z, where
        [1]         [1  0]
        [4]         [1  1]
   y =  [3] ,   Z = [1  2]
        [8]         [1  3]
        [9]         [1  4]
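For this small example, the response vector and design matrix can also be set up directly in code; a minimal sketch, assuming Python with numpy:

```python
import numpy as np

z1 = np.array([0, 1, 2, 3, 4])   # predictor values
y  = np.array([1, 4, 3, 8, 9])   # observed responses

# Design matrix: a column of ones (for beta0) next to the z1 column
Z = np.column_stack([np.ones_like(z1, dtype=float), z1])
print(Z)   # 5 x 2 matrix, matching the Z displayed above
```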
Least Squares Estimation
➢ One of the objectives of regression analysis is to
✓ develop an equation that will allow the investigator to predict the
response for given values of the predictor variables.
➢ Thus, it is necessary to
✓ "fit" the model in (3) to the observed 𝒚𝒋 corresponding to the known
values 𝟏, 𝒛𝒋𝟏 , 𝒛𝒋𝟐 , … … … … . , 𝒛𝒋𝒓 .
➢ That is, we must determine the values for regression coefficients 𝜷 and the
error variance 𝝈𝟐 consistent with the available data.
➢ Let b be trial values for 𝛽.
➢ Consider the difference yj − b0 − b1zj1 − ⋯ − br zjr between the observed
   response yj and the value b0 + b1zj1 + ⋯ + br zjr that would be expected if b
   were the "true" parameter vector.
➢ Typically, the differences yj − b0 − b1zj1 − ⋯ − br zjr will not be zero,
   because the response fluctuates about its expected value.
➢ The method of least squares selects b so as to minimize the sum of the
squares of the differences:
            n
   S(b) =   Σ  (yj − b0 − b1zj1 − ⋯ − br zjr)²  =  (y − Zb)′(y − Zb)         (4)
           j=1
➢ The coefficients b chosen by the least squares criterion are called least
squares estimates of the regression parameters 𝛽.
➢ They will henceforth be denoted by β̂ to emphasize their role as estimates of β.
➢ The coefficients β̂ are
   ✓ consistent with the data in the sense that they produce estimated
     (fitted) mean responses, β̂0 + β̂1zj1 + ⋯ + β̂r zjr,
   ✓ the sum of whose squared differences from the observed yj is as small
     as possible.
➢ The deviations
   ε̂j = yj − β̂0 − β̂1zj1 − ⋯ − β̂r zjr ,    j = 1, 2, …, n                    (5)
are called residuals.
➢ The vector of residuals 𝜀̂ = 𝑦 − 𝑍𝛽̂ contains the information about the
remaining unknown parameter 𝝈𝟐 .
➢ Let Z have full rank 𝑟 + 1 ≤ 𝑛.
A matrix is said to have full rank if its rank is either equal to
its number of columns or to its number of rows (or to both).
➢ The least squares estimate of β in (3) is given by

   β̂ = (Z′Z)⁻¹Z′y
➢ Let 𝑦̂ = 𝑍𝛽̂ = 𝐻𝑦 denote the fitted values of y,
   ✓ where H = Z(Z′Z)⁻¹Z′ is called the "hat" matrix.
➢ Then the residuals

   ε̂ = y − ŷ = y − Z(Z′Z)⁻¹Z′y = [I − Z(Z′Z)⁻¹Z′]y = (I − H)y

   satisfy Z′ε̂ = 0 and ŷ′ε̂ = 0.
   Also, the

                                 n
   residual sum of squares =     Σ  (yj − β̂0 − β̂1zj1 − ⋯ − β̂r zjr)²  =  ε̂′ε̂  =  y′y − y′Zβ̂
                                j=1
Example: Calculate the least squares estimates β̂, the residuals ε̂, and the residual
sum of squares for a linear model
𝑌𝑗 = 𝛽0 + 𝛽1 𝑧𝑗1 + 𝜀𝑗
𝑓𝑖𝑡 𝑡𝑜 𝑡ℎ𝑒 𝑑𝑎𝑡𝑎
𝑧1 0 1 2 3 4
y 1 4 3 8 9
Solution: We have

        [1  0]
        [1  1]          [1  1  1  1  1]
   Z =  [1  2] ,   Z′ = [0  1  2  3  4] ,   y′ = [1  4  3  8  9],
        [1  3]
        [1  4]

   Z′Z = [ 5  10] ,   (Z′Z)⁻¹ = [ 0.6  −0.2] ,   Z′y = [25]
         [10  30]               [−0.2   0.1]           [70]
Calculation of (Z′Z)⁻¹

   Z′Z = [ 5  10]
         [10  30]

   |Z′Z| = (5)(30) − (10)(10) = 150 − 100 = 50

   Cofactor of 5  = (−1)²(30) = 30
   Cofactor of 10 = (−1)³(10) = −10
   Cofactor of 10 = (−1)³(10) = −10
   Cofactor of 30 = (−1)⁴(5)  = 5

   Cofactor matrix of Z′Z = [ 30  −10]
                            [−10    5]

   Adj Z′Z = transpose of the cofactor matrix of Z′Z = [ 30  −10]
                                                       [−10    5]

   (Z′Z)⁻¹ = (1/|Z′Z|) Adj Z′Z = (1/50) [ 30  −10]  =  [ 0.6  −0.2]
                                        [−10    5]     [−0.2   0.1]
Consequently,
   β̂ = [β̂0] = (Z′Z)⁻¹Z′y = [ 0.6  −0.2] [25]  =  [1]
        [β̂1]               [−0.2   0.1] [70]     [2]
and the fitted equation is
𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑧 = 1 + 2𝑧
The vector of fitted (predicted) values is

              [1  0]         [1]
              [1  1]         [3]
   ŷ = Zβ̂ =  [1  2] [1]  =  [5]
              [1  3] [2]     [7]
              [1  4]         [9]
                    [1]   [1]   [ 0]
                    [4]   [3]   [ 1]
   so  ε̂ = y − ŷ =  [3] − [5] = [−2]
                    [8]   [7]   [ 1]
                    [9]   [9]   [ 0]
The residual sum of squares is
                           [ 0]
                           [ 1]
   ε̂′ε̂ = [0  1  −2  1  0]  [−2]  =  0² + 1² + (−2)² + 1² + 0²  =  6
                           [ 1]
                           [ 0]
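The whole hand calculation can be checked numerically; a minimal sketch (assuming numpy) that reproduces β̂ = (1, 2)′, the fitted values, the residuals, the residual sum of squares of 6, and the orthogonality conditions Z′ε̂ = 0 and ŷ′ε̂ = 0:

```python
import numpy as np

z1 = np.array([0., 1., 2., 3., 4.])
y  = np.array([1., 4., 3., 8., 9.])
Z  = np.column_stack([np.ones(5), z1])

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)   # (Z'Z)^{-1} Z'y  -> [1., 2.]
y_hat    = Z @ beta_hat                        # fitted values   -> [1, 3, 5, 7, 9]
resid    = y - y_hat                           # residuals       -> [0, 1, -2, 1, 0]
rss      = resid @ resid                       # residual sum of squares -> 6.0

# The residuals are orthogonal to the columns of Z and to the fitted values
print(beta_hat, resid, rss)
print(np.allclose(Z.T @ resid, 0), np.allclose(y_hat @ resid, 0))
```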
Assessing the Quality of the Model
➢ The quality of the model's fit can be measured by the coefficient of
   determination

   R² = 1 − [ Σⱼ ε̂j² / Σⱼ (yj − ȳ)² ]  =  Σⱼ (ŷj − ȳ)² / Σⱼ (yj − ȳ)² ,

   where the sums run over j = 1, …, n.
➢ The quantity 𝑅2 gives the proportion of the total variation in the 𝑦𝑗 ’s
"explained" by the predictor variables 𝑧1 , 𝑧2 , … … … , 𝑧𝑟 .
➢ Here 𝑅2 (or the multiple correlation coefficient 𝑅 = +√𝑅 2 ) equals 1 if
✓ the fitted equation passes through all the data points; so that 𝑒̂𝑗 = 0 for
all j.
➢ At the other extreme, 𝑅2 is 0 if
✓ 𝛽̂0 = 𝑦̅ and 𝛽̂1 = 𝛽̂2 = ⋯ … . = 𝛽̂𝑟 = 0.
✓ In this case, the predictor variables 𝑧1 , 𝑧2 , … … … , 𝑧𝑟 have no influence
on the response.
Example: Find 𝑅2 for the above problem.
   y     z     ŷ = 1 + 2z     (ŷ − ȳ)²     (y − ȳ)²
   1     0         1             16           16
   4     1         3              4            1
   3     2         5              0            4
   8     3         7              4            9
   9     4         9             16           16
   -------------------------------------------------
   Σy = 25                  Σ(ŷj − ȳ)² = 40   Σ(yj − ȳ)² = 46

   ȳ = 25/5 = 5

   R² = Σⱼ(ŷj − ȳ)² / Σⱼ(yj − ȳ)² = 40/46 = 0.87
So, 87% of the variation of y is explained by the predictor variable z.
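The same R² can be computed directly from the fitted values; a minimal sketch, assuming numpy and the fit ŷ = 1 + 2z obtained above:

```python
import numpy as np

y     = np.array([1., 4., 3., 8., 9.])
y_hat = np.array([1., 3., 5., 7., 9.])     # fitted values 1 + 2*z

ss_res = np.sum((y - y_hat) ** 2)          # 6
ss_tot = np.sum((y - y.mean()) ** 2)       # 46
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # 40

r2 = 1 - ss_res / ss_tot                   # = ss_reg / ss_tot = 40/46, about 0.87
print(round(r2, 2))
```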
Adjusted 𝑹𝟐
➢ Incidentally, 𝑅2 is biased upward, particularly in small samples. Therefore,
adjusted 𝑅2 is sometimes used. The formula is

   Adjusted R² = 1 − [ (N − 1)(1 − R²) / (N − K − 1) ]
where
✓ N represents the number of data points in our dataset
✓ K represents the number of independent variables, and
✓ R2 represents the coefficient of determination
➢ Thus, adjusted 𝑅2 for our example (N = 5, K = 1) is

   Adjusted R² = 1 − [ (N − 1)(1 − R²) / (N − K − 1) ]
               = 1 − [ (5 − 1)(1 − 0.87) / (5 − 1 − 1) ]
               = 1 − (4 × 0.13)/3 = 1 − 0.174 ≈ 0.83
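A minimal sketch of the adjusted R² computation (assuming Python; N = 5 observations and K = 1 predictor, as in the straight-line example):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (N - 1)(1 - R^2) / (N - K - 1)."""
    return 1 - (n - 1) * (1 - r2) / (n - k - 1)

print(round(adjusted_r2(40 / 46, 5, 1), 2))   # approximately 0.83
```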
➢ Note that, unlike 𝑅2 , adjusted 𝑅 2 can actually get smaller as additional
variables are added to the model.
➢ One of the claimed benefits for adjusted 𝑅2 is that it “punishes” you for
including extraneous variables in the model.
✓ Extraneous means irrelevant or unrelated.
✓ These variables can influence the dependent variable but are beyond
the researchers' control, and sometimes even their awareness.
✓ They make it difficult to determine the actual impact of the
independent (intentionally manipulated) variable.
✓ If left uncontrolled, extraneous variables can lead to inaccurate
conclusions about the relationship between independent and
dependent variables.
➢ Also note that, as N gets larger, the difference between 𝑅2 and adjusted 𝑅2
gets smaller and smaller.
What is the difference between R square and adjusted R square?
➢ R square and adjusted R square are both used to judge how well a linear
regression model fits the data.
   ✓ R square gives the proportion of variation in the dependent variable
     explained by all the independent variables together; it never decreases
     when another independent variable is added, whether or not that
     variable is useful.
   ✓ Adjusted R square corrects for the number of independent variables: it
     increases only when an added variable improves the fit by more than
     would be expected by chance, and it can decrease when an unhelpful
     variable is added.
MULTIPLE REGRESSION ANALYSIS: THE PROBLEM OF INFERENCE
➢ If our sole objective is point estimation of the parameters of the regression
models,
✓ the method of ordinary least squares (OLS), which does not make
any assumption about the probability distribution of the disturbances
𝜀𝑖 , will be sufficient.
➢ The OLS estimators of the regression coefficients are best linear unbiased
estimators (BLUE).
➢ But if our objective is estimation as well as hypothesis testing,
✓ then we need to assume that the 𝜺𝒊 follow some probability
distribution.
➢ For this purpose, we assume that the 𝜺𝒊 follow the normal distribution with
zero mean and constant variance 𝝈𝟐 .
➢ Moreover, the estimators 𝛽̂0 , 𝛽̂1 , 𝑎𝑛𝑑 𝛽̂2 are themselves normally distributed
with means equal to true 𝛽0 , 𝛽1 , 𝑎𝑛𝑑 𝛽2 and with the variances
   var(β̂0) = [ 1/n + ( X̄1² Σx2i² + X̄2² Σx1i² − 2 X̄1 X̄2 Σ x1i x2i ) / ( Σx1i² Σx2i² − (Σ x1i x2i)² ) ] σ²

   se(β̂0) = +√var(β̂0)

   var(β̂1) = [ Σx2i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ²

   se(β̂1) = +√var(β̂1)

   var(β̂2) = [ Σx1i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ²

   se(β̂2) = +√var(β̂2)
Example
• Husbands’ hours of housework per week (Y)
• Number of children (X1)
• Husbands’ years of education (X2)
Table 1

   yᵢ     x1ᵢ     x2ᵢ     x1ᵢ²     x2ᵢ²     x1ᵢx2ᵢ
   1       1      12       1       144       12
   2       1      14       1       196       14
   3       1      16       1       256       16
   5       1      16       1       256       16
   3       2      18       4       324       36
   1       2      16       4       256       32
   5       3      12       9       144       36
   0       3      12       9       144       36
   6       4      10      16       100       40
   3       4      12      16       144       48
   7       5      12      25       144       60
   4       5      16      25       256       80
   ----------------------------------------------------
   Σyᵢ = 40   Σx1ᵢ = 32   Σx2ᵢ = 166   Σx1ᵢ² = 112   Σx2ᵢ² = 2108   Σx1ᵢx2ᵢ = 426
SPSS Results

Model Summary
   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
   .499a     .249           .082                  2.05649                   .249           1.490      2     9       .276

ANOVA a
   Model            Sum of Squares    df    Mean Square      F       Sig.
   1  Regression        12.604         2       6.302       1.490    .276b
      Residual          38.062         9       4.229
      Total             50.667        11
   a. Dependent Variable: Husbands’ hours of housework per week
   b. Predictors: (Constant), Husbands’ years of education, Number of children

Coefficients a
   Model                             Unstandardized B    Std. Error    Standardized Beta      t       Sig.
   1  (Constant)                          1.466             4.385                            .334     .746
      Number of children                   .689              .433            .500           1.591     .146
      Husbands’ years of education         .002              .272            .003            .008     .994
   a. Dependent Variable: Husbands’ hours of housework per week
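The SPSS coefficient estimates can be cross-checked with ordinary least squares in a few lines; a minimal sketch (assuming numpy) fitting Y on a constant, X1, and X2 from Table 1:

```python
import numpy as np

y  = np.array([1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4], dtype=float)              # housework hours
x1 = np.array([1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)              # number of children
x2 = np.array([12, 14, 16, 16, 18, 16, 12, 12, 10, 12, 12, 16], dtype=float)  # years of education

X = np.column_stack([np.ones_like(y), x1, x2])
b, ss_res, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(b, 3))                      # approximately [1.466, 0.689, 0.002], as in the SPSS B column
ss_tot = np.sum((y - y.mean()) ** 2)
print(round(1 - ss_res[0] / ss_tot, 3))    # R^2 approximately 0.249
```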
   X̄1 = 32/12 = 2.67 ,     X̄2 = 166/12 = 13.83
From the ANOVA table above:

   σ̂² = mean square of the residual (error) = 4.229
   var(β̂0) = [ 1/n + ( X̄1² Σx2i² + X̄2² Σx1i² − 2 X̄1 X̄2 Σ x1i x2i ) / ( Σx1i² Σx2i² − (Σ x1i x2i)² ) ] σ̂²

            = [ 1/12 + ( (2.67)²(2108) + (13.83)²(112) − 2(2.67)(13.83)(426) ) / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 0.083 + ( 15030.04 + 21422.24 − 31461.04 ) / ( 236096 − 181476 ) ] (4.229)

            = [ 0.083 + 4991.24/54620 ] (4.229)

            = [ 0.083 + 0.091 ] (4.229)

            = 0.74

   se(β̂0) = +√var(β̂0) = +√0.74 = 0.86
   var(β̂1) = [ Σx2i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ̂²

            = [ 2108 / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 2108 / ( 236096 − 181476 ) ] (4.229)

            = [ 2108 / 54620 ] (4.229) = 0.0386 × 4.229 = 0.163

   se(β̂1) = +√var(β̂1) = +√0.163 = 0.404
   var(β̂2) = [ Σx1i² / ( (Σx1i²)(Σx2i²) − (Σ x1i x2i)² ) ] σ̂²

            = [ 112 / ( (112)(2108) − (426)² ) ] (4.229)

            = [ 112 / ( 236096 − 181476 ) ] (4.229)

            = [ 112 / 54620 ] (4.229) = 0.0087

   se(β̂2) = +√var(β̂2) = +√0.0087 = 0.093
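The three variance formulas can also be evaluated in code; a minimal sketch (assuming numpy) using the sums from Table 1 and σ̂² = 4.229:

```python
import numpy as np

n = 12
x1bar, x2bar = 32 / 12, 166 / 12        # means of X1 and X2
s11, s22, s12 = 112.0, 2108.0, 426.0    # sum x1^2, sum x2^2, sum x1*x2 (as tabulated above)
sigma2 = 4.229                          # mean square error from the ANOVA table

den = s11 * s22 - s12 ** 2              # common denominator: 54620

var_b1 = (s22 / den) * sigma2           # approximately 0.163
var_b2 = (s11 / den) * sigma2           # approximately 0.0087
var_b0 = (1 / n + (x1bar**2 * s22 + x2bar**2 * s11 - 2 * x1bar * x2bar * s12) / den) * sigma2

for v in (var_b0, var_b1, var_b2):
    print(round(v, 4), round(np.sqrt(v), 3))   # variance and standard error
```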
Significance Testing
➢ Significance testing involves testing the significance of the overall
regression equation as well as specific regression coefficients.
➢ The null hypothesis for the overall test is that the coefficient of multiple
determination in the population, R²pop, is zero:

   H0: R²pop = 0
   HA: R²pop ≠ 0

➢ This is equivalent to the following null hypothesis:

   H0: β1 = β2 = ⋯ = βK = 0
   HA: at least one β ≠ 0
➢ One of the formulae for F, which is mainly useful when the original data are
not available, is

   F = R²(N − K − 1) / [ (1 − R²)K ]

   which follows the F(K, N − K − 1) distribution.
➢ For the data of the above example,

   F = R²(N − K − 1) / [ (1 − R²)K ] = 0.249(12 − 2 − 1) / [ (1 − 0.249)(2) ] = 2.241/1.502 = 1.49

➢ The critical value of F with K = 2 and N − K − 1 = 12 − 2 − 1 = 9 df is 4.26
   (from the F-distribution table).
Comment: The calculated value F = 1.49 is less than the critical value F = 4.26
at the 5% level of significance, so we fail to reject the null hypothesis.
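The overall F test can be reproduced from R² alone; a minimal sketch, assuming scipy is available for the critical value:

```python
from scipy.stats import f

r2, n, k = 0.249, 12, 2
F = (r2 * (n - k - 1)) / ((1 - r2) * k)   # approximately 1.49
F_crit = f.ppf(0.95, k, n - k - 1)        # approximately 4.26 for (2, 9) df

print(round(F, 2), round(F_crit, 2))
print("reject H0" if F > F_crit else "fail to reject H0")
```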
If the overall null hypothesis is rejected, one or more population regression
coefficients have a value different from 0. To determine which specific coefficients
(𝛽𝑖 ′𝑠) are nonzero, additional tests are necessary. The test statistic for this hypothesis
is
   t = β̂i / se(β̂i)

   which has a t distribution with n − k − 1 degrees of freedom.
Example: Do a t-test to determine whether 𝛽1 is significantly different from 0
Solution:
Step 1: 𝐻0 : 𝛽1 = 0
𝐻1 : 𝛽1 ≠ 0
Step 2: The appropriate test statistic is

   t = β̂1 / se(β̂1)

   with n − k − 1 degrees of freedom.

➢ In this case n = 12, k = 2, β̂1 = 0.636, se(β̂1) = 0.404, df = 9, α = 0.05,
   tα/2 = 2.262.
Step 3: For β̂1, the computed value of the test statistic is

   t = β̂1 / se(β̂1) = 0.636/0.404 = 1.57
Step 4: Since |t| = 1.57 < 2.262, do not reject 𝐻0 .
Example: Do a t-test to determine whether 𝛽2 is significantly different from 0
Step 1: 𝐻0 : 𝛽2 = 0
𝐻1 : 𝛽2 ≠ 0
Step 2: The appropriate test statistic is

   t = β̂2 / se(β̂2)

   with n − k − 1 degrees of freedom.

➢ In this case n = 12, k = 2, β̂2 = −0.065, se(β̂2) = 0.094, df = 9, α = 0.05,
   tα/2 = 2.262.
Step 3: For β̂2, the computed value of the test statistic is

   t = β̂2 / se(β̂2) = −0.065/0.094 = −0.6915
Step 4: Since |t| = 0.69 < 2.262, do not reject 𝐻0 .
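Both coefficient t tests follow the same pattern; a minimal sketch (assuming scipy, and using the hand-computed estimates and standard errors quoted above):

```python
from scipy.stats import t

n, k = 12, 2
df = n - k - 1                    # 9
t_crit = t.ppf(0.975, df)         # two-sided 5% critical value, approximately 2.262

for name, b, se in [("beta1", 0.636, 0.404), ("beta2", -0.065, 0.094)]:
    t_stat = b / se
    decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    print(name, round(t_stat, 2), decision)
```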