Chapter 1.

Simple Regression
(Course: Econometrics)

Lê Phương

Faculty of Economic Mathematics


University of Economics and Law
Vietnam National University Ho Chi Minh City
Content

1 Simple Regression Model


Introduction
Ordinary Least Squares (OLS) Method
Coefficient of Determination of the Model

2 Estimation and Hypothesis Testing for Regression Parameters


Assumptions of the OLS Method
Confidence Intervals
Hypothesis Testing

3 Other Issues
Using Simple Linear Regression
Extending the Simple Linear Regression Model
Introduction

Population Regression Function (PRF) for the Simple Regression Model

yi = β1 + β2 xi + εi
or
E(y|xi) = β1 + β2 xi,
where
• y: Dependent variable
• yi : The i-th observed value of the dependent variable
• x: Independent variable
• xi : The i-th observed value of the independent variable
• εi : Random error associated with the i-th observation
Introduction

Interpretation of the parameters β1 and β2 of the model:

• β1: The intercept of the population regression function; the average value of the dependent variable y when the independent variable x is zero.
• β2: The slope of the population regression function; the average change in y when x changes by 1 unit.
Introduction

Sample Regression Function (SRF) for the Simple Regression Model

yi = β̂1 + β̂2 xi + ei
or
ŷi = β̂1 + β̂2 xi ,
where
• β̂1: The intercept of the sample regression function, the point estimate of β1,
• β̂2: The slope of the sample regression function, the point estimate of β2,
• ei: The residual, which is the point estimate of εi,
• ŷi: The fitted (estimated) value of yi.
Ordinary Least Squares (OLS) Method
To find β̂1 and β̂2 such that the sum of squared errors is minimized:

Σ ei² = Σ (yi − β̂1 − β̂2 xi)² → min   (sums run over i = 1, …, n).

By solving this minimization problem for a function of two variables, we obtain

β̂2 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²,   β̂1 = ȳ − β̂2 x̄,

where x̄ = (Σ xi)/n and ȳ = (Σ yi)/n.

Note: We often denote
• Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xi yi − n x̄ ȳ,
• Sxx = Σ (xi − x̄)² = Σ xi² − n x̄².

Then β̂2 = Sxy / Sxx.
Ordinary Least Squares (OLS) Method

Questions:
1 Does the sample regression function always pass through the sample mean point (x̄, ȳ)? Why?
2 If every xi is multiplied by 10 and y is unchanged, how do β̂1 and β̂2 change?
3 If every xi is multiplied by 10 and every yi is multiplied by 100, how do β̂1 and β̂2 change?
Ordinary Least Squares (OLS) Method
Example 1. Observations on TV advertisements (x) and cars sold (y)
for 5 garages in one week give the following data:
x 1 3 2 1 3
y 14 24 18 17 27

Build the sample regression function to predict the number of cars


sold based on the number of advertisements.

Answer: ŷi = 10 + 5xi.


Example 2. Observations on income (in million VND/year) and
consumption (in million VND/year) for 10 individuals give the following
data:
consumption 29 42 38 30 29 41 23 36 42 48
income 31 50 47 45 39 50 35 40 45 50

Build the sample regression function.

Answer: ŷi = −5.4519 + 0.9549 · incomei, where y denotes consumption.
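
A minimal Python sketch (standard library only) that reproduces the Example 1 answer from the closed-form OLS formulas above:

```python
# OLS for simple regression, computed from the closed-form solution above.
# Data from Example 1: TV advertisements (x) and cars sold (y).
xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Sxy = sum(xi*yi) - n*xbar*ybar,  Sxx = sum(xi^2) - n*xbar^2
sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
sxx = sum(x * x for x in xs) - n * xbar ** 2

b2 = sxy / sxx          # slope:     beta2-hat = Sxy / Sxx
b1 = ybar - b2 * xbar   # intercept: beta1-hat = ybar - beta2-hat * xbar

print(f"y-hat = {b1:g} + {b2:g} x")   # prints: y-hat = 10 + 5 x
```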
Coefficient of Determination of the Model

Sum of Squares
• Total Sum of Squares (TSS):
  TSS = Σ (yi − ȳ)² = Σ yi² − n ȳ².
• Regression Sum of Squares (RSS):
  RSS = Σ (ŷi − ȳ)² = β̂2² (Σ xi² − n x̄²).
• Error Sum of Squares (ESS):
  ESS = Σ (ŷi − yi)² = Σ ei².

Note: TSS = ESS + RSS.

Coefficient of Determination of the Model

Coefficient of Determination

R² = RSS/TSS = 1 − ESS/TSS

indicates the goodness of fit of the model to the sample data.
• 0 ≤ R² ≤ 1,
• R² = 1: The model fits the sample data perfectly,
• R² = 0: The model does not fit the sample data at all.
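
A short sketch computing TSS, RSS, ESS, and R² for the Example 1 fit; the decomposition TSS = RSS + ESS is checked explicitly:

```python
# TSS, RSS, ESS and R^2 for the Example 1 fit (y-hat = 10 + 5x).
xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]
b1, b2 = 10.0, 5.0                      # OLS estimates from Example 1

n = len(xs)
ybar = sum(ys) / n
yhat = [b1 + b2 * x for x in xs]

tss = sum((y - ybar) ** 2 for y in ys)                # total sum of squares
rss = sum((yh - ybar) ** 2 for yh in yhat)            # regression sum of squares
ess = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))   # error sum of squares

assert abs(tss - (rss + ess)) < 1e-9                  # TSS = RSS + ESS
print(f"R^2 = {rss / tss:.4f}")                       # about 0.8772
```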
Assumptions of the OLS Method
• Assumption 1: The relationship between y and x is linear. The values xi are given and not random.
• Assumption 2: The errors εi are random variables with zero mean: E(εi|xi) = 0.
• Assumption 3: The errors εi are random variables with constant variance: Var(εi|xi) = σ².
• Assumption 4: No correlation between the errors: Cov(εi, εj|xi, xj) = 0 for i ≠ j.
• Assumption 5: No correlation between εi and xi: Cov(εi, xi) = 0.
Assumptions of the OLS Method

Gauss–Markov Theorem

When assumptions 1–5 are met, the OLS estimators β̂1 and β̂2 are the Best Linear Unbiased Estimators (BLUE) of the parameters β1 and β2 of the population regression function.
Confidence Intervals

To address questions about confidence intervals and hypothesis


testing for model parameters, we need an additional assumption:
• Assumption 6: The errors εi are normally distributed:

εi ∼ N(0, σ²).

Note: σ² (the variance of the errors) is unknown in practice; it is estimated by the sample variance

s² = (Σ ei²)/(n − 2) = ESS/(n − 2).
Confidence Intervals
Random Variables β̂1, β̂2

β̂1 ∼ N(β1, σ²(β̂1)),   β̂2 ∼ N(β2, σ²(β̂2)),

with variances

σ²(β̂1) = σ² Σ xi² / (n Sxx) ≈ s² Σ xi² / (n Sxx),   σ²(β̂2) = σ²/Sxx ≈ s²/Sxx.

(Recall: Sxx = Σ (xi − x̄)² = Σ xi² − n x̄².)

Define
• se(β̂1) = √(s² Σ xi² / (n Sxx)): the standard error of β̂1,
• se(β̂2) = √(s²/Sxx): the standard error of β̂2,

then

(β̂1 − β1)/se(β̂1) ∼ t(n − 2),   (β̂2 − β2)/se(β̂2) ∼ t(n − 2),   (n − 2)s²/σ² ∼ χ²(n − 2).
Confidence Intervals

Thus,
• The 1 − α confidence interval for β1 is

  (β̂1 − t_{α/2}(n − 2)·se(β̂1),  β̂1 + t_{α/2}(n − 2)·se(β̂1)).

• The 1 − α confidence interval for β2 is

  (β̂2 − t_{α/2}(n − 2)·se(β̂2),  β̂2 + t_{α/2}(n − 2)·se(β̂2)).

• The 1 − α confidence interval for σ² is

  ((n − 2)s² / χ²_{α/2}(n − 2),  (n − 2)s² / χ²_{1−α/2}(n − 2)).
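
A sketch of these interval formulas on the Example 1 data, assuming SciPy is available for the t and χ² quantiles:

```python
# 95% confidence intervals for beta1, beta2 and sigma^2, Example 1 data.
from scipy.stats import t, chi2

xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum(x * x for x in xs) - n * xbar ** 2
b2 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / sxx
b1 = ybar - b2 * xbar

ess = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
s2 = ess / (n - 2)                                   # s^2 = ESS / (n - 2)
se_b1 = (sum(x * x for x in xs) / (n * sxx) * s2) ** 0.5
se_b2 = (s2 / sxx) ** 0.5

alpha = 0.05
tc = t.ppf(1 - alpha / 2, n - 2)                     # t_{alpha/2}(n - 2)
print("beta1:  ", (b1 - tc * se_b1, b1 + tc * se_b1))
print("beta2:  ", (b2 - tc * se_b2, b2 + tc * se_b2))
print("sigma^2:", ((n - 2) * s2 / chi2.ppf(1 - alpha / 2, n - 2),
                   (n - 2) * s2 / chi2.ppf(alpha / 2, n - 2)))
```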
Hypothesis Testing

Hypotheses to test include:


• Hypotheses about the regression coefficients
• Hypotheses about the variance of εi
• Hypotheses about the goodness of fit of the model.
Types of hypotheses:
• Two-tailed hypotheses, left-tailed hypotheses, and right-tailed
hypotheses.
Basic testing methods:
• Confidence interval method
• Critical value method
• p-value method.
Hypothesis Testing
Testing Hypothesis about β2
Hypotheses: H0: β2 = b, H1: β2 ≠ b, where b is a given real number.
• Confidence interval method:
  1 Construct the 1 − α confidence interval for β2.
  2 If b falls within the confidence interval, accept H0. If b does not fall within the confidence interval, reject H0.
• Critical value method (t-test):
  1 Calculate the test statistic t = (β̂2 − b)/se(β̂2).
  2 If |t| > t_{α/2}(n − 2), reject H0; otherwise, accept H0.
• p-value method:
  1 Calculate the p-value = P(|T| > |t|), where T ∼ t(n − 2).
  2 If p-value < α, reject H0; otherwise, accept H0.

Testing Hypothesis about β1
Similar to β2, but the test statistic is t = (β̂1 − b)/se(β̂1).
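
A sketch of the t-test and p-value method on the Example 1 estimates; the hypothesized value b = 0 is chosen here purely for illustration (it tests whether x affects y at all):

```python
# Two-tailed t-test of H0: beta2 = b against H1: beta2 != b.
from scipy.stats import t

n = 5
b2_hat, se_b2 = 5.0, 1.0801   # Example 1 values (see the earlier sketches)
b = 0.0                       # hypothesized value (an assumption, for illustration)
alpha = 0.05

t_stat = (b2_hat - b) / se_b2
p_value = 2 * t.sf(abs(t_stat), n - 2)    # P(|T| > |t|), T ~ t(n - 2)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "accept H0")
```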
Hypothesis Testing

Testing Hypothesis about σ²
Hypotheses: H0: σ² = σ0², H1: σ² ≠ σ0², where σ0 > 0 is given.
• Confidence interval method:
  1 Construct the confidence interval for σ².
  2 If σ0² falls within the confidence interval, accept H0. If σ0² does not fall within the confidence interval, reject H0.
• Critical value method (chi-squared test):
  1 Calculate the test statistic χ² = (n − 2)s²/σ0².
  2 If χ² < χ²_{1−α/2}(n − 2) or χ² > χ²_{α/2}(n − 2), reject H0; otherwise, accept H0.
• p-value method:
  1 Calculate the p-value.
  2 If p-value < α, reject H0; otherwise, accept H0.
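
A sketch of the critical-value method for the variance test; σ0² = 2 is a made-up hypothesized value for illustration:

```python
# Chi-squared test of H0: sigma^2 = sigma0^2 (two-tailed critical-value method).
from scipy.stats import chi2

n = 5
s2 = 14 / 3          # s^2 = ESS/(n-2) from the Example 1 sketches
sigma0_sq = 2.0      # hypothesized value (an assumption, for illustration)
alpha = 0.05

chi2_stat = (n - 2) * s2 / sigma0_sq
lo = chi2.ppf(alpha / 2, n - 2)        # lower cutoff, chi^2_{1-alpha/2}(n-2)
hi = chi2.ppf(1 - alpha / 2, n - 2)    # upper cutoff, chi^2_{alpha/2}(n-2)

print(f"chi2 = {chi2_stat:.3f}, acceptance region = ({lo:.3f}, {hi:.3f})")
print("reject H0" if (chi2_stat < lo or chi2_stat > hi) else "accept H0")
```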
Hypothesis Testing
Testing the Goodness of Fit of the Model
Hypotheses: H0: R² = 0, H1: R² ≠ 0.
Critical value method (F-test):
  1 Calculate the test statistic F = (n − 2)R²/(1 − R²).
  2 If F > Fα(1, n − 2), reject H0; otherwise, accept H0.

Note: F = RSS / (ESS/(n − 2)).
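
A sketch of the F-test using the Example 1 R²; in simple regression this F equals the square of the t-statistic for H0: β2 = 0:

```python
# F-test of overall fit: F = (n-2) R^2 / (1 - R^2) against F(1, n-2).
from scipy.stats import f

n, r2 = 5, 100 / 114           # R^2 for Example 1 from the earlier sketch
alpha = 0.05

F = (n - 2) * r2 / (1 - r2)    # equals RSS / (ESS/(n-2)) here
p_value = f.sf(F, 1, n - 2)

print(f"F = {F:.2f}, p-value = {p_value:.4f}")
print("reject H0" if F > f.ppf(1 - alpha, 1, n - 2) else "accept H0")
```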

Evaluating Regression Results

• Are the signs of the estimated regression coefficients consistent with theory or prior expectations?
• Are the estimated regression coefficients statistically significant?
• How well does the model fit the sample data (is R² high, and is the overall fit statistically significant)?
• Does the model satisfy the assumptions of the classical linear regression model?
Using Simple Linear Regression
Presenting Regression Results

          ŷ = β̂1 + β̂2 x        R²
se          se(β̂1)   se(β̂2)    df
t           t(β̂1)    t(β̂2)     F
p-value     p(β̂1)    p(β̂2)     p(F)

Changing Units in the Regression Model


If the units of X and Y change, there is no need to re-run the
regression; instead, apply the unit conversion formula.
• Original regression model: ŷi = β̂1 + β̂2 xi ,
• Regression model with new units: ŷi∗ = β̂1∗ + β̂2∗ xi∗ ,
where ŷi* = k1 ŷi and xi* = k2 xi. Then

β̂1* = k1 β̂1,   β̂2* = (k1/k2) β̂2.
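
A quick numeric check of this conversion rule, using the Example 1 fit and made-up conversion factors k1 and k2; refitting OLS on the rescaled data reproduces exactly the converted coefficients:

```python
# Unit-conversion rule for the Example 1 fit (y-hat = 10 + 5x).
k1, k2 = 2.0, 10.0                     # hypothetical conversion factors
b1, b2 = 10.0, 5.0                     # original OLS estimates (Example 1)
b1_new, b2_new = k1 * b1, (k1 / k2) * b2

# Refit OLS on the rescaled data.
xs = [k2 * x for x in [1, 3, 2, 1, 3]]
ys = [k1 * y for y in [14, 24, 18, 17, 27]]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b2_fit = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
         (sum(x * x for x in xs) - n * xbar ** 2)
b1_fit = ybar - b2_fit * xbar

assert abs(b2_fit - b2_new) < 1e-9 and abs(b1_fit - b1_new) < 1e-9
print(f"new SRF: y-hat* = {b1_new:g} + {b2_new:g} x*")   # y-hat* = 20 + 1 x*
```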
Using Simple Linear Regression

Prediction Problem
Suppose we have the sample regression function: ŷi = β̂1 + β̂2 xi .
• Point estimate of y at x = x0 is

ŷ0 = β̂1 + β̂2 x0 .

We have ŷ0 ∼ N(β1 + β2 x0, σ²(ŷ0)), where

σ²(ŷ0) = σ² (1/n + (x0 − x̄)²/Sxx) ≈ s² (1/n + (x0 − x̄)²/Sxx).
Using Simple Linear Regression
Prediction Problem
• The 1 − α confidence interval for the mean of y when x = x0 is

  (ŷ0 − t_{α/2}(n − 2)·se(ŷ0),  ŷ0 + t_{α/2}(n − 2)·se(ŷ0)),

  where se(ŷ0) = s·√(1/n + (x0 − x̄)²/Sxx).

• The 1 − α confidence interval for an individual value of y when x = x0 is

  (ŷ0 − t_{α/2}(n − 2)·se(y0 − ŷ0),  ŷ0 + t_{α/2}(n − 2)·se(y0 − ŷ0)),

  where se(y0 − ŷ0) = s·√(1 + 1/n + (x0 − x̄)²/Sxx).
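
A sketch computing both intervals at a hypothetical point x0 = 2.5 on the Example 1 data, assuming SciPy for the t quantile:

```python
# Point prediction and 95% intervals at x = x0, Example 1 data.
from scipy.stats import t

xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]
n = len(xs)
xbar = sum(xs) / n
sxx = sum(x * x for x in xs) - n * xbar ** 2
b1, b2 = 10.0, 5.0                       # OLS estimates from Example 1
s = (14 / 3) ** 0.5                      # s = sqrt(ESS/(n-2)) from Example 1

x0 = 2.5                                 # a hypothetical new x value
y0_hat = b1 + b2 * x0
se_mean = s * (1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5        # se(y0-hat)
se_indiv = s * (1 + 1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5   # se(y0 - y0-hat)

tc = t.ppf(0.975, n - 2)
print("mean of y:   ", (y0_hat - tc * se_mean, y0_hat + tc * se_mean))
print("individual y:", (y0_hat - tc * se_indiv, y0_hat + tc * se_indiv))
```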
Extending the Simple Linear Regression Model
Regression through the Origin
When the intercept is zero, the model becomes a regression through
the origin, where the regression function is
• PRF: yi = β2 xi + εi ,
• SRF: yi = β̂2 xi + ei ,
with

β̂2 = Σ xi yi / Σ xi²   and   σ²(β̂2) = σ² / Σ xi²,

where σ² is estimated by s² = ESS/(n − 1).

Note:
• R² can be negative for this model, so use R0² = (Σ xi yi)² / (Σ xi² · Σ yi²) instead,
• R² cannot be compared with R0².

The regression-through-the-origin model is rarely used in practice.
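
A sketch of the through-the-origin formulas, reusing the Example 1 data purely for illustration:

```python
# Regression through the origin: beta2-hat = sum(x*y) / sum(x^2), and the
# alternative fit measure R0^2. Example 1 data, used only for illustration.
xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]

sxy_raw = sum(x * y for x, y in zip(xs, ys))
sxx_raw = sum(x * x for x in xs)

b2 = sxy_raw / sxx_raw                               # slope with no intercept
r0_sq = sxy_raw ** 2 / (sxx_raw * sum(y * y for y in ys))

print(f"y-hat = {b2:.4f} x, R0^2 = {r0_sq:.4f}")
```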
Extending the Simple Linear Regression Model
• Log-log or double log model:
ln yi = β1 + β2 ln xi + εi .
Interpretation of β2 : When x changes by 1%, y changes by β2 %.
• Log-linear model:
ln yi = β1 + β2 xi + εi .
Interpretation of β2 : When x changes by 1 unit, y changes by
100 · β2 %.
• Linear-log model:
yi = β1 + β2 ln xi + εi .
Interpretation of β2 : When x changes by 1%, y changes by
β2 /100 units.
• Inverse model:
  yi = β1 + β2 (1/xi) + εi.

These models are nonlinear in the variables but can be transformed into linear form by a change of variables.
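
As a sketch of the change-of-variables idea, the log-log model can be fitted by running ordinary OLS on (ln x, ln y); the data below are made up for illustration:

```python
# Fitting the log-log model ln(y) = beta1 + beta2 ln(x) by a change of
# variables: regress ln(y) on ln(x) with ordinary OLS. Hypothetical data.
from math import log

xs = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical x values
ys = [2.0, 4.1, 6.2, 7.9, 10.3]          # hypothetical y values

lx = [log(x) for x in xs]
ly = [log(y) for y in ys]

n = len(lx)
xbar, ybar = sum(lx) / n, sum(ly) / n
sxy = sum(u * v for u, v in zip(lx, ly)) - n * xbar * ybar
sxx = sum(u * u for u in lx) - n * xbar ** 2

b2 = sxy / sxx        # elasticity: a 1% change in x -> about b2 % change in y
b1 = ybar - b2 * xbar
print(f"ln(y-hat) = {b1:.4f} + {b2:.4f} ln(x)")
```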
Extending the Simple Linear Regression Model
Exercise: Given the regression results between Y - sales (million
VND/ton) and X - price (thousand VND/kg) as follows:

ŷ = 18.8503 − 1.0958x      R² = 0.8681
se     1.5729     0.1743   df = 6
t     11.9837    −6.2842   F = 39.49

1 State the economic interpretation of the regression coefficients.
2 Test whether the price affects sales at the 1% significance level.
3 If the price is 8.5 thousand VND/kg, what is the average sales?
4 Rewrite the SRF if the unit of y is million VND/ton.
5 Test the hypothesis H0: β2 = −1, H1: β2 ≠ −1 at the 1% significance level.
6 Calculate the elasticity of y with respect to x at the point (x̄, ȳ), given that x̄ = 8 thousand VND/kg.
