11 - Econometrics - Linear Regression

The document outlines a course on Econometrics at the University of Milan-Bicocca, focusing on interpreting and comparing regression models. It includes regression model specifications, estimation using Stata, and interpretations of coefficients, as well as discussions on the implications of including or omitting variables in regression analysis. Additionally, it covers nonlinearities in models and provides examples related to wage and house price functions.

Econometrics

University of Milan-Bicocca

Course lecturer:
Maryam Ahmadi
[email protected]

Interpreting and Comparing Regression Models
Problem & Answer 10.
1- Write the regression model for each specification in Table 2.6.

A. Wage = 𝛽0 + 𝛽1 male + u
B. Wage = 𝛽0 + 𝛽1 female + u
C. Wage = 𝛽1 male + 𝛽2 female + u

2- Use Stata and the wages1 data to estimate each model and report the output table.

A. ŵage = 5.147 + 1.166 male
B. ŵage = 6.313 − 1.166 female
C. ŵage = 6.313 male + 5.147 female

3- Interpret the coefficients in each output table, and compare them.
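As a rough check of how the three specifications relate, here is a minimal Python sketch (not Stata, and using simulated wages rather than the wages1 data; the values 5.147 and 1.166 are borrowed only to generate an artificial sample). It illustrates that A's intercept is the female mean wage, the male/female slopes mirror each other, and C's coefficients are the two group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
male = rng.integers(0, 2, n)          # dummy: 1 = male, 0 = female
female = 1 - male
# simulated wages, NOT the wages1 dataset
wage = 5.147 + 1.166 * male + rng.normal(0, 1, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
bA = ols(np.column_stack([ones, male]), wage)     # A: intercept + male
bB = ols(np.column_stack([ones, female]), wage)   # B: intercept + female
bC = ols(np.column_stack([male, female]), wage)   # C: no intercept

# A's intercept = female mean wage; its slope = male mean - female mean
assert np.isclose(bA[0], wage[female == 1].mean())
assert np.isclose(bA[1], -bB[1])                  # slopes are mirror images
assert np.isclose(bC[0], bA[0] + bA[1])           # C's male coef = male mean
```

These identities hold exactly for any sample, which is why the three Stata output tables on the slide carry the same information in different parametrizations.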

Interpreting the Linear Model
The linear model: y = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝑢
The model for individual i: 𝑦i = 𝛽0 + 𝛽1 𝑥i1 + 𝛽2 𝑥i2 + 𝛽3 𝑥i3 + 𝑢i
The model for individual i in matrix form: yi = xi′β + ui
The model actually has no meaning unless we make some assumptions about ui.
A1&A2: E{ui | xi} = 0.
Under this assumption, the regression model describes the expected value of y given x:
E{yi | xi} = xi‘β.
Therefore, coefficient βj measures the expected change in yi if xij changes by one unit, while all other
variables in xi do not change. That is,

𝜕𝐸{𝑦𝑖 | 𝑥𝑖} / 𝜕𝑥𝑖𝑗 = β𝑗

The statement that the other variables in xi do not change is a ceteris paribus condition.
In a multiple regression model, single coefficients can only be interpreted under a
ceteris paribus condition. Thus, strictly speaking, a coefficient can only be interpreted
if we know which other variables are included in the model.

• If we are interested in the relationship between yi and xij (y and xj for each individual)
the other regressors act as control variables.
For example, what is the impact of an earnings announcement upon a firm’s stock price
controlling for overall market movements?
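The idea of "controlling for" the other regressors can be made concrete with the Frisch-Waugh-Lovell result: the multiple-regression coefficient on x1 equals the coefficient from regressing y on x1 after both have been purged of the control x2. A small simulated sketch (all data artificial):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x2 = rng.normal(size=n)                   # control variable
x1 = 0.5 * x2 + rng.normal(size=n)        # regressor of interest, correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)   # multiple regression

# residualize y and x1 on (1, x2), then regress residual on residual
Z = np.column_stack([ones, x2])
ry = y - Z @ ols(Z, y)
rx = x1 - Z @ ols(Z, x1)
b_partial = ols(rx.reshape(-1, 1), ry)[0]

assert np.isclose(b_full[1], b_partial)   # same ceteris paribus effect
```

The two numbers coincide exactly, which is what it means for x2 to act as a control variable for the effect of x1.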

• Sometimes, ceteris paribus is hard to maintain.


For example, what is the impact of age upon a person’s wage, keeping years of
experience fixed?

• Sometimes, ceteris paribus is impossible, for example if the model includes
both age and age-squared.
• Example: model includes

𝛽2 𝑎𝑔𝑒𝑖 + 𝛽3 𝑎𝑔𝑒𝑖²

then the marginal effect of a changing age (ceteris paribus) is

𝜕𝐸{𝑦𝑖 | 𝑥𝑖} / 𝜕𝑎𝑔𝑒𝑖 = 𝛽2 + 2𝛽3 𝑎𝑔𝑒𝑖

Consequently, the marginal effect of age on y depends upon age.
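A quadratic-in-age sketch in Python with simulated data (the coefficients 0.40 and −0.004 are invented for illustration) shows the marginal effect β2 + 2β3·age shrinking as age rises:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
age = rng.uniform(20, 65, n)
# simulated outcome with a concave age profile
y = 3.0 + 0.40 * age - 0.004 * age**2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), age, age**2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

def marginal_effect(a):
    # d E[y|age] / d age = b2 + 2*b3*age
    return b[1] + 2 * b[2] * a

# the effect of one more year of age is smaller at higher ages
assert marginal_effect(25) > marginal_effect(55)
```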
• More generally, the effect of one explanatory variable could depend upon
another.
• Example: model includes
𝛽2 𝑎𝑔𝑒𝑖 + 𝛽3 𝑎𝑔𝑒𝑖 𝑚𝑎𝑙𝑒𝑖

then the marginal effect of a changing age (ceteris paribus) is


𝜕𝐸{𝑦𝑖 | 𝑥𝑖} / 𝜕𝑎𝑔𝑒𝑖 = 𝛽2 + 𝛽3 𝑚𝑎𝑙𝑒𝑖

Consequently, the marginal effect of age on y depends upon gender (and the
effect of gender on y depends upon age).
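The interaction case can be sketched the same way with simulated data (coefficients 0.10 and 0.05 are invented): the fitted marginal effect of age is b2 for women and b2 + b3 for men.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
age = rng.uniform(20, 60, n)
male = rng.integers(0, 2, n)
# simulated outcome with an age*male interaction
y = 1.0 + 0.10 * age + 0.05 * age * male + rng.normal(size=n)

X = np.column_stack([np.ones(n), age, age * male, male])
b = np.linalg.lstsq(X, y, rcond=None)[0]

effect_female = b[1]          # marginal effect of age when male = 0
effect_male = b[1] + b[2]     # marginal effect of age when male = 1

assert effect_male > effect_female
```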

Incorporating nonlinearities: Semi-logarithmic form (log-level form)

Regression of log wages on years of education:

log(𝑤𝑎𝑔𝑒) = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢

• This changes the interpretation of the regression coefficient. Each additional year of education increases wage
by a constant percentage.
%∆𝑤𝑎𝑔𝑒 ≈ (100 ∙ 𝛽1 )∆𝑒𝑑𝑢𝑐

log(𝑤𝑎𝑔𝑒) = 0.584 + 0.083 𝑒𝑑𝑢𝑐
𝑛 = 526, R² = 0.186

• This means that if years of education are increased by one year, ceteris paribus, wage increases by 8.3%.
• R squared shows that education explains 18.6% of variations in log(wage).
• The percentage change in wage is the same for each additional year of education, so the absolute change
in wage for an extra year of education increases as education increases; this implies an increasing return
to education.
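One caveat worth making explicit: 100·β1 is only an approximation to the percentage change. The exact effect of one more year is 100·(e^0.083 − 1) ≈ 8.65%, slightly above the approximate 8.3%. A quick check:

```python
import math

b1 = 0.083                              # slope from the log-level regression
approx_pct = 100 * b1                   # approximate % change per year of educ
exact_pct = 100 * (math.exp(b1) - 1)    # exact % change per year of educ

assert round(approx_pct, 1) == 8.3
assert exact_pct > approx_pct           # exact change is slightly larger
assert abs(exact_pct - 8.65) < 0.02     # about 8.65%
```

The approximation is good for small coefficients and deteriorates as β1 grows.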
Incorporating nonlinearities: Semi-logarithmic form (level-log form)

Regression of annual salary on log of firm sales.


𝑠𝑎𝑙𝑎𝑟𝑦 = 𝛽0 + 𝛽1 𝑙𝑜𝑔 𝑠𝑎𝑙𝑒𝑠 + 𝑢

• This changes the interpretation of the regression coefficient. Each percentage change in firm sales
increases annual salary by a constant amount.

∆𝑠𝑎𝑙𝑎𝑟𝑦 = (𝛽1/100) %∆𝑠𝑎𝑙𝑒𝑠

If β̂1 = 5.33, then ∆𝑠𝑎𝑙𝑎𝑟𝑦 = (5.33/100) %∆𝑠𝑎𝑙𝑒𝑠 = 0.0533 %∆𝑠𝑎𝑙𝑒𝑠

• This means that if sales increase by 1%, salary increases by 0.0533 (in the units in which salary is measured).
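Using the illustrative β̂1 = 5.33 from above, a short sketch of the level-log arithmetic (the exact change implied by the log is β1·log(1 + %∆/100), which the β1/100 rule approximates for small changes):

```python
import math

b1 = 5.33  # illustrative slope on log(sales), as on the slide

def salary_change(pct_change_in_sales):
    # exact change in salary implied by the log specification
    return b1 * math.log(1 + pct_change_in_sales / 100)

# a 1% rise in sales raises salary by about b1/100 = 0.0533 units
assert abs(salary_change(1) - 5.33 / 100) < 1e-3
# the log is concave: a 10% rise gives less than ten times the 1% effect
assert salary_change(10) < 10 * salary_change(1)
```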


Incorporating nonlinearities: logarithmic form (log-log form)

• Often, economists are interested in elasticities. An elasticity measures the percentage change in the
dependent variable yi due to a one-percent change in xik.

• Elasticities can be estimated directly from a linear model formulated in logs (excluding dummy
variables), that is,

log 𝑦𝑖 = (log 𝑥𝑖)′𝛽 + 𝑢𝑖

where log xi is shorthand for a vector with elements (1, log xi2, …, log xik)′. This is called a loglinear
model.

If a dummy is included in the loglinear model, its coefficient measures the expected relative change in yi
due to an absolute change in xik.
This changes the interpretation of the regression coefficient:
%∆𝑠𝑎𝑙𝑎𝑟𝑦 = 𝛽1 %∆𝑠𝑎𝑙𝑒𝑠
• This means that if sales increase by 1%, salary increases by 𝛽1 %.

With β̂1 = 0.257, salary increases by 0.257% for each 1% change in sales.

The log-log form postulates a constant elasticity model, whereas the semi-log
form assumes a semi-elasticity model.
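A simulated sketch of the constant-elasticity case (artificial data built around the slide's 0.257, not the actual dataset): generating log(salary) with a true elasticity of 0.257 and recovering it by a log-log regression.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
log_sales = rng.normal(10, 1, n)
# simulated constant-elasticity relation with elasticity 0.257
log_salary = 2.0 + 0.257 * log_sales + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), log_sales])
b = np.linalg.lstsq(X, log_salary, rcond=None)[0]

# the log-log slope estimates the elasticity directly
assert abs(b[1] - 0.257) < 0.02
```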
Important table

Model        Dependent variable   Independent variable   Interpretation of 𝛽1
level-level  y                    x                      ∆y = 𝛽1 ∆x
level-log    y                    log(x)                 ∆y = (𝛽1/100) %∆x
log-level    log(y)               x                      %∆y = (100 𝛽1) ∆x
log-log      log(y)               log(x)                 %∆y = 𝛽1 %∆x

Logarithmic changes are (approximately) percentage changes.
Misspecifying the Set of Regressors
1- Including irrelevant variables in a regression model

• Suppose that the true model is

𝑦 = 𝛽0 + 𝛽1 𝑥1 +𝛽2 𝑥2 + 𝑢
• But we estimated
ŷ = β̂0 + β̂1 𝑥1 + β̂2 𝑥2 + β̂3 𝑥3

• Including an irrelevant variable does not affect the unbiasedness of the estimators.

However, it may increase the variance of the estimators of the model parameters, i.e.
the estimates of the model parameters will be less reliable.
• Therefore, including as many variables as possible in a model is not a good strategy.
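A Monte Carlo sketch of both claims (all data simulated): an irrelevant regressor x3 correlated with x1 leaves β̂1 unbiased but inflates its sampling variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 2000
b1_small, b1_big = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 0.8 * x1 + rng.normal(0, 0.6, n)   # irrelevant, correlated with x1
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    ones = np.ones(n)
    Xs = np.column_stack([ones, x1, x2])        # correct model
    Xb = np.column_stack([ones, x1, x2, x3])    # includes irrelevant x3
    b1_small.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b1_big.append(np.linalg.lstsq(Xb, y, rcond=None)[0][1])

# both estimators are centered on the true value 2.0 ...
assert abs(np.mean(b1_small) - 2.0) < 0.02
assert abs(np.mean(b1_big) - 2.0) < 0.02
# ... but including the irrelevant x3 raises the sampling variance
assert np.var(b1_big) > np.var(b1_small)
```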

Misspecifying the Set of Regressors
2- Omitting relevant variables from a regression model
Suppose the true model is
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛾𝑥2 + 𝑢

But we estimate a model without x2 due to ignorance

𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝑢
Since the true specification is the first model, replacing y in the OLS formula gives

β̂1 = (Σ𝑖 𝑥1𝑖′𝑥1𝑖)⁻¹ Σ𝑖 𝑥1𝑖′𝑦𝑖 = 𝛽1 + (Σ𝑖 𝑥1𝑖′𝑥1𝑖)⁻¹ Σ𝑖 𝑥1𝑖′𝑥2𝑖 𝛾 + (Σ𝑖 𝑥1𝑖′𝑥1𝑖)⁻¹ Σ𝑖 𝑥1𝑖′𝑢𝑖

All estimated coefficients will be biased.

When is there no omitted variable bias?
If the omitted variable is irrelevant (𝛾 = 0) or uncorrelated with the included regressors.
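A simulated sketch of the bias formula (all numbers invented): with true model y = 1 + 2·x1 + 1·x2 + u and x2 correlated with x1, the short regression's slope converges to β1 + γ·δ, where δ is the slope from regressing x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)        # omitted variable, correlated with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
# short regression that wrongly omits x2
b_short = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0]
# auxiliary regression of the omitted x2 on x1
delta = np.linalg.lstsq(np.column_stack([ones, x1]), x2, rcond=None)[0][1]

# short-regression slope ≈ beta1 + gamma*delta = 2 + 1*0.5 = 2.5
assert abs(b_short[1] - (2.0 + 1.0 * delta)) < 0.02
assert b_short[1] > 2.0                   # upward bias, as gamma and delta > 0
```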
Example: Omitting ability in a wage equation

The return to education will be overestimated because both 𝛽2 (the effect of
ability on wage) and the correlation between education and ability are positive.
It will look as if people with many years of education earn very high wages, but
this is partly because people with more education are also more able on average.

Illustration: explaining house prices
An example of estimating a house price function from a combination of the
house's characteristics.

Data: housing.dta
prices of 546 houses sold in Canada (1987). We observe characteristics like:
lot size, number of bedrooms, bathrooms, garage places, stories, and
dummy variables for presence of driveway, recreational room, full basement,
airco, gas hot water heating, and being located in a preferred area.

𝑙𝑜𝑔(price) =
𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒)
+ 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠
+ 𝛽3 𝑎𝑖𝑟𝑐𝑜 + 𝑢

Corr(b𝑒𝑑𝑟𝑜𝑜𝑚𝑠, 𝑏𝑎𝑡ℎ𝑟𝑚) = 0.37


Corr(𝑙𝑜𝑡𝑠𝑖𝑧𝑒, bathrm) = 0.19
Corr(𝑎𝑖𝑟𝑐𝑜𝑛, bathrm) = 0.18

𝑙𝑜𝑔(price) =
𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒)
+ 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠
+ 𝛽3 𝒃𝒂𝒕𝒉𝒓𝒐𝒐𝒎𝒔
+ 𝛽4 𝑎𝑖𝑟𝑐𝑜 + 𝑢

𝑙𝑜𝑔(price) =
𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒)
+ 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠
+ 𝛽3 𝑎𝑖𝑟𝑐𝑜 + 𝑢

Corr(b𝑒𝑑𝑟𝑜𝑜𝑚𝑠, 𝑑𝑟𝑖𝑣𝑒𝑤𝑎𝑦) = -0.012


Corr(𝑙𝑜𝑡𝑠𝑖𝑧𝑒, 𝑑𝑟𝑖𝑣𝑒𝑤𝑎𝑦) = 0.29

𝑙𝑜𝑔(price) =
𝛽0 + 𝛽1 𝑙𝑜𝑔(𝑙𝑜𝑡𝑠𝑖𝑧𝑒)
+ 𝛽2 𝑏𝑒𝑑𝑟𝑜𝑜𝑚𝑠
+ 𝛽3 𝒅𝒓𝒊𝒗𝒆𝒘𝒂𝒚
+ 𝛽4 𝑎𝑖𝑟𝑐𝑜 + 𝑢

Problem 11.
