
ECON7310: Elements of Econometrics: Research Project 2

The document is a research project that analyzes panel data and a binary response model to examine factors influencing employment status. For the panel data, the author estimates models using OLS, TSLS, and fixed effects to account for endogeneity. Tests indicate one regressor is endogenous. For the binary response model, the author uses LPM, probit and logit regressions to examine how age nonlinearly impacts employment probabilities. Tests show age is a statistically significant determinant of employment status.

Uploaded by Saad Masood

ECON7310: Elements of Econometrics

Research Project 2

Name: Lei LEI

Student number: 44605696

Question 1: OLS, TSLS, and Panel Data Regression

Consider the following linear panel data model

$$y_{it} = \beta_0 + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \beta_3 x_{it,3} + u_{it} \qquad (1)$$

where $(x_{it,1}, x_{it,2}, x_{it,3})$ are explanatory variables, $u_{it}$ is an unobservable error, and $(\beta_0, \beta_1, \beta_2, \beta_3)$ are unknown parameters of interest. As usual, $i = 1, \dots, N$ refers to individuals (id, cross-sectional units) and $t = 1, \dots, T$ refers to time periods. Use the data file Q1data.dta to answer the following questions. Unless otherwise specified, use 5% as the significance level for all the tests below.

(a)

(5 points) Declare the data to be a panel by specifying the individual identifier (id) and time identifier (t). Which regressor(s) are not time-varying? What are N and T? Do you have a balanced panel? [Hint: You can use the egen command along with the by option to compute the standard deviation of each regressor for each i. Which regressor(s) have zero variation over time?]

Answer:

$x_{it,3}$ is the only time-invariant regressor: its standard deviation over time is zero for every individual $i$.

N = 150 and $T_i = T = 5$ for every $i$, so the panel is balanced.
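The check suggested in the hint can be sketched in Python as follows. This is a minimal sketch, not the author's Stata code: it assumes the data are available as a list of records with hypothetical column names `id`, `t`, `x1`, `x2`, `x3`, and the toy panel is made up. A regressor is flagged as time-invariant when its standard deviation within every individual is zero, which mirrors running `egen` with `by id:`.

```python
from collections import defaultdict
from statistics import pstdev


def panel_summary(rows, id_key="id", t_key="t", regressors=("x1", "x2", "x3")):
    """Return (N, set of T_i values, time-invariant regressors) for a panel.

    `rows` is a list of dicts, one per (individual, period) observation.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_key]].append(row)

    n_individuals = len(by_id)
    # Balanced if every individual has the same number of distinct periods
    # (this sketch counts periods; it does not check which periods they are).
    periods_per_id = {len({r[t_key] for r in obs}) for obs in by_id.values()}

    time_invariant = []
    for name in regressors:
        # Analogue of "by id: egen sd_x = sd(x)": a regressor is time-invariant
        # if its standard deviation is zero within every individual.
        if all(pstdev([r[name] for r in obs]) == 0 for obs in by_id.values()):
            time_invariant.append(name)

    return n_individuals, periods_per_id, time_invariant


# Made-up toy panel: 3 individuals, 5 periods; x3 depends only on the individual.
rows = [
    {"id": i, "t": t, "x1": i * t, "x2": i + t, "x3": 10 * i}
    for i in range(1, 4)
    for t in range(1, 6)
]
summary = panel_summary(rows)
print(summary)
```

A single-element set of period counts indicates a balanced panel in this simplified sense.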


(b)

(5 points) Use OLS to estimate (1) and report estimation results.

Answer:

The estimated model is:


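$$\hat{y}_{it} = \underset{(0.029)}{2.398}\,x_{it,1} - \underset{(0.073)}{2.052}\,x_{it,2} + \underset{(0.183)}{0.327}\,x_{it,3} + \underset{(0.078)}{1.886}, \qquad R^2 = 0.914$$

with standard errors in parentheses.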
(c)

(10 points) It is well known that the standard errors (SE) of panel data estimation need to be adjusted to control for likely correlation of the error $u_{it}$ over time for a given $i$ (clustering on $i$), i.e., $C(u_{it}, u_{is} \mid x_i) \neq 0$ for $t \neq s$. Re-estimate (1) using OLS and calculate cluster-robust SE. Compare the estimation results with those obtained in (b). Comment on your findings. If $C(u_{it}, u_{is} \mid x_i) \neq 0$ is true, do you think the OLS estimator is BLUE?

Answer:
$$\hat{y}_{it} = \underset{(0.029)}{2.398}\,x_{it,1} - \underset{(0.073)}{2.052}\,x_{it,2} + \underset{(0.183)}{0.327}\,x_{it,3} + \underset{(0.078)}{1.886}, \qquad R^2 = 0.914$$

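The coefficient estimates and $R^2$ are exactly the same as those in Part (b), as they must be: clustering changes only how the standard errors are computed, not the OLS point estimates. However, the cluster-robust SE are different (larger), which suggests non-zero correlation of the error over time for a given individual, i.e., $C(u_{it}, u_{is} \mid x_i) \neq 0$. If this is true, the Gauss-Markov assumption of uncorrelated errors is violated, so the Gauss-Markov theorem no longer applies and the pooled OLS estimator is not BLUE (although it remains unbiased and consistent as long as the regressors are exogenous). The conventional SE reported in (b) are then also invalid.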
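The gap between the two sets of standard errors can be illustrated with a small simulation. The Python/NumPy sketch below computes pooled OLS with conventional and cluster-robust SE on simulated data, not on Q1data.dta; the data-generating process, the seed, and the helper name `ols_with_cluster_se` are illustrative assumptions. Because the error contains an individual-specific component, the cluster-robust SE for the time-invariant regressor (and the intercept) should come out clearly larger than the conventional ones, while the point estimates are identical by construction.

```python
import numpy as np


def ols_with_cluster_se(X, y, groups):
    """Pooled OLS with conventional and cluster-robust standard errors.

    X: (n, k) regressors including a column of ones; y: (n,) outcome;
    groups: (n,) cluster identifiers (here, the individual id i).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Conventional SE: assume homoskedastic, mutually uncorrelated errors.
    sigma2 = resid @ resid / (n - k)
    se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Cluster-robust sandwich: the "meat" sums (X_g' u_g)(X_g' u_g)' over
    # clusters, so u_it and u_is may be correlated within the same i.
    clusters = np.unique(groups)
    meat = np.zeros((k, k))
    for g in clusters:
        score = X[groups == g].T @ resid[groups == g]
        meat += np.outer(score, score)
    G = len(clusters)
    # Finite-sample correction of the form G/(G-1) * (n-1)/(n-k).
    c = G / (G - 1) * (n - 1) / (n - k)
    se_cluster = np.sqrt(np.diag(c * XtX_inv @ meat @ XtX_inv))
    return beta, se_conv, se_cluster


# Simulated panel: N = 150, T = 5, with an individual effect in the error,
# which makes u_it correlated over time for a given i.
rng = np.random.default_rng(0)
N, T = 150, 5
ids = np.repeat(np.arange(N), T)
x1 = rng.normal(size=N * T)
x3 = np.repeat(rng.normal(size=N), T)  # time-invariant regressor
u = np.repeat(rng.normal(scale=2.0, size=N), T) + rng.normal(size=N * T)
y = 1.0 + 2.0 * x1 + 0.5 * x3 + u

X = np.column_stack([np.ones(N * T), x1, x3])
beta, se_conv, se_cluster = ols_with_cluster_se(X, y, ids)
print("beta:", beta.round(3))
print("conventional SE:", se_conv.round(3))
print("cluster-robust SE:", se_cluster.round(3))
```

The correction factor is meant to resemble the small-sample adjustment Stata applies with clustered SE after `regress`; treat the exact match as an assumption.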

(d)

(10 points) One of your friends argues that the OLS estimator may be problematic as $x_{it,1}$ is probably endogenous. If this were true, which assumption of linear regression would not be valid, and what could be wrong with using OLS? Your friend suggests that you should use TSLS rather than OLS. In particular, he proposes two instrumental variables (IV), $z_{it,1}$ and $z_{it,2}$, for $x_{it,1}$. What conditions must hold for $z_{it,1}$ and $z_{it,2}$ to be valid IV?

Answer:

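The exogeneity assumption, $E[u_{it} \mid x_{it,1}, x_{it,2}, x_{it,3}] = 0$, would be violated. When $E[u_{it} \mid x_{it,1}, x_{it,2}, x_{it,3}] \neq 0$, the OLS estimator is biased and inconsistent, so even in large samples it does not recover $\beta_1$.

For $z_{it,1}$ and $z_{it,2}$ to be valid IV, they should satisfy the following three conditions:
1. Exogeneity: they are uncorrelated with $u_{it}$.
2. Relevance: they are correlated with $x_{it,1}$ (after controlling for $x_{it,2}$ and $x_{it,3}$).
3. Exclusion: they have no direct effect on $y_{it}$, i.e., they affect $y_{it}$ only through $x_{it,1}$.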
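To see why OLS fails, consider for simplicity the case in which $x_{it,1}$ is the only regressor. The standard probability limit of the OLS slope is

$$\operatorname{plim}\,\hat{\beta}_1^{OLS} = \beta_1 + \frac{\operatorname{Cov}(x_{it,1}, u_{it})}{\operatorname{Var}(x_{it,1})}$$

so any nonzero covariance between $x_{it,1}$ and $u_{it}$ shifts the estimate away from $\beta_1$ by an amount that does not shrink as the sample grows. With $x_{it,2}$ and $x_{it,3}$ exogenous, the same expression holds in (1) after replacing $\operatorname{Var}(x_{it,1})$ by the variance of the residual from regressing $x_{it,1}$ on the other regressors.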

(e)

(15 points) Estimate (1) using TSLS with $z_{it,1}$ and $z_{it,2}$ as IV. As in (c), you should compute and report cluster-robust SE. Compare the TSLS estimates with the OLS estimates obtained in (c), and comment on your findings. Assuming both $z_{it,1}$ and $z_{it,2}$ are valid IV, do you think $x_{it,1}$ is an endogenous regressor? Explain your answer. Suppose you are pretty sure that $z_{it,1}$ is exogenous. Name a test that can be used to check if $z_{it,2}$ also satisfies the exogeneity condition. Assess the strength of $(z_{it,1}, z_{it,2})$ as IV. What is the first-stage regression of the TSLS?

Answer:
$$\hat{y}_{it} = \underset{(0.320)}{1.724}\,x_{it,1} - \underset{(0.184)}{2.113}\,x_{it,2} + \underset{(0.659)}{1.191}\,x_{it,3} + \underset{(0.222)}{1.931}, \qquad R^2 = 0.8521$$

Compared with the OLS estimates from Part (c), the TSLS estimates differ noticeably, most of all for $x_{it,1}$ (1.724 versus 2.398) and $x_{it,3}$ (1.191 versus 0.327). If $z_{it,1}$ and $z_{it,2}$ are valid IV, such a gap suggests that at least one regressor is endogenous.
This intuition is confirmed by the (robust) Hausman test. Its p-value is 0.0019, which is less than 0.05. Therefore, we reject the null hypothesis that $x_{it,1}$ is exogenous and conclude that $x_{it,1}$ is an endogenous regressor.
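A regression-based (control function) version of this test can be sketched as follows: regress $x_{it,1}$ on the exogenous regressors and the instruments, add the first-stage residual to the structural equation, and test whether its coefficient is zero. This Python/NumPy sketch uses simulated data and conventional (non-robust) standard errors, so it is not identical to the robust test reported above, which would use a cluster-robust variance instead; the data-generating process and the function names are assumptions.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta


def control_function_test(y, x_endog, X_exog, Z):
    """Regression-based (Durbin-Wu-Hausman) test of H0: x_endog is exogenous.

    X_exog: exogenous regressors including a constant; Z: excluded instruments.
    Returns the conventional t-statistic on the first-stage residual.
    """
    # Step 1: first-stage regression of x_endog on all exogenous variables.
    _, v_hat = ols(np.column_stack([X_exog, Z]), x_endog)
    # Step 2: add the first-stage residual to the structural equation.
    X_aug = np.column_stack([X_exog, x_endog, v_hat])
    beta, resid = ols(X_aug, y)
    n, k = X_aug.shape
    sigma2 = resid @ resid / (n - k)
    var_beta = sigma2 * np.linalg.inv(X_aug.T @ X_aug)
    return beta[-1] / np.sqrt(var_beta[-1, -1])


# Simulated data: a common shock enters both x1 and the error, so x1 is endogenous.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=(n, 2))
x2 = rng.normal(size=n)
shock = rng.normal(size=n)
x1 = 0.5 * z[:, 0] + 0.5 * z[:, 1] + 0.3 * x2 + shock + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 2.0 * x2 + shock + rng.normal(size=n)

X_exog = np.column_stack([np.ones(n), x2])
t_stat = control_function_test(y, x1, X_exog, z)
print(f"t-statistic on first-stage residual: {t_stat:.2f}")
```

In this linear setting the coefficient on `x_endog` in the augmented regression equals the TSLS estimate, so the test can be read as a comparison of OLS and TSLS carried out inside a single regression.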

The first-stage F-statistic for the joint significance of $z_{it,1}$ and $z_{it,2}$ can be used to evaluate the strength of the IV. Here it is 6.59331, which is below the rule-of-thumb threshold of 10, so $(z_{it,1}, z_{it,2})$ appear to be weak instruments.

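Maintaining that $z_{it,1}$ is exogenous, the test of overidentifying restrictions (the J-test) can be used to check the exogeneity of $z_{it,2}$: with two instruments and one endogenous regressor, there is one overidentifying restriction.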
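The Sargan form of this test, a homoskedasticity-only relative of the J-statistic, can be sketched as $nR^2$ from regressing the TSLS residuals on all instruments and exogenous regressors, compared with a $\chi^2$ distribution whose degrees of freedom equal the number of overidentifying restrictions (one here). The Python/NumPy sketch uses simulated data in which $z_{it,2}$ is deliberately invalid, so the test should reject; with clustered panel data a cluster-robust version would be the closer analogue, and the function names are illustrative.

```python
import math

import numpy as np


def tsls(y, X, W):
    """TSLS coefficients. X: all regressors; W: exogenous regressors + instruments."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


def sargan_test(y, X, W):
    """Sargan test: n * R^2 from regressing TSLS residuals on all instruments.

    Returns (statistic, degrees of freedom, p-value).
    """
    resid = y - X @ tsls(y, X, W)  # residuals use the actual X, not X_hat
    fitted = W @ np.linalg.lstsq(W, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = len(y) * r2
    df = W.shape[1] - X.shape[1]  # number of overidentifying restrictions
    if df != 1:
        raise ValueError("the p-value formula below is only valid for df = 1")
    # Upper tail of chi-squared(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, df, p_value


# Simulated data in which z1 is valid but z2 also enters the error (invalid IV).
rng = np.random.default_rng(2)
n = 2000
z1, z2, shock = rng.normal(size=(3, n))
x1 = 0.5 * z1 + 0.5 * z2 + shock + rng.normal(size=n)
u = shock + 0.5 * z2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + u

X = np.column_stack([np.ones(n), x1])
W = np.column_stack([np.ones(n), z1, z2])
stat, df, p_value = sargan_test(y, X, W)
print(f"Sargan statistic = {stat:.1f}, df = {df}, p-value = {p_value:.4g}")
```

Note that the test has power only because the two instruments point to different values of $\beta_1$; it cannot tell which instrument is invalid, which is why the maintained assumption that $z_{it,1}$ is exogenous matters.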


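The first-stage regression of the TSLS regresses $x_{it,1}$ on the exogenous regressors and the instruments:

$$\hat{x}_{it,1} = -\underset{(0.091)}{0.108}\,x_{it,2} + \underset{(0.224)}{1.309}\,x_{it,3} + \underset{(0.144)}{0.305}\,z_{it,1} + \underset{(0.071)}{0.209}\,z_{it,2} - \underset{(0.120)}{0.046}, \qquad R^2 = 0.1034$$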
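The two stages of TSLS and the first-stage F-statistic can be sketched explicitly in Python/NumPy. The data are simulated with strong instruments, so the F-statistic here should be well above 10, unlike the 6.59331 reported above; the F-statistic is the homoskedasticity-only version rather than the cluster-robust one, and all names are illustrative. Running the second stage by hand reproduces the TSLS coefficients but not valid TSLS standard errors, which is one reason to use a dedicated TSLS routine in practice.

```python
import numpy as np


def ols_fit(X, y):
    """OLS coefficients and sum of squared residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, resid @ resid


rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=(n, 2))
x2 = rng.normal(size=n)
shock = rng.normal(size=n)  # common shock makes x1 endogenous
x1 = 0.5 * z[:, 0] + 0.5 * z[:, 1] + 0.3 * x2 + shock + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 2.0 * x2 + shock + rng.normal(size=n)
const = np.ones(n)

# First stage: x1 on the exogenous regressors and both instruments.
W = np.column_stack([const, x2, z])
pi_hat, ssr_unrestricted = ols_fit(W, x1)
x1_hat = W @ pi_hat

# First-stage F-statistic for H0: both instrument coefficients are zero
# (homoskedasticity-only version; q = 2 restrictions).
_, ssr_restricted = ols_fit(np.column_stack([const, x2]), x1)
q, k = 2, W.shape[1]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))

# Second stage: y on the exogenous regressors and the fitted x1.
beta_tsls, _ = ols_fit(np.column_stack([const, x1_hat, x2]), y)
beta_ols, _ = ols_fit(np.column_stack([const, x1, x2]), y)
print(f"first-stage F = {f_stat:.1f}")
print(f"TSLS slope on x1 = {beta_tsls[1]:.3f}, OLS slope on x1 = {beta_ols[1]:.3f}")
```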
(f)

(10 points) To capture potential time effects, consider the following model

$$y_{it} = \beta_0 + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \beta_3 x_{it,3} + \sum_{s=2}^{T} \gamma_s d_{s,t} + v_{it} \qquad (2)$$

where $d_{s,t}$ are time dummies ($d_{s,t} = 1$ if $s = t$, and 0 otherwise). Note that the sample includes data from $t = 1$ to $t = T$, but (2) includes only dummies for $t = 2$ to $t = T$. Why? Estimate (2) using TSLS with $z_{it,1}$ and $z_{it,2}$ as IV and test if time effects are significant, i.e., at least one $\gamma_s$ is not zero. With time effects controlled, do you think $x_{it,1}$ is still an endogenous regressor? [Hint: Use OLS and TSLS to estimate (2) and compare their estimates.]

Answer:

The estimated model is:


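$$\hat{y}_{it} = \underset{(0.282)}{1.680}\,x_{it,1} - \underset{(0.097)}{2.103}\,x_{it,2} + \underset{(0.429)}{1.218}\,x_{it,3} + \underset{(0.233)}{0.009}\,d_{2,t} - \underset{(0.233)}{0.001}\,d_{3,t} - \underset{(0.233)}{1.024}\,d_{4,t} + \underset{(0.233)}{0.920}\,d_{5,t} + \underset{(0.180)}{1.953}, \qquad R^2 = 0.8570$$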

We do not include $d_{1,t}$ because the five time dummies sum to one for every observation, which would make them perfectly collinear with the intercept (the dummy variable trap). The p-value for the test of $H_0: \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = 0$ is essentially 0. Therefore, we reject the null hypothesis that there are no time effects. Controlling for time effects does not eliminate the endogeneity problem: the p-value of the Hausman test is still very small (0.0003 < 0.05), so $x_{it,1}$ still appears to be endogenous.
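The dummy variable trap can be checked numerically: with an intercept and all five time dummies, the regressor matrix loses full column rank, while dropping $d_{1,t}$ restores it. The NumPy sketch below uses a made-up panel of three individuals observed for five periods.

```python
import numpy as np

T = 5
t = np.tile(np.arange(1, T + 1), 3)                    # 3 individuals, periods 1..5
D = (t[:, None] == np.arange(1, T + 1)).astype(float)  # columns d_1, ..., d_5
const = np.ones((len(t), 1))

X_all = np.hstack([const, D])          # intercept + all five dummies
X_drop = np.hstack([const, D[:, 1:]])  # intercept + d_2, ..., d_5, as in (2)

# The five dummies add up to the intercept column, so X_all is rank deficient.
print(np.allclose(D.sum(axis=1), 1.0))                 # True
print(X_all.shape[1], np.linalg.matrix_rank(X_all))    # 6 columns, rank 5
print(X_drop.shape[1], np.linalg.matrix_rank(X_drop))  # 5 columns, rank 5
```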


(g)

(10 points) Suppose that $v_{it} = \alpha_i + e_{it}$ with $e_{it} \sim \text{i.i.d.}(0, \sigma_e^2)$. Re-write (2) as

$$y_{it} = \beta_0 + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \beta_3 x_{it,3} + \sum_{s=2}^{T} \gamma_s d_{s,t} + \alpha_i + e_{it} \qquad (3)$$

Treat $\alpha_i$ as fixed effects (FE). Use an FE estimator to estimate (3). Justify the fact that the FE estimator cannot estimate all slope coefficients. Compare the FE estimates with the TSLS estimates obtained in (f). Comment on your findings.

Answer:

Since $x_{it,3}$ is time-invariant, the FE (within) transformation, which subtracts each individual's time average, wipes it out together with $\alpha_i$, so the FE estimator cannot estimate its coefficient $\beta_3$. The FE and TSLS estimates look similar. If the included regressors are correlated with $\alpha_i$ but uncorrelated with $e_{it}$, the FE estimator is consistent. TSLS, by contrast, is consistent whether the included regressors are correlated with $\alpha_i$, with $e_{it}$, or with both, provided $z_{it,1}$ and $z_{it,2}$ are valid IV (relevant and uncorrelated with $\alpha_i + e_{it}$). The similarity between the FE and TSLS estimates suggests that the FE model is capable of handling the endogeneity problem in this question, i.e., it is appropriate to assume $E[e_{it} \mid x_i] = 0$ while allowing $E[\alpha_i \mid x_i] \neq 0$. Note that the FE estimator has much smaller SE for the coefficients on $(x_{it,1}, x_{it,2})$. This is not surprising, as we know from Part (e) that $z_{it,1}$ and $z_{it,2}$ are weak IV, and using weak IV often leads to imprecise TSLS estimates.
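The within transformation behind the FE estimator can be sketched with simulated data: demeaning each variable by individual removes $\alpha_i$ and also turns the time-invariant regressor into a column of zeros, which is why $\beta_3$ is not identified. The NumPy sketch omits $x_{it,2}$ and the time dummies for brevity, and its data-generating process (with $\alpha_i$ correlated with $x_{it,1}$) is an assumption chosen to show FE removing the bias of pooled OLS.

```python
import numpy as np


def within_transform(v, ids):
    """Subtract each individual's time average (the FE 'within' transformation)."""
    out = v.astype(float)  # astype returns a copy, so v is not modified
    for i in np.unique(ids):
        mask = ids == i
        out[mask] = v[mask] - v[mask].mean()
    return out


# Simulated panel: alpha_i is correlated with x1, and x3 is time-invariant.
rng = np.random.default_rng(4)
N, T = 150, 5
ids = np.repeat(np.arange(N), T)
alpha = np.repeat(rng.normal(size=N), T)
x1 = alpha + rng.normal(size=N * T)
x3 = np.repeat(rng.normal(size=N), T)
y = 2.0 * x1 + 1.0 * x3 + alpha + rng.normal(size=N * T)

x1_w, x3_w, y_w = (within_transform(v, ids) for v in (x1, x3, y))

# x3 is wiped out by demeaning, so beta_3 cannot be estimated by FE.
print("max |demeaned x3|:", np.abs(x3_w).max())

beta1_fe = (x1_w @ y_w) / (x1_w @ x1_w)  # FE slope with one time-varying regressor
X_pooled = np.column_stack([np.ones(N * T), x1, x3])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0]
print(f"FE estimate of beta_1 = {beta1_fe:.3f}, pooled OLS = {beta_pooled[1]:.3f}")
```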


Question 2: Binary Response Model

In April 2008, the unemployment rate in the United States stood at 5%. By April 2009, it had increased to 9%, and it had increased further, to 10%, by October 2009. Were some groups of workers more likely to lose their jobs than others during the Great Recession? For example, were young workers more likely to lose their jobs than middle-aged workers? What about workers with a college degree versus those without a degree, or women versus men? The data file employment_08_09.dta contains a random sample of 5440 workers who were surveyed in April 2008 and reported that they were employed full-time. A detailed description is given in employment_08_09_description.pdf. These workers were surveyed one year later, in April 2009, and asked about their employment status (employed, unemployed, or out of the labor force). The data set also includes various demographic measures for each individual. Use these data to answer the following questions.

(a)

(5 points) Regress employed on age and age², using a linear probability model (LPM). Report regression results. Was age a statistically significant determinant of employment? Is there evidence of a nonlinear effect of age on the probability of being employed?

Answer:

The estimated model is:


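$$\widehat{\text{employed}} = \underset{(0.0033)}{0.0283}\,\text{age} - \underset{(0.0000)}{0.0003}\,\text{age}^2 + \underset{(0.0669)}{0.3075}, \qquad R^2 = 0.0197$$

with standard errors in parentheses.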

The p-values of the coefficients on age and age² are both essentially 0, well below 0.05. Therefore, age is a statistically significant determinant of employment.

Because the coefficient on age² is statistically significant, we reject the hypothesis that the effect of age is linear: there is evidence of a nonlinear (inverted-U) effect of age on the probability of being employed.
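One way to read the quadratic is through the marginal effect of age implied by the LPM estimates above:

$$\frac{\partial\, \widehat{\Pr}(\text{employed} = 1 \mid \text{age})}{\partial\, \text{age}} = 0.0283 + 2(-0.0003)\,\text{age},$$

which is positive for young workers (about $0.0283 - 0.012 = 0.0163$ per year at age 20) and turns negative after

$$\text{age}^* = \frac{0.0283}{2 \times 0.0003} \approx 47.$$

Because the age² coefficient is reported to only four decimal places, this turning point is approximate.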

(b)

(5 points) Repeat (a) using a probit regression.

Answer:

The estimated model is:


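$$\widehat{\Pr}(\text{employed} = 1 \mid \text{age}) = \Phi\left(\underset{(0.0129)}{0.1217}\,\text{age} - \underset{(0.0002)}{0.0014}\,\text{age}^2 - \underset{(0.2499)}{1.2579}\right)$$

where $\Phi(\cdot)$ is the standard normal CDF.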

The p-values of the coefficients on age and age² are both essentially 0, well below 0.05. Therefore, age is a statistically significant determinant of employment in the probit model as well.

Because the coefficient on age² is statistically significant, the probit model also provides evidence of a nonlinear effect of age.

(c)

(5 points) Repeat (a) using a logit regression.

Answer:

The estimated model is:


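$$\widehat{\Pr}(\text{employed} = 1 \mid \text{age}) = \Lambda\left(\underset{(0.0234)}{0.2255}\,\text{age} - \underset{(0.0003)}{0.0026}\,\text{age}^2 - \underset{(0.4462)}{2.4898}\right)$$

where $\Lambda(z) = 1/(1 + e^{-z})$ is the logistic CDF.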

The p-values of the coefficients on age and age² are both essentially 0, well below 0.05. Therefore, age is a statistically significant determinant of employment in the logit model as well.

Because the coefficient on age² is statistically significant, the logit model also provides evidence of a nonlinear effect of age.
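How a logit regression like the one in (c) is estimated can be sketched with a Newton-Raphson maximum-likelihood routine. This NumPy sketch simulates data from a logit whose coefficients are the rounded estimates reported above, with ages drawn uniformly from 20 to 65 (an assumption; these are not the survey data), and checks that the estimator roughly recovers them. The function name `logit_newton` is illustrative.

```python
import numpy as np


def logit_newton(X, y, max_iter=50, tol=1e-8):
    """Logit MLE by Newton-Raphson; returns coefficients and conventional SE."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                         # score of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # minus the Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))     # inverse information matrix
    return beta, se


# Simulated data from a logit with (intercept, age, age^2) equal to the
# rounded estimates reported in (c).
rng = np.random.default_rng(5)
n = 5440
age = rng.integers(20, 66, size=n).astype(float)
X = np.column_stack([np.ones(n), age, age ** 2])
true_beta = np.array([-2.4898, 0.2255, -0.0026])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.random(n) < p_true).astype(float)

beta_hat, se_hat = logit_newton(X, y)
print("estimates:", beta_hat.round(4))
print("std. errors:", se_hat.round(4))
```

Because the logit log-likelihood is globally concave, Newton-Raphson started from zero typically converges in a handful of iterations when there is no perfect separation.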

(d)

(6 points) Compute the predicted probability of employment for a 20-year-old worker, a 40-year-old worker, and a 60-year-old worker.

Answer:

Predicted probability of employment, by age and model:

Age    Linear (LPM)    Probit    Logit
20     0.742           0.730     0.725
40     0.916           0.912     0.911
60     0.828           0.832     0.831

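Each entry uses that model's own estimates from (a), (b), or (c):

For the LPM: $\widehat{\Pr} = \hat{\beta}_0 + \hat{\beta}_1\,\text{age} + \hat{\beta}_2\,\text{age}^2$.
For the probit: $\widehat{\Pr} = \Phi(\hat{\beta}_0 + \hat{\beta}_1\,\text{age} + \hat{\beta}_2\,\text{age}^2)$.
For the logit: $\widehat{\Pr} = 1 / \left(1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1\,\text{age} + \hat{\beta}_2\,\text{age}^2)}\right)$.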
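These prediction formulas can be evaluated with a short script. The sketch plugs in the rounded coefficients reported in (a)-(c), so it reproduces the table only approximately; the LPM is particularly sensitive to the rounding of the age² coefficient (reported as -0.0003), so the full-precision estimates are needed to match the table exactly. The names `COEFS` and `predicted_prob` are illustrative.

```python
import math

# Rounded estimates reported in (a)-(c): (intercept, age, age^2).
COEFS = {
    "lpm": (0.3075, 0.0283, -0.0003),
    "probit": (-1.2579, 0.1217, -0.0014),
    "logit": (-2.4898, 0.2255, -0.0026),
}


def linear_index(model, age):
    """b0 + b1 * age + b2 * age^2 for the chosen model."""
    b0, b1, b2 = COEFS[model]
    return b0 + b1 * age + b2 * age ** 2


def predicted_prob(model, age):
    """Predicted probability of employment at a given age."""
    z = linear_index(model, age)
    if model == "lpm":
        return z                                           # LPM: the index itself
    if model == "probit":
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 1.0 / (1.0 + math.exp(-z))                      # logistic CDF


for age in (20, 40, 60):
    print(age, [round(predicted_prob(m, age), 3) for m in COEFS])
```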
(e)

(4 points) Are there important differences in your answers to (d)? Explain.

Answer:

The results of Part (d) show sizeable differences in the predicted probabilities across age groups. In every model, the probability of employment rises from age 20 to age 40 and then falls by age 60, which is consistent with the nonlinear effect of age found in (a)-(c).

By contrast, the differences between the three models are small: at each age the predictions are within about 0.02 of one another, and all three models show the same pattern. So there are no important differences among the models' answers to (d).

(f)

(10 points) The data set includes variables measuring the workers' educational attainment, sex, race, marital status, region of the country, and weekly earnings in April 2008. Repeat (a)-(c) using these factors as additional regressors and construct a table like Table 11.2 in SW (pp. 410-411) to investigate whether the conclusions on the effect of age on employment from (a)-(c) are affected by omitted variable bias. Use the regressions in your table to discuss the characteristics of workers who were hurt most by the Great Recession. [Hint: You will need to generate dummies for race groups and use the logarithm of weekly earnings.]

Answer:

The coefficients on age and age² change considerably once the additional regressors are included, compared with the estimates from part (a) to part (c), which indicates that the models in part (a) to part (c) suffer from omitted variable bias.

In conclusion, from the second table below, a young married man with a high school degree or less and lower weekly earnings was hurt the most by the Great Recession.
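The two data-preparation steps in the hint (race dummies and log weekly earnings) can be sketched in plain Python. The race codes 1, 2, 3 and the earnings values are hypothetical, since the coding in the data description is not reproduced here; one category is omitted as the base group so that the dummies are not perfectly collinear with the intercept.

```python
import math


def make_dummies(codes, base):
    """One 0/1 dummy per category, omitting `base` to avoid the dummy variable trap."""
    categories = sorted(set(codes) - {base})
    return {f"race_{c}": [1 if code == c else 0 for code in codes] for c in categories}


def log_weekly_earnings(earnings):
    """Natural log of weekly earnings; requires strictly positive values."""
    if any(e <= 0 for e in earnings):
        raise ValueError("log earnings is undefined for zero or negative earnings")
    return [math.log(e) for e in earnings]


# Hypothetical race codes and weekly earnings for four workers.
race = [1, 2, 3, 1]
weekly_earnings = [800.0, 650.0, 1200.0, 400.0]

dummies = make_dummies(race, base=1)
log_earn = log_weekly_earnings(weekly_earnings)
print(dummies)
print([round(v, 3) for v in log_earn])
```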
