
Universidad Carlos III de Madrid

ME & MIEM
Econometrics
Multiple Linear Regression. Estimation I
Problem Set 2: Solutions

1. A multiple regression includes two regressors:

    Y = β0 + β1 X1 + β2 X2 + U.

(a) What is the expected change in Y if X1 increases by 3 units and X2 does not change?

    ΔY = β1 ΔX1 = 3β1.

(b) What is the expected change in Y if X2 increases by 5 units and X1 does not change?

    ΔY = β2 ΔX2 = 5β2.

(c) What is the expected change in Y if X1 increases by 3 units and X2 decreases by 5 units?

    ΔY = β1 ΔX1 + β2 ΔX2 = 3β1 − 5β2.
(d) Explain why it is difficult to accurately estimate the partial effect of X1, holding X2 constant, if X1 and X2 are highly correlated.
When the two regressors are highly correlated, the sample contains little variation in one regressor while the other is held constant, so the partial effects are estimated from limited information and therefore with reduced accuracy.
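This loss of precision can be illustrated with a short simulation. The sketch below (plain NumPy; the function name, sample sizes and coefficient values are illustrative, not from the problem set) refits the two-regressor model on repeated samples and compares the sampling spread of β̂1 when the regressors are nearly uncorrelated versus highly correlated:

```python
import numpy as np

def ols_slope_spread(rho, n=100, reps=2000, seed=0):
    """Sampling standard deviation of the OLS estimate of the X1
    coefficient when corr(X1, X2) is approximately rho (illustrative)."""
    rng = np.random.default_rng(seed)
    b1_hats = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        # Build X2 with correlation rho with X1.
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1_hats.append(beta_hat[1])
    return np.std(b1_hats)

low = ols_slope_spread(rho=0.1)
high = ols_slope_spread(rho=0.95)
print(low, high)  # the spread is several times larger when rho = 0.95
```

With ρ = 0.95 the spread of β̂1 comes out roughly three times larger than with ρ = 0.1, matching the intuition above.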

2. The following results were obtained using data from the 1998 Current Population Survey (CPS). The data
set consists of information on 4000 full-time full-year workers. The highest educational achievement for each
worker was either a high school diploma or a bachelor’s degree. The workers’ ages ranged from 25 to 34 years.
The data set also contained information on the region of the country where the person lived, marital status
and number of children:

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

Dependent Variable: Average Hourly Earnings (AHE)

    Regressor        (1)      (2)      (3)
    College (X1)     5.46     5.48     5.44
    Female (X2)     -2.64    -2.62    -2.62
    Age (X3)                  0.29     0.29
    Northeast (X4)                     0.69
    Midwest (X5)                       0.60
    South (X6)                        -0.27
    Intercept       12.69     4.40     3.75
    Summary Statistics
    SER              6.27     6.22     6.21
    R²               0.176    0.190    0.194
    n                4,000    4,000    4,000

(a) Compute the adjusted R̄² for each of the regressions.

Using the formula relating the adjusted R̄² to the (unadjusted) R²,

    R̄² = 1 − [(n − 1)/(n − k − 1)] (1 − R²),

we find

    R̄²(1) = 1 − [(4000 − 1)/(4000 − 2 − 1)] (1 − 0.176) = 0.176,
    R̄²(2) = 1 − [(4000 − 1)/(4000 − 3 − 1)] (1 − 0.190) = 0.189,
    R̄²(3) = 1 − [(4000 − 1)/(4000 − 6 − 1)] (1 − 0.194) = 0.193.
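The three computations above can be verified with a few lines of Python (a sketch; r2_adj is a helper name chosen here, not part of the problem set):

```python
# Adjusted R-bar-squared from R², sample size n and number of
# regressors k; plain Python, no libraries needed.
def r2_adj(r2, n, k):
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

n = 4000
for r2, k in [(0.176, 2), (0.190, 3), (0.194, 6)]:
    print(round(r2_adj(r2, n, k), 3))
# prints 0.176, 0.189, 0.193
```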

Using the regression results in column (1):


(b) Do workers with college degrees earn more on average than workers with only high school degrees? How much more?
Yes, because

    ΔAHE-hat = AHE-hat(College = 1) − AHE-hat(College = 0) = β̂1 = 5.46 > 0,

implying that workers with college degrees earn $5.46/hour more on average.

(c) Do men earn more than women on average? How much more?
Yes, because

    ΔAHE-hat = AHE-hat(Female = 0) − AHE-hat(Female = 1) = −β̂2 = 2.64 > 0,

so male workers earn $2.64/hour more on average.


Using the regression results in column (2):
(d) Is age an important determinant of earnings? Explain.
The coefficient on Age, β̂3 = 0.29, means that earnings increase, on average, by $0.29 per hour for each additional year of age. After 10 years the effect (about $2.90/hour) is similar in size to that of being male and about half that of having a college degree, so the effect is important.
(e) Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Predict Sally’s and Betsy’s earnings.

    AHE-hat(Sally) = AHE-hat(College = 1, Female = 1, Age = 29)
                   = β̂0 + β̂1 · 1 + β̂2 · 1 + β̂3 · 29
                   = 4.40 + 5.48 · 1 − 2.62 · 1 + 0.29 · 29
                   = 15.67 $/hour

    AHE-hat(Betsy) = AHE-hat(College = 1, Female = 1, Age = 34)
                   = β̂0 + β̂1 · 1 + β̂2 · 1 + β̂3 · 34
                   = 4.40 + 5.48 · 1 − 2.62 · 1 + 0.29 · 34
                   = 17.12 $/hour

The difference is 17.12 − 15.67 = 1.45 $/hour, i.e. β̂3 × 5 for the five extra years of age.
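These predictions can be double-checked numerically (a small sketch; ahe_hat is a helper defined here from the column (2) coefficients):

```python
# Predicted average hourly earnings from column (2) of the table.
def ahe_hat(college, female, age):
    return 4.40 + 5.48 * college - 2.62 * female + 0.29 * age

sally = ahe_hat(college=1, female=1, age=29)
betsy = ahe_hat(college=1, female=1, age=34)
print(round(sally, 2), round(betsy, 2), round(betsy - sally, 2))
# prints 15.67 17.12 1.45
```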


Using the regression results in column (3):
(f) Do there appear to be important regional differences?
Workers in the Northeast earn $0.69 more per hour than workers in the West, on average, controlling for the other variables in the regression. Workers in the Midwest earn $0.60 more per hour than workers in the West, on average. Workers in the South earn $0.27 less per hour than workers in the West. These differences are modest compared with, say, the gender gap or the college premium.
(g) Why is the regressor West omitted from the regression? What would happen if it was included?
The regressor West is omitted to avoid perfect multicollinearity: if West were included, the intercept could be written as a perfect linear function of the four regional regressors (they sum to 1 for every worker).
(h) Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-old female college graduate from the Midwest. Calculate the expected difference in earnings between Juanita and Jennifer.
The expected difference in earnings between Juanita and Jennifer is

    AHE-hat(Juanita) − AHE-hat(Jennifer) = β̂6 − β̂5 = −0.27 − 0.60 = −0.87,

because they differ only with respect to the variables South (X6) and Midwest (X5).

3. Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000s), BDR the number of bedrooms, Bath the number of bathrooms, Hsize the size of the house (in square feet), Lsize the lot size (in square feet), Age the age of the house (in years), and Poor a binary variable equal to 1 if the condition of the house is reported as "poor". An estimated regression yields the following results:

    Price-hat = 119.2 + 0.485 BDR + 23.4 Bath + 0.156 Hsize + 0.002 Lsize + 0.090 Age − 48.8 Poor,

    R̄² = 0.72, SER = 41.5

(a) Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house?
Since Bath is the only variable changing (BDR does not change, though one room becomes smaller),

    ΔPrice-hat = 23.4 ΔBath = 23.4 × 1 = 23.4,

i.e. $23,400, because Price is measured in $1000s.


(b) Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?
In this case ΔBath = 1 and ΔHsize = 100, so the resulting expected change in price is

    ΔPrice-hat = 23.4 ΔBath + 0.156 ΔHsize = 23.4 × 1 + 0.156 × 100 = 39.0,

i.e. 39.0 thousand dollars, or $39,000.

(c) What is the loss in value if a homeowner lets his house run down so that its condition becomes "poor"?
The loss is $48,800.
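The valuation changes in parts (a)–(c) follow directly from the estimated coefficients, as a quick sketch confirms (the coef dictionary is just a convenient way to store the regression results above; Price is measured in $1000s):

```python
# Estimated coefficients from the home-price regression.
coef = {"BDR": 0.485, "Bath": 23.4, "Hsize": 0.156,
        "Lsize": 0.002, "Age": 0.090, "Poor": -48.8}

# (a) one extra bathroom, nothing else changes
d_a = coef["Bath"] * 1
# (b) one extra bathroom plus 100 extra square feet
d_b = coef["Bath"] * 1 + coef["Hsize"] * 100
# (c) the condition of the house becomes "poor"
d_c = coef["Poor"] * 1
print(d_a, round(d_b, 1), d_c)  # 23.4  39.0  -48.8
```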
(d) Compute the R² for the regression and provide an interpretation of its value.
From the relation

    R̄² = 1 − [(n − 1)/(n − k − 1)] (1 − R²),

solving for R² gives

    R² = 1 − [(n − k − 1)/(n − 1)] (1 − R̄²),

and thus

    R² = 1 − [(220 − 6 − 1)/(220 − 1)] (1 − 0.72) = 0.728,

i.e. the six regressors explain about 73% of the sample variation in selling prices.

4. Consider the regression model

    Yi = β1 X1i + β2 X2i + Ui,

for i = 1, …, n. (Note that there is NO constant term in the regression.)

(a) Specify the least squares objective function that is minimized by OLS.

    Q(b1, b2) = Σ_{i=1}^n (Yi − b1 X1i − b2 X2i)².

(b) Calculate the partial derivatives of the objective function with respect to b1 and b2:

    ∂Q/∂b1 (b1, b2) = −2 Σ_{i=1}^n (Yi − b1 X1i − b2 X2i) X1i,
    ∂Q/∂b2 (b1, b2) = −2 Σ_{i=1}^n (Yi − b1 X1i − b2 X2i) X2i.

(c) Suppose that Σ_{i=1}^n X1i X2i = 0. Show that β̂1 = Σ_{i=1}^n X1i Yi / Σ_{i=1}^n X1i².
In this case the first-order conditions are

    ∂Q/∂b1 (β̂1, β̂2) = −2 Σ_{i=1}^n (Yi − β̂1 X1i − β̂2 X2i) X1i = 0,
    ∂Q/∂b2 (β̂1, β̂2) = −2 Σ_{i=1}^n (Yi − β̂1 X1i − β̂2 X2i) X2i = 0,

so from the first equation

    −Σ_{i=1}^n Yi X1i + β̂1 Σ_{i=1}^n X1i² + β̂2 Σ_{i=1}^n X2i X1i = 0,

where using Σ_{i=1}^n X2i X1i = 0 we obtain

    β̂1 Σ_{i=1}^n X1i² = Σ_{i=1}^n Yi X1i

and the result follows.


(d) Suppose that Σ_{i=1}^n X1i X2i ≠ 0. Derive an expression for β̂1 as a function of the data (Yi, X1i, X2i), i = 1, …, n.
In this case we have to solve for β̂2 in the second first-order condition (all sums below run over i = 1, …, n),

    −Σ Yi X2i + β̂1 Σ X2i X1i + β̂2 Σ X2i² = 0,

so that

    β̂2 = (Σ Yi X2i − β̂1 Σ X2i X1i) / Σ X2i²,

and plugging this expression into the first equation we obtain

    −Σ Yi X1i + β̂1 Σ X1i² + [(Σ Yi X2i − β̂1 Σ X2i X1i) / Σ X2i²] Σ X2i X1i = 0.

Multiplying by Σ X2i²,

    −Σ Yi X1i Σ X2i² + β̂1 Σ X1i² Σ X2i² + Σ Yi X2i Σ X2i X1i − β̂1 (Σ X2i X1i)² = 0,

and grouping terms,

    Σ Yi X1i Σ X2i² − Σ Yi X2i Σ X2i X1i = β̂1 [Σ X1i² Σ X2i² − (Σ X2i X1i)²],

we get

    β̂1 = [Σ Yi X1i Σ X2i² − Σ Yi X2i Σ X2i X1i] / [Σ X1i² Σ X2i² − (Σ X2i X1i)²].
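The closed-form expression can be checked numerically against a least-squares fit without a constant term (a sketch on synthetic data; all names and parameter values are illustrative):

```python
import numpy as np

# Synthetic data with correlated regressors, so sum(x1*x2) != 0.
rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Closed-form solution for beta1-hat derived above.
num = (y @ x1) * (x2 @ x2) - (y @ x2) * (x2 @ x1)
den = (x1 @ x1) * (x2 @ x2) - (x2 @ x1) ** 2
b1_formula = num / den

# OLS without a constant term, via least squares.
X = np.column_stack([x1, x2])
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.isclose(b1_formula, b_lstsq[0]))  # True
```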
(e) Suppose that the model includes an intercept: Yi = β0 + β1 X1i + β2 X2i + Ui. Show that the OLS estimators satisfy β̂0 = Ȳ − β̂1 X̄1 − β̂2 X̄2.
In this case we have one extra first-order condition,

    ∂Q/∂b0 (β̂0, β̂1, β̂2) = −2 Σ_{i=1}^n (Yi − β̂0 − β̂1 X1i − β̂2 X2i) = 0,

which implies

    (1/n) Σ_{i=1}^n (Yi − β̂0 − β̂1 X1i − β̂2 X2i) = 0,

or equivalently,

    Ȳ − β̂0 − β̂1 X̄1 − β̂2 X̄2 = 0,

which implies the result.


(f) Suppose that the model has an intercept as in (e). Suppose moreover that

    Σ_{i=1}^n (X1i − X̄1)(X2i − X̄2) = 0.

Prove that

    β̂1 = Σ_{i=1}^n (X1i − X̄1)(Yi − Ȳ) / Σ_{i=1}^n (X1i − X̄1)².

How does it compare with the OLS estimator of β1 in the regression where X2 is omitted?
The expression is the same as when X2 is omitted. In this case the first-order conditions for β̂1 and β̂2 have to be modified to include the intercept (all sums below run over i = 1, …, n),

    ∂Q/∂b1 (β̂0, β̂1, β̂2) = −2 Σ (Yi − β̂0 − β̂1 X1i − β̂2 X2i) X1i = 0,
    ∂Q/∂b2 (β̂0, β̂1, β̂2) = −2 Σ (Yi − β̂0 − β̂1 X1i − β̂2 X2i) X2i = 0,

and replacing the expression for β̂0 from (e), β̂0 = Ȳ − β̂1 X̄1 − β̂2 X̄2, these conditions are equivalent to

    Σ (Yi − Ȳ + β̂1 X̄1 + β̂2 X̄2 − β̂1 X1i − β̂2 X2i) X1i = 0,
    Σ (Yi − Ȳ + β̂1 X̄1 + β̂2 X̄2 − β̂1 X1i − β̂2 X2i) X2i = 0,

or

    Σ [Yi − Ȳ − β̂1 (X1i − X̄1) − β̂2 (X2i − X̄2)] X1i = 0,
    Σ [Yi − Ȳ − β̂1 (X1i − X̄1) − β̂2 (X2i − X̄2)] X2i = 0.

Taking the first condition we have

    Σ (Yi − Ȳ) X1i − β̂1 Σ (X1i − X̄1) X1i − β̂2 Σ (X2i − X̄2) X1i = 0,

or

    Σ (Yi − Ȳ)(X1i − X̄1) − β̂1 Σ (X1i − X̄1)² − β̂2 Σ (X2i − X̄2)(X1i − X̄1) = 0,

because, for example, Σ (X2i − X̄2) X1i = Σ (X2i − X̄2)(X1i − X̄1), as X̄1 Σ (X2i − X̄2) = 0.
Finally, use Σ (X2i − X̄2)(X1i − X̄1) = 0 to obtain

    Σ (Yi − Ȳ)(X1i − X̄1) − β̂1 Σ (X1i − X̄1)² = 0,

which gives the desired result.
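The claim in (f) can also be verified numerically: if x2 is constructed so that its demeaned cross-product with x1 is exactly zero, the long regression and the short regression (with X2 omitted) give the same slope on x1 (synthetic data; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
z = rng.normal(size=n)

# Construct x2 so that sum((x1 - x1bar)(x2 - x2bar)) = 0 exactly:
# project the demeaned z off the demeaned x1, then shift the mean.
x1c = x1 - x1.mean()
zc = z - z.mean()
x2 = zc - (zc @ x1c) / (x1c @ x1c) * x1c + 5.0
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# Long regression: intercept, x1 and x2.
X_long = np.column_stack([np.ones(n), x1, x2])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# Short regression with x2 omitted: the familiar bivariate OLS slope.
b1_short = (x1c @ (y - y.mean())) / (x1c @ x1c)

print(np.isclose(b_long[1], b1_short))  # True
```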

5. Using the data set TeachingRatings, carry out the following exercises:

(a) Run a regression of Course_Eval (recall that these are course evaluation scores) on Beauty (a measure of the professor’s beauty). What is the estimated slope?
The estimated slope is 0.133.
(b) Run a regression of Course_Eval on Beauty including some additional variables to control for the type of course and professor characteristics. In particular, include as additional regressors Intro, OneCredit, Female, Minority, and NN_English. What is the estimated effect of Beauty on Course_Eval? Does the regression in (a) suffer from important omitted variable bias?
The estimated slope is 0.166. The coefficient does not change by a large amount, so there does not appear to be a very large omitted variable bias.
The regressions are:

                       Model
    Regressor        (a)       (b)
    Beauty           0.133     0.166
    Intro                      0.011
    OneCredit                  0.634
    Female                    -0.173
    Minority                  -0.167
    NN_English                -0.244
    Intercept        4.00      4.07
    SER              0.545     0.513
    R²               0.036     0.155
(c) Estimate the coefficient on the Beauty variable of the multiple regression model in (b) via the three-stage procedure (Frisch–Waugh theorem):
1. Regress the dependent variable Course_Eval on the additional controls and get the residuals Ỹ.
2. Regress the explanatory variable Beauty on the additional controls and get the residuals X̃.
3. Regress the residuals Ỹ on the residuals X̃,
and check that you get the same estimated coefficient on Beauty as the one obtained in (b).
The first and second steps yield

                   Dependent Variable
    Regressor      Beauty    Course_Eval
    Intro           0.12      0.03
    OneCredit      -0.37      0.57
    Female          0.19     -0.14
    Minority        0.08     -0.15
    NN_English      0.02     -0.24
    Intercept      -0.11      4.05

Regressing the residuals Ỹ from step 1 on the residuals X̃ from step 2 yields a coefficient on Beauty equal to 0.166 (as in (b)).
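Since the TeachingRatings data file is not reproduced here, the Frisch–Waugh equivalence can still be illustrated on synthetic data (all variable names and coefficient values below are made up for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
controls = rng.normal(size=(n, 3))            # stand-ins for Intro, etc.
beauty = rng.normal(size=n) + controls @ np.array([0.3, -0.2, 0.1])
course_eval = (4.0 + 0.15 * beauty
               + controls @ np.array([0.5, -0.1, 0.2])
               + rng.normal(scale=0.5, size=n))

def residuals(y, X):
    """Residuals from an OLS regression of y on X plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta

# Full multiple regression: coefficient on beauty.
Z_full = np.column_stack([np.ones(n), beauty, controls])
b_full, *_ = np.linalg.lstsq(Z_full, course_eval, rcond=None)

# Three-stage (partialling-out) estimate.
y_t = residuals(course_eval, controls)   # step 1
x_t = residuals(beauty, controls)        # step 2
b_fwl = (x_t @ y_t) / (x_t @ x_t)        # step 3 (residuals have mean zero)

print(np.isclose(b_full[1], b_fwl))  # True
```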
(d) Professor Smith is a black male with average Beauty and is a native English speaker. He teaches a three-credit upper-division course. Predict Professor Smith’s course evaluation.
Professor Smith’s predicted course evaluation = (0.166 × 0) + (0.011 × 0) + (0.634 × 0) − (0.173 × 0) − (0.167 × 1) − (0.244 × 0) + 4.068 = 3.901.

ANSWERS:

2. a) 0.176, 0.189 and 0.193, respectively; b) $5.46/hour more on average; c) $2.64/hour more on average; e) 15.67 and 17.12, respectively; h) −0.87.
3. a) $23,400; b) $39,000; c) $48,800; d) 0.728.
5. a) 0.133; b) 0.166, the coefficient does not change by much, so the omitted variable bias does not seem to be large; d) 3.901.
