The Multiple Regression Model

This document provides exercises related to multiple regression analysis. It includes examples of estimating regression models using provided data sets and interpreting the results. Some exercises ask to conduct statistical tests of hypotheses about the regression coefficients. The exercises cover topics such as calculating residuals, variance estimates, correlation coefficients, confidence intervals, tests of individual coefficients, and tests of joint hypotheses.

Uploaded by

Mon Luffy

200 THE MULTIPLE REGRESSION MODEL

Table 5.5 Data for Exercise 5.1

yi    xi1   xi2   xi3
1     1     0     1
2     1     1     2
3     1     2     1
1     1     2     0
0     1     1     1
1     1     2     1
2     1     0     1
1     1     1     1
2     1     1     0

(d) Find the least squares residuals ê1, ê2, ..., ê9.
(e) Find the variance estimate σ̂².
(f) Use (5.9) to find the sample correlation between x2 and x3.
(g) Find the standard error for b2.
(h) Find SSE, SST, SSR, and R².
5.2* Use your answers to Exercise 5.1 to
(a) Compute a 95% interval estimate for b2.
(b) Test the hypothesis H0: b2 = 1 against the alternative H1: b2 ≠ 1.
5.3 Consider the following model that relates the proportion of a household’s budget
spent on alcohol WALC to total expenditure TOTEXP, age of the household head
AGE, and the number of children in the household NK.

\[
\mathit{WALC} = b_1 + b_2 \ln(\mathit{TOTEXP}) + b_3\,\mathit{AGE} + b_4\,\mathit{NK} + e
\]

The data in the file london.dat were used to estimate this model. See Exercise 4.10 for
more details about the data. Note that only households with one or two children are
being considered. Thus, NK takes only the values one or two. Output from estimating
this equation appears in Table 5.6.

Table 5.6 Output for Exercise 5.3

Dependent Variable: WALC
Included observations: 1519

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                 0.0091       0.0190       ______      0.6347
ln(TOTEXP)        0.0276       ______       6.6086      0.0000
AGE               ______       0.0002       6.9624      0.0000
NK                0.0133       0.0033       4.0750      0.0000

R-squared            ______     Mean dependent var   0.0606
S.E. of regression   ______     S.D. dependent var   0.0633
Sum squared resid    5.752896
5.9 EXERCISES 201

(a) Fill in the following blank spaces that appear in this table.
(i) The t-statistic for b1
(ii) The standard error for b2
(iii) The estimate b3
(iv) R²
(v) σ̂
(b) Interpret each of the estimates b2 , b3 , and b4 .
(c) Compute a 95% interval estimate for b3. What does this interval tell you?
(d) Test the hypothesis that the budget proportion for alcohol does not depend on the
number of children in the household. Can you suggest a reason for the test outcome?
5.4* The data set used in Exercise 5.3 is used again. This time it is used to estimate how the
proportion of the household budget spent on transportation WTRANS depends on
the log of total expenditure ln(TOTEXP), AGE, and number of children NK. The
output is reported in Table 5.7.
(a) Write out the estimated equation in the standard reporting format with standard
errors below the coefficient estimates.
(b) Interpret the estimates b2 , b3 , and b4 . Do you think the results make sense from an
economic or logical point of view?
(c) Are there any variables that you might exclude from the equation? Why?
(d) What proportion of variation in the budget proportion allocated to transport is
explained by this equation?
(e) Predict the proportion of a budget that will be spent on transportation, for both
one- and two-children households, when total expenditure and age are set at their
sample means, which are 98.7 and 36, respectively.
5.5 This question is concerned with the value of houses in towns surrounding Boston. It
uses the data of Harrison, D., and D. L. Rubinfeld (1978), “Hedonic Prices and the
Demand for Clean Air,” Journal of Environmental Economics and Management, 5,
81–102. The output appears in Table 5.8. The variables are defined as follows:
VALUE = median value of owner-occupied homes in thousands of dollars
CRIME = per capita crime rate
NITOX = nitric oxide concentration (parts per million)
ROOMS = average number of rooms per dwelling
AGE = proportion of owner-occupied units built prior to 1940

Table 5.7 Output for Exercise 5.4

Dependent Variable: WTRANS
Included observations: 1519

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                 0.0315       0.0322       0.9776      0.3284
ln(TOTEXP)        0.0414       0.0071       5.8561      0.0000
AGE               0.0001       0.0004       0.1650      0.8690
NK                0.0130       0.0055       2.3542      0.0187

R-squared   0.0247     Mean dependent var   0.1323
                       S.D. dependent var   0.1053

Table 5.8 Output for Exercise 5.5

Dependent Variable: VALUE
Included observations: 506

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             28.4067       5.3659        5.2939     0.0000
CRIME          0.1834       0.0365        5.0275     0.0000
NITOX         22.8109       4.1607        5.4824     0.0000
ROOMS          6.3715       0.3924       16.2378     0.0000
AGE            0.0478       0.0141        3.3861     0.0008
DIST           1.3353       0.2001        6.6714     0.0000
ACCESS         0.2723       0.0723        3.7673     0.0002
TAX            0.0126       0.0038        3.3399     0.0009
PTRATIO        1.1768       0.1394        8.4409     0.0000

DIST = weighted distances to five Boston employment centers
ACCESS = index of accessibility to radial highways
TAX = full-value property-tax rate per $10,000
PTRATIO = pupil–teacher ratio by town
(a) Report briefly on how each of the variables influences the value of a home.
(b) Find 95% interval estimates for the coefficients of CRIME and ACCESS.
(c) Test the hypothesis that increasing the number of rooms by one increases the
value of a house by $7,000.
(d) Test as an alternative hypothesis H1 that reducing the pupil–teacher ratio by 10
will increase the value of a house by more than $10,000.
5.6 Suppose that from a sample of 63 observations, the least squares estimates and the
corresponding estimated covariance matrix are given by
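\[
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix},
\qquad
\widehat{\operatorname{cov}}(b)
=
\begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 0 \\ 1 & 0 & 3 \end{bmatrix}
\]

Test each of the following hypotheses and state the conclusion:
(a) b2 = 0
(b) b1 + 2b2 = 5
(c) b1 − b2 + b3 = 4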
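
Hypotheses like these concern a linear combination c1 b1 + c2 b2 + c3 b3, whose estimated variance is the quadratic form c′ cov(b) c. The sketch below shows one way such a t-statistic could be computed. The function name and all numerical values are hypothetical and are not taken from the exercise.

```python
import math


def linear_combination_test(weights, estimates, cov, hypothesized):
    """Return (estimate, standard error, t-statistic) for H0: sum(c_k * b_k) = hypothesized."""
    k = len(estimates)
    estimate = sum(c * b for c, b in zip(weights, estimates))
    # Estimated variance of the combination: c' V c, written out as a double sum.
    variance = sum(
        weights[i] * weights[j] * cov[i][j] for i in range(k) for j in range(k)
    )
    se = math.sqrt(variance)
    return estimate, se, (estimate - hypothesized) / se


# Hypothetical estimates and covariance matrix (illustration only).
b_hat = [1.0, 0.5, 2.0]
cov_hat = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]

# Test H0: b1 + 2*b2 = 1.5 with the made-up numbers above.
est, se, t_value = linear_combination_test([1, 2, 0], b_hat, cov_hat, 1.5)
print(est, se, t_value)
```

The resulting t-value would then be compared with a critical value from the t-distribution with N − K degrees of freedom.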
5.7 What are the standard errors of the least squares estimates b2 and b3 in the regression
model y = b1 + b2 x2 + b3 x3 + e where N = 202, SSE = 11.12389, r23 = 0.114255,
$\sum_{i=1}^{N}(x_{i2}-\bar{x}_2)^2 = 1210.178$, and $\sum_{i=1}^{N}(x_{i3}-\bar{x}_3)^2 = 30307.57$?
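
A minimal sketch of the arithmetic this exercise calls for follows. It assumes the standard two-regressor result var(bk) = σ̂² / [(1 − r23²) Σ(xik − x̄k)²] with σ̂² = SSE/(N − 3). The function name is hypothetical.

```python
import math


def two_regressor_standard_errors(n, sse, r23, ssx2, ssx3):
    """Standard errors of b2 and b3 in y = b1 + b2*x2 + b3*x3 + e.

    ssx2 and ssx3 are the sums of squared deviations of x2 and x3
    about their sample means; r23 is their sample correlation.
    """
    sigma2_hat = sse / (n - 3)  # three estimated coefficients
    shrink = 1.0 - r23 ** 2     # share of variation not shared by x2 and x3
    se_b2 = math.sqrt(sigma2_hat / (shrink * ssx2))
    se_b3 = math.sqrt(sigma2_hat / (shrink * ssx3))
    return se_b2, se_b3


# Values as stated in the exercise.
se_b2, se_b3 = two_regressor_standard_errors(
    n=202, sse=11.12389, r23=0.114255, ssx2=1210.178, ssx3=30307.57
)
print(se_b2, se_b3)
```

Because the correlation enters only through its square, its sign does not affect the standard errors.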
5.8* An agricultural economist carries out an experiment to study the production
relationship between the dependent variable YIELD = peanut yield (pounds per
acre) and the production inputs
NITRO = amount of nitrogen applied (hundreds of pounds per acre)
PHOS = amount of phosphorus fertilizer (hundreds of pounds per acre)

A total N ¼ 27 observations were obtained using different test fields. The estimated
quadratic model, with an interaction term, is
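\[
\widehat{\mathit{YIELD}} = 1.385 + 8.011\,\mathit{NITRO} + 4.800\,\mathit{PHOS} - 1.944\,\mathit{NITRO}^2 - 0.778\,\mathit{PHOS}^2 - 0.567\,\mathit{NITRO} \times \mathit{PHOS}
\]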

(a) Find equations describing the marginal effect of nitrogen on yield and the
marginal effect of phosphorus on yield. What do these equations tell you?
(b) What are the marginal effects of nitrogen and of phosphorus when (i) NITRO = 1 and
PHOS = 1 and (ii) when NITRO = 2 and PHOS = 2? Comment on your
findings.
(c) Test the hypothesis that the marginal effect of nitrogen is zero, when
(iv) PHOS = 1 and NITRO = 1
(v) PHOS = 1 and NITRO = 2
(vi) PHOS = 1 and NITRO = 3
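Note: The following information may be useful:

\[
\begin{aligned}
\widehat{\operatorname{var}}(b_2 + 2b_4 + b_6) &= 0.233 \\
\widehat{\operatorname{var}}(b_2 + 4b_4 + b_6) &= 0.040 \\
\widehat{\operatorname{var}}(b_2 + 6b_4 + b_6) &= 0.233
\end{aligned}
\]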

(d) [This part requires the use of calculus.] For the function estimated, what levels
of nitrogen and phosphorus give maximum yield? Are these levels the optimal
fertilizer applications for the peanut producer?
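
As a rough illustration of the marginal-effect calculations in this exercise, the sketch below differentiates the estimated quadratic by hand and evaluates the result at a chosen input level. It assumes the squared and interaction terms carry negative signs (the extracted equation shows gaps where minus signs would sit). The function names are hypothetical.

```python
import math

# Coefficients of the estimated yield equation. The negative signs on the
# squared and interaction terms are an assumption about the original equation.
COEF = {
    "nitro": 8.011,
    "phos": 4.800,
    "nitro_sq": -1.944,
    "phos_sq": -0.778,
    "interaction": -0.567,
}


def marginal_effect_nitro(nitro, phos):
    """d(YIELD)/d(NITRO) = b2 + 2*b4*NITRO + b6*PHOS."""
    return COEF["nitro"] + 2 * COEF["nitro_sq"] * nitro + COEF["interaction"] * phos


def marginal_effect_phos(nitro, phos):
    """d(YIELD)/d(PHOS) = b3 + 2*b5*PHOS + b6*NITRO."""
    return COEF["phos"] + 2 * COEF["phos_sq"] * phos + COEF["interaction"] * nitro


def t_statistic(effect, estimated_variance):
    """t-statistic for H0: marginal effect = 0, given its estimated variance."""
    return effect / math.sqrt(estimated_variance)


effect = marginal_effect_nitro(nitro=1, phos=1)
# 0.233 is the estimated variance of b2 + 2*b4 + b6 quoted in the note above.
print(effect, t_statistic(effect, 0.233))
```

Both marginal effects fall as the corresponding input rises, which is what the negative squared terms imply.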
5.9 When estimating wage equations, we expect that young, inexperienced workers will
have relatively low wages and that with additional experience their wages will rise,
but then begin to decline after middle age, as the worker nears retirement. This life-
cycle pattern of wages can be captured by introducing experience and experience
squared to explain the level of wages. If we also include years of education, we have
the equation

\[
\mathit{WAGE} = b_1 + b_2\,\mathit{EDUC} + b_3\,\mathit{EXPER} + b_4\,\mathit{EXPER}^2 + e
\]

(a) What is the marginal effect of experience on wages?


(b) What signs do you expect for each of the coefficients b2, b3, and b4? Why?
(c) After how many years of experience do wages start to decline? (Express your
answer in terms of b’s.)
(d) The results from estimating the equation using 1000 observations in the file
cps4c_small.dat are given in Table 5.9 on page 204. Find 95% interval estimates for
(i) The marginal effect of education on wages
(ii) The marginal effect of experience on wages when EXPER = 4
(iii) The marginal effect of experience on wages when EXPER = 25
(iv) The number of years of experience after which wages decline
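
Part (iv) involves a nonlinear function of the coefficients: wages peak where b3 + 2 b4 EXPER = 0, that is, at EXPER = −b3/(2 b4). One common way to attach a standard error to such a quantity is the delta method, sketched below. All numerical inputs are hypothetical (they are not the Table 5.9 estimates), and 1.96 is used as an approximate large-sample critical value.

```python
import math


def turning_point_with_se(b3, b4, var_b3, var_b4, cov_b3_b4):
    """Delta-method standard error for g(b3, b4) = -b3 / (2*b4)."""
    point = -b3 / (2.0 * b4)
    d_b3 = -1.0 / (2.0 * b4)       # partial derivative of g with respect to b3
    d_b4 = b3 / (2.0 * b4 ** 2)    # partial derivative of g with respect to b4
    variance = (
        d_b3 ** 2 * var_b3
        + d_b4 ** 2 * var_b4
        + 2.0 * d_b3 * d_b4 * cov_b3_b4
    )
    return point, math.sqrt(variance)


# Hypothetical estimates, variances, and covariance (illustration only).
point, se = turning_point_with_se(
    b3=0.68, b4=-0.010, var_b3=0.0110, var_b4=3.2e-6, cov_b3_b4=-1.6e-4
)
interval = (point - 1.96 * se, point + 1.96 * se)
print(point, se, interval)
```

The covariance term matters here: estimates of b3 and b4 are typically strongly negatively correlated, which narrows the interval considerably.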

5.9.2 COMPUTER EXERCISES

5.10 Use a computer to verify your answers to Exercise 5.1, parts (c), (e), (f), (g), and (h).

(e) Test the hypothesis that the quality of cocaine has no influence on price against
the alternative that a premium is paid for better-quality cocaine.
(f) What is the average annual change in the cocaine price? Can you suggest why
price might be changing in this direction?
5.13 The file br2.dat contains data on 1,080 houses sold in Baton Rouge, Louisiana,
during mid-2005. We will be concerned with the selling price (PRICE), the size of the
house in square feet (SQFT), and the age of the house in years (AGE).
(a) Use all observations to estimate the following regression model and report the
results

\[
\mathit{PRICE} = b_1 + b_2\,\mathit{SQFT} + b_3\,\mathit{AGE} + e
\]

(i) Interpret the coefficient estimates.


(ii) Find a 95% interval estimate for the price increase for an extra square foot
of living space, that is, ∂PRICE/∂SQFT.
(iii) Test the hypothesis that having a house a year older decreases price by
1000 or less (H0: b3 ≥ −1000) against the alternative that it decreases
price by more than 1000 (H1: b3 < −1000).
(b) Add the variables SQFT² and AGE² to the model in part (a) and re-estimate the
equation. Report the results.
(i) Find estimates of the marginal effect ∂PRICE/∂SQFT for the smallest
house in the sample, the largest house in the sample, and a house with 2300
SQFT. Comment on these values. Are they realistic?
(ii) Find estimates of the marginal effect ∂PRICE/∂AGE for the oldest house
in the sample, the newest house in the sample, and a house that is 20 years
old. Comment on these values. Are they realistic?
(iii) Find a 95% interval estimate for the marginal effect ∂PRICE/∂SQFT for a
house with 2300 square feet.
(iv) For a house that is 20 years old, test the hypothesis

\[
H_0: \frac{\partial \mathit{PRICE}}{\partial \mathit{AGE}} \ge -1000
\quad \text{against} \quad
H_1: \frac{\partial \mathit{PRICE}}{\partial \mathit{AGE}} < -1000
\]

(c) Add the interaction variable SQFT × AGE to the model in part (b) and
re-estimate the equation. Report the results. Repeat parts (i), (ii), (iii), and (iv) from
part (b) for this new model. Use SQFT = 2300 and AGE = 20.
(d) From your answers to parts (a), (b), and (c), comment on the sensitivity of the
results to the model specification.
5.14 The file br2.dat contains data on 1,080 houses sold in Baton Rouge, Louisiana,
during mid-2005. We will be concerned with the selling price (PRICE), the size of the
house in square feet (SQFT), and the age of the house in years (AGE). Define a new
variable that measures house size in terms of hundreds of square feet,
SQFT100 = SQFT/100.
(a) Estimate the following equation and report the results:

\[
\ln(\mathit{PRICE}) = a_1 + a_2\,\mathit{SQFT100} + a_3\,\mathit{AGE} + a_4\,\mathit{AGE}^2 + e
\]

(b) Interpret the estimate for a2.
(c) Find and interpret estimates for ∂ln(PRICE)/∂AGE when AGE = 5 and
AGE = 20.

(d) Find expressions for ∂PRICE/∂AGE and ∂PRICE/∂SQFT100. (Ignore the
error term.)
(e) Estimate ∂PRICE/∂AGE and ∂PRICE/∂SQFT100 for a 20-year-old house with
a living area of 2300 square feet.
(f) Find the standard errors of your estimates in (e).
(g) Find a 95% interval estimate for the marginal effect ∂PRICE/∂SQFT100 for a
20-year-old house with 2300 square feet.
(h) For a 20-year-old house with 2300 square feet, test the hypothesis

\[
H_0: \frac{\partial \mathit{PRICE}}{\partial \mathit{AGE}} \ge -1000
\quad \text{against} \quad
H_1: \frac{\partial \mathit{PRICE}}{\partial \mathit{AGE}} < -1000
\]

5.15* Reconsider the presidential voting data (fair4.dat) introduced in Exercise 2.14.
(a) Estimate the regression model

\[
\mathit{VOTE} = b_1 + b_2\,\mathit{GROWTH} + b_3\,\mathit{INFLATION} + e
\]

Report the results in standard format. Are the estimates for b2 and b3 significantly
different from zero at a 10% significance level? Did you use one-tail tests or two-
tail tests? Why?
(b) Assume the inflation rate is 4%. Predict the percentage vote for the incumbent
party when the growth rate is (i) −3%, (ii) 0%, and (iii) 3%.
(c) Test, as an alternative hypothesis, that the incumbent party will get the majority
of the expected vote when the growth rate is (i) −3%, (ii) 0%, and (iii) 3%. Use a
1% level of significance. If you were the president seeking re-election, why
might you set up each of these hypotheses as an alternative rather than a null
hypothesis?
5.16 Data on the weekly sales of a major brand of canned tuna by a supermarket chain in a
large midwestern U.S. city during a mid-1990’s calendar year are contained in the file
tuna.dat. There are 52 observations on the variables. SAL1 = unit sales of brand no.
1 canned tuna; APR1 = price per can of brand no. 1 canned tuna; APR2, APR3 =
price per can of brands no. 2 and 3 of canned tuna.
(a) The prices APR1, APR2, and APR3 are expressed in dollars. Multiply the
observations on each of these variables by 100 to express them in terms of
cents; call the new variables PR1, PR2, and PR3. Estimate the following
regression model and report the results:

\[
\mathit{SAL1} = b_1 + b_2\,\mathit{PR1} + b_3\,\mathit{PR2} + b_4\,\mathit{PR3} + e
\]

(b) Interpret the estimates b2, b3, and b4. Do they have the expected signs?
(c) Using suitable one-tail tests and a 5% significance level, test whether each of the
coefficients b2, b3, and b4 are significantly different from zero.
(d) Using a 5% significance level, test the following hypotheses:
(i) A 1-cent increase in the price of brand one reduces its sales by 300 cans.
(ii) A 1-cent increase in the price of brand two increases the sales of brand one
by 300 cans.
(iii) A 1-cent increase in the price of brand three increases the sales of brand
one by 300 cans.

(iv) The effect of a price increase in brand two on sales of brand one is the
same as the effect of a price increase in brand three on sales of brand
one. Does the outcome of this test contradict your findings from parts (ii)
and (iii)?
(v) If prices of all 3 brands go up by 1 cent, there is no change in sales.

5.17 (a) Reconsider the model SAL1 = b1 + b2 PR1 + b3 PR2 + b4 PR3 + e from
Exercise 5.16. Estimate this model if you have not already done so, and find a 95%
interval estimate for expected sales when PR1 = 90, PR2 = 75, and PR3 = 75.
What is wrong with this interval?
(b) Estimate the alternative model ln(SAL1) = a1 + a2 PR1 + a3 PR2 + a4 PR3 + e,
and find a 95% interval estimate for expected log of sales when
PR1 = 90, PR2 = 75, and PR3 = 75. Convert this interval into one for sales,
and compare it with what you got in part (a).
(c) How does the interpretation of the coefficients in the model with ln(SAL1) as the
dependent variable differ from that for the coefficients in the model with SAL1 as
the dependent variable?
5.18 What is the relationship between crime and punishment? This important question has
been examined by Cornwell and Trumbull⁴ using a panel of data from North
Carolina. The cross sections are 90 counties, and the data are annual for the years
1981–1987. The data are in the file crime.dat.
Using the data from 1987, estimate a regression relating the log of the crime rate
LCRMRTE to the probability of an arrest PRBARR (the ratio of arrests to offenses),
the probability of conviction PRBCONV (the ratio of convictions to arrests), the
probability of a prison sentence PRBPRIS (the ratio of prison sentences to convictions),
the number of police per capita POLPC, and the weekly wage in construction
WCON. Write a report of your findings. In your report, explain what effect you would
expect each of the variables to have on the crime rate and note whether the estimated
coefficients have the expected signs and are significantly different from zero. What
variables appear to be the most important for crime deterrence? Can you explain the
sign for the coefficient of POLPC?
5.19 Use the data in cps4_small.dat to estimate the following wage equation

\[
\ln(\mathit{WAGE}) = b_1 + b_2\,\mathit{EDUC} + b_3\,\mathit{EXPER} + b_4\,\mathit{HRSWK} + e
\]

(a) Report the results. Interpret the estimates for b2, b3, and b4. Are these estimates
significantly different from zero?
(b) Test the hypothesis that an extra year of education increases the wage rate by at
least 10% against the alternative that it is less than 10%.
(c) Find a 90% interval estimate for the percentage increase in wage from working
an additional hour per week.
(d) Re-estimate the model with the additional variables EDUC × EXPER, EDUC²,
and EXPER². Report the results. Are the estimated coefficients significantly
different from zero?
(e) For the new model, find expressions for the marginal effects ∂ln(WAGE)/∂EDUC
and ∂ln(WAGE)/∂EXPER.

⁴ “Estimating the Economic Model of Crime with Panel Data,” Review of Economics and Statistics, 76, 1994,
360–366. The data was kindly provided by the authors.

(c) Find a 95% interval estimate for the elasticity of production with respect to
fertilizer. Has this elasticity been precisely measured?
(d) Using a 5% level of significance, test the hypothesis that the elasticity of
production with respect to labor is less than or equal to 0.3 against the alternative
that it is greater than 0.3. What happens if you reverse the null and
alternative hypotheses?
5.25 Consider the following aggregate production function for the U.S. manufacturing
sector:

\[
Y = a K^{b_2} L^{b_3} E^{b_4} M^{b_5} \exp\{e\}
\]

where Y is gross output, K is capital, L is labor, E is energy, and M denotes other
intermediate materials. The data underlying these variables are given in index form in
the file manuf.dat.
(a) Show that taking logarithms of the production function puts it in a form suitable
for least squares estimation.
(b) Estimate the unknown parameters of the production function and find the
corresponding standard errors.
(c) Discuss the economic and statistical implications of these results.

Appendix 5A Derivation of Least Squares Estimators


In Appendix 2A we derived expressions for the least squares estimators b1 and b2 in the
simple regression model. In this appendix we proceed with a similar exercise for the multiple
regression model; we describe how to obtain expressions for b1 , b2, and b3 in a model with
two explanatory variables. Given sample observations on y, x2 , and x3 , the problem is to find
values for b1 , b2, and b3 that minimize

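\[
S(b_1, b_2, b_3) = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_{i2} - b_3 x_{i3})^2
\]

The first step is to partially differentiate S with respect to b1, b2, and b3 and to set the
first-order partial derivatives to zero. This yields

\[
\begin{aligned}
\frac{\partial S}{\partial b_1} &= 2N b_1 + 2 b_2 \sum x_{i2} + 2 b_3 \sum x_{i3} - 2 \sum y_i \\
\frac{\partial S}{\partial b_2} &= 2 b_1 \sum x_{i2} + 2 b_2 \sum x_{i2}^2 + 2 b_3 \sum x_{i2} x_{i3} - 2 \sum x_{i2} y_i \\
\frac{\partial S}{\partial b_3} &= 2 b_1 \sum x_{i3} + 2 b_2 \sum x_{i2} x_{i3} + 2 b_3 \sum x_{i3}^2 - 2 \sum x_{i3} y_i
\end{aligned}
\]

Setting these partial derivatives equal to zero, dividing by 2, and rearranging yields

\[
\begin{aligned}
N b_1 + \sum x_{i2}\, b_2 + \sum x_{i3}\, b_3 &= \sum y_i \\
\sum x_{i2}\, b_1 + \sum x_{i2}^2\, b_2 + \sum x_{i2} x_{i3}\, b_3 &= \sum x_{i2} y_i \\
\sum x_{i3}\, b_1 + \sum x_{i2} x_{i3}\, b_2 + \sum x_{i3}^2\, b_3 &= \sum x_{i3} y_i
\end{aligned}
\tag{5A.1}
\]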
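
The three normal equations in (5A.1) form a linear system in b1, b2, and b3. The sketch below builds that system from sample sums and solves it by Gaussian elimination using only the Python standard library. The function names and the data are made up for illustration. The data are generated exactly from y = 1 + 2 x2 − 3 x3, so the solution should recover those coefficients.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_two_regressors(y, x2, x3):
    """Solve the normal equations (5A.1) for b1, b2, b3."""
    n = len(y)
    s2, s3 = sum(x2), sum(x3)
    s22 = sum(v * v for v in x2)
    s33 = sum(v * v for v in x3)
    s23 = sum(u * v for u, v in zip(x2, x3))
    lhs = [
        [n, s2, s3],
        [s2, s22, s23],
        [s3, s23, s33],
    ]
    rhs = [
        sum(y),
        sum(u * v for u, v in zip(x2, y)),
        sum(u * v for u, v in zip(x3, y)),
    ]
    return solve_linear_system(lhs, rhs)


# Made-up data with an exact linear relationship (no error term).
x2 = [0.0, 1.0, 2.0, 3.0, 4.0]
x3 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * u - 3.0 * v for u, v in zip(x2, x3)]

b1, b2, b3 = least_squares_two_regressors(y, x2, x3)
print(b1, b2, b3)
```

In practice a numerical library would normally be used instead of hand-written elimination. The explicit version is shown only because it mirrors the three equations term by term.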
246 FURTHER INFERENCE IN THE MULTIPLE REGRESSION MODEL

With 95% confidence we estimate that average sales over many weeks will lie between
$75,144 and $78,804, but in any single week we forecast sales will be between $67,533 and
$86,415.

6.6 Exercises
Answers to exercises marked * appear at www.wiley.com/college/hill.
6.6.1 PROBLEMS

6.1 When using N = 40 observations to estimate the model

\[
y = b_1 + b_2 x + b_3 z + e
\]

you obtain SSE = 979.830 and sy = 13.45222. Find
(a) R²
(b) The value of the F-statistic for testing H0: b2 = b3 = 0. (Do you reject or fail to
reject H0?)
6.2 Consider again the model in Exercise 6.1. After augmenting this model with the
squares and cubes of predictions ŷ² and ŷ³, we obtain SSE = 696.5357. Use RESET
to test for misspecification.
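
Both exercises reduce to comparing sums of squared errors. The sketch below assumes that sy is the sample standard deviation of y computed with divisor N − 1, so that SST = (N − 1) sy², and that the F-statistic takes the usual form [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)]. The function names are hypothetical.

```python
def r_squared_from_sse(sse, s_y, n):
    """R-squared from SSE and the sample standard deviation of y (divisor n - 1)."""
    sst = (n - 1) * s_y ** 2
    return 1.0 - sse / sst


def f_statistic(sse_restricted, sse_unrestricted, num_restrictions, n, k_unrestricted):
    """F = [(SSE_R - SSE_U) / J] / [SSE_U / (N - K)]."""
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / (n - k_unrestricted)
    return numerator / denominator


n = 40
sst = (n - 1) * 13.45222 ** 2
r2 = r_squared_from_sse(979.830, 13.45222, n)

# Overall significance: the restricted model has only an intercept, so SSE_R = SST.
f_overall = f_statistic(sst, 979.830, num_restrictions=2, n=n, k_unrestricted=3)

# RESET: the augmented model adds two terms, giving K = 5 in the unrestricted model.
f_reset = f_statistic(979.830, 696.5357, num_restrictions=2, n=n, k_unrestricted=5)
print(r2, f_overall, f_reset)
```

Each F-value would be compared with the critical value of the F-distribution with (J, N − K) degrees of freedom.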
6.3* Consider the model

\[
y = b_1 + x_2 b_2 + x_3 b_3 + e
\]

and suppose that application of least squares to 20 observations on these variables
yields the following results ($\widehat{\operatorname{cov}}(b)$ denotes the estimated covariance matrix):

\[
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} 0.96587 \\ 0.69914 \\ 1.7769 \end{bmatrix},
\qquad
\widehat{\operatorname{cov}}(b)
=
\begin{bmatrix}
0.21812 & 0.019195 & 0.050301 \\
0.019195 & 0.048526 & 0.031223 \\
0.050301 & 0.031223 & 0.037120
\end{bmatrix}
\]

\[
\hat{\sigma}^2 = 2.5193 \qquad R^2 = 0.9466
\]

(a) Find the total variation, unexplained variation, and explained variation for this model.
(b) Find 95% interval estimates for b2 and b3.
(c) Use a t-test to test the hypothesis H0: b2 ≥ 1 against the alternative H1: b2 < 1.
(d) Use your answers in part (a) to test the joint hypothesis H0: b2 = 0, b3 = 0.
(e) Test the hypothesis H0: 2b2 = b3.
6.4 Consider the wage equation

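\[
\begin{aligned}
\ln(\mathit{WAGE}) = {} & b_1 + b_2\,\mathit{EDUC} + b_3\,\mathit{EDUC}^2 + b_4\,\mathit{EXPER} + b_5\,\mathit{EXPER}^2 \\
& + b_6\,(\mathit{EDUC} \times \mathit{EXPER}) + b_7\,\mathit{HRSWK} + e
\end{aligned}
\]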

where the explanatory variables are years of education, years of experience and hours
worked per week. Estimation results for this equation, and for modified versions of it
obtained by dropping some of the variables, are displayed in Table 6.4. These results
are from the 1000 observations in the file cps4c_small.dat.
(a) Using an approximate 5% critical value of tc = 2, what coefficient estimates are
not significantly different from zero?
(b) What restriction on the coefficients of Eqn (A) gives Eqn (B)? Use an F-test to
test this restriction. Show how the same result can be obtained using a t-test.

(c) What restrictions on the coefficients of Eqn (A) give Eqn (C)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(d) What restrictions on the coefficients of Eqn (B) give Eqn (D)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(e) What restrictions on the coefficients of Eqn (A) give Eqn (E)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(f) Based on your answers to parts (a) to (e), which model would you prefer? Why?
(g) Compute the missing AIC value for Eqn (D) and the missing SC value for Eqn
(A). Which model is favored by the AIC? Which model is favored by the SC?
6.5* Consider the wage equation

\[
\begin{aligned}
\ln(\mathit{WAGE}) = {} & b_1 + b_2\,\mathit{EDUC} + b_3\,\mathit{EDUC}^2 + b_4\,\mathit{EXPER} + b_5\,\mathit{EXPER}^2 \\
& + b_6\,\mathit{HRSWK} + e
\end{aligned}
\]

(a) Suppose you wish to test the hypothesis that a year of education has the same
effect on ln(WAGE) as a year of experience. What null and alternative hypotheses
would you set up?
(b) What is the restricted model, assuming that the null hypothesis is true?
(c) Given that the sum of squared errors from the restricted model is SSE_R =
254.1726, test the hypothesis in (a). (For SSE_U use the relevant value from
Table 6.4. The sample size is N = 1,000.)

Table 6.4 Wage Equation Estimates for Exercises 6.4 and 6.5

Coefficient Estimates and (Standard Errors)

Variable        Eqn (A)      Eqn (B)      Eqn (C)      Eqn (D)      Eqn (E)
C               1.055        1.252        1.573        1.917        0.904
                (0.266)      (0.190)      (0.188)      (0.080)      (0.096)
EDUC            0.0498       0.0289       0.0366                    0.1006
                (0.0397)     (0.0344)     (0.0350)                  (0.0063)
EDUC²           0.00319      0.00352      0.00293
                (0.00169)    (0.00166)    (0.00170)
EXPER           0.0373       0.0303                    0.0279       0.0295
                (0.0081)     (0.0048)                  (0.0054)     (0.0048)
EXPER²          0.000485     0.000456                  0.000470     0.000440
                (0.000090)   (0.000086)                (0.000096)   (0.000086)
EXPER × EDUC    0.000510
                (0.000482)
HRSWK           0.01145      0.01156      0.01345      0.01524      0.01188
                (0.00137)    (0.00137)    (0.00136)    (0.00151)    (0.00136)

SSE             222.4166     222.6674     233.8317     280.5061     223.6716
AIC             −1.489       −1.490       −1.445       ______       −1.488
SC              ______       −1.461       −1.426       −1.244       −1.463
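
The information criteria in Table 6.4 can be checked from SSE alone. The sketch below assumes the definitions AIC = ln(SSE/N) + 2K/N and SC = ln(SSE/N) + K ln(N)/N, which reproduce the tabulated magnitudes when K counts all estimated coefficients including the intercept. The function names are hypothetical.

```python
import math


def aic(sse, n, k):
    """Akaike information criterion in the form ln(SSE/N) + 2K/N (assumed definition)."""
    return math.log(sse / n) + 2.0 * k / n


def sc(sse, n, k):
    """Schwarz criterion in the form ln(SSE/N) + K*ln(N)/N (assumed definition)."""
    return math.log(sse / n) + k * math.log(n) / n


# Check against Eqn (B): SSE = 222.6674, N = 1000, K = 6 coefficients.
aic_b = aic(222.6674, 1000, 6)
sc_b = sc(222.6674, 1000, 6)
print(round(aic_b, 3), round(sc_b, 3))
```

Because SSE/N is well below one here, the logarithm is negative and both criteria take negative values. In each case the preferred model is the one with the smallest (most negative) value.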

6.6 RESET suggests augmenting an existing model with the squares of the predictions
ŷ², or with their squares and cubes (ŷ², ŷ³). What would happen if you augmented
the model with the predictions themselves ŷ?
6.7 Table 6.5 contains output for the two models

\[
y = b_1 + b_2 x + b_3 w + e
\]
\[
y = b_1 + b_2 x + e
\]

obtained using N = 35 observations. RESET applied to the second model yields
F-values of 17.98 (for ŷ²) and 8.72 (for ŷ² and ŷ³). The correlation between x and w is
rxw = 0.975. Discuss the following questions:
(a) Should w be included in the model?
(b) What can you say about omitted-variable bias?
(c) What can you say about the existence of collinearity and its possible effect?
6.8 In Section 6.1.5 we tested the joint null hypothesis

\[
H_0: b_3 + 3.8\,b_4 = 1 \quad \text{and} \quad b_1 + 6\,b_2 + 1.9\,b_3 + 3.61\,b_4 = 80
\]

in the model

\[
\mathit{SALES} = b_1 + b_2\,\mathit{PRICE} + b_3\,\mathit{ADVERT} + b_4\,\mathit{ADVERT}^2 + e
\]

By substituting the restrictions into the model and rearranging variables, show how
the model can be written in a form in which least squares estimation will yield
restricted least squares estimates.

6.6.2 COMPUTER EXERCISES

6.9 In Exercise 5.25 we expressed the model

\[
Y = a K^{b_2} L^{b_3} E^{b_4} M^{b_5} \exp\{e\}
\]

in terms of logarithms and estimated it using data in the file manuf.dat. Use the data
and results from Exercise 5.25 to test the following hypotheses:
(a) H0: b2 = 0 against H1: b2 ≠ 0.
(b) H0: b2 = 0, b3 = 0 against H1: b2 ≠ 0 and/or b3 ≠ 0.
(c) H0: b2 = 0, b4 = 0 against H1: b2 ≠ 0 and/or b4 ≠ 0.
(d) H0: b2 = 0, b3 = 0, b4 = 0 against H1: b2 ≠ 0 and/or b3 ≠ 0 and/or b4 ≠ 0.
(e) H0: b2 + b3 + b4 + b5 = 1 against H1: b2 + b3 + b4 + b5 ≠ 1.
(f) Analyze the impact of collinearity on this model.

Table 6.5 Output for Exercise 6.7

            Model with x and w                      Model with x only
Variable    Coefficient  Std. Error  t-value       Coefficient  Std. Error  t-value
C           3.6356       2.763       1.316         5.8382       2.000       2.919
X           0.99845      1.235       0.8085        4.1072       0.3383      12.14
W           0.49785      0.1174      4.240

6.10* Use the sample data for beer consumption in the file beer.dat to
(a) Estimate the coefficients of the demand relation (6.14) using only sample
information. Compare and contrast these results to the restricted coefficient
results given in (6.19).
(b) Does collinearity appear to be a problem?
(c) Test the validity of the restriction that implies that demand will not change if
prices and income go up in the same proportion.
(d) Use model (6.19) to construct a 95% prediction interval for Q when
PB = 3.00, PL = 10, PR = 2.00, and I = 50000. (Hint: Construct the interval
for ln(Q) and then take antilogs.)
(e) Repeat part (d) using the unconstrained model from part (a). Comment.
6.11 Consider production functions of the form Q = f(L, K), where Q is the output measure
and L and K are labor and capital inputs, respectively. A popular functional form is the
Cobb–Douglas equation

\[
\ln(Q) = b_1 + b_2 \ln(L) + b_3 \ln(K) + e
\]

(a) Use the data in the file cobb.dat to estimate the Cobb–Douglas production
function. Is there evidence of collinearity?
(b) Re-estimate the model with the restriction of constant returns to scale, that is,
b2 + b3 = 1, and comment on the results.
6.12* Using data in the file beer.dat, apply RESET to the two alternative models

\[
\ln(Q) = b_1 + b_2 \ln(\mathit{PB}) + b_3 \ln(\mathit{PL}) + b_4 \ln(\mathit{PR}) + b_5 \ln(I) + e
\]
\[
Q = b_1 + b_2\,\mathit{PB} + b_3\,\mathit{PL} + b_4\,\mathit{PR} + b_5\,I + e
\]

Which model seems to better reflect the demand for beer?
6.13 The file toodyay.dat contains 48 annual observations on a number of variables related
to wheat yield in the Toodyay Shire of Western Australia, for the period 1950–1997.
Those variables are
Y = wheat yield in tonnes per hectare,
t = trend term to allow for technological change,
RG = rainfall at germination (May–June),
RD = rainfall at development stage (July–August), and
RF = rainfall at flowering (September–October).
The unit of measurement for rainfall is centimeters. A model that allows for the yield
response to rainfall to be different for the three different periods is

\[
Y = b_1 + b_2\,t + b_3\,\mathit{RG} + b_4\,\mathit{RD} + b_5\,\mathit{RF} + e
\]

(a) Estimate this model. Report the results and comment on the signs and significance
of the estimated coefficients.
(b) Test the hypothesis that the response of yield to rainfall is the same irrespective of
whether the rain falls during germination, development, or flowering.
(c) Estimate the model under the restriction that the three responses to rainfall are the
same. Comment on the results.
6.14 Following on from the example in Section 6.3, the file hwage.dat contains another
subset of the data used by labor economist Tom Mroz. The variables with which we
are concerned are

6.20* Reconsider the production function for rice estimated in Exercise 5.24 using data in
the file rice.dat:

\[
\ln(\mathit{PROD}) = b_1 + b_2 \ln(\mathit{AREA}) + b_3 \ln(\mathit{LABOR}) + b_4 \ln(\mathit{FERT}) + e
\]


(a) Using a 5% level of significance, test the hypothesis that the elasticity of
production with respect to land is equal to the elasticity of production with
respect to labor.
(b) Using a 10% level of significance, test the hypothesis that the production
function exhibits constant returns to scale, that is, H0: b2 + b3 + b4 = 1.
(c) Using a 5% level of significance, jointly test the two hypotheses in parts (a) and
(b), that is, H0: b2 = b3 and b2 + b3 + b4 = 1.
(d) Find restricted least squares estimates for each of the restricted models implied
by the null hypotheses in parts (a), (b) and (c). Compare the different estimates
and their standard errors.
6.21* Re-estimate the model in Exercise 6.20 with (i) FERT omitted, (ii) LABOR omitted,
and (iii) AREA omitted. In each case, discuss the effect of omitting a variable on the
estimates of the remaining two elasticities. Also, in each case, check to see if RESET
has picked up the omitted variable.

6.22* In Chapter 5.7 we used the data in file pizza4.dat to estimate the model

\[
\mathit{PIZZA} = b_1 + b_2\,\mathit{AGE} + b_3\,\mathit{INCOME} + b_4\,(\mathit{AGE} \times \mathit{INCOME}) + e
\]

(a) Test the hypothesis that age does not affect pizza expenditure, that is, test the
joint hypothesis H0: b2 = 0, b4 = 0. What do you conclude?
(b) Construct point estimates and 95% interval estimates of the marginal propensity
to spend on pizza for individuals of ages 20, 30, 40, 50, and 55. Comment on
these estimates.
(c) Modify the equation to permit a “life-cycle” effect in which the marginal effect
of income on pizza expenditure increases with age, up to a point, and then falls.
Do so by adding the term (AGE² × INC) to the model. What sign do you
anticipate on this term? Estimate the model and test the significance of the
coefficient for this variable. Did the estimate have the expected sign?
(d) Using the model in (c), construct point estimates and 95% interval estimates of
the marginal propensity to spend on pizza for individuals of ages 20, 30, 40, 50
and 55. Comment on these estimates. In light of these values, and of the range of
age in the sample data, what can you say about the quadratic function of age that
describes the marginal propensity to spend on pizza?
(e) For the model in part (c), are each of the coefficient estimates for AGE, (AGE × INC),
and (AGE² × INC) significantly different from zero at a 5% significance level? Carry
out a joint test for the significance of these variables. Comment on your results.
(f) Check the model used in part (c) for collinearity. Add the term (AGE³ × INC) to
the model in (c) and check the resulting model for collinearity.
6.23 Use the data in cps4_small.dat to estimate the following wage equation:

ln(WAGE) = b1 + b2 EDUC + b3 EDUC^2 + b4 EXPER + b5 EXPER^2 + b6 (EDUC × EXPER) + e
288 USING INDICATOR VARIABLES

Emergency Room Cases Regression: Model 1

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C               93.6958       1.5592       60.0938   0.0000
T                0.0338       0.0111        3.0580   0.0025
HOLIDAY         13.8629       6.4452        2.1509   0.0326
FRIDAY           6.9098       2.1113        3.2727   0.0012
SATURDAY        10.5894       2.1184        4.9987   0.0000
FULLMOON         2.4545       3.9809        0.6166   0.5382
NEWMOON          6.4059       4.2569        1.5048   0.1338

R^2 = 0.1736   SSE = 27108.82

(a) Interpret these regression results. When should emergency rooms expect more
calls?
(b) The model was reestimated omitting the variables FULLMOON and NEWMOON,
as shown below. Comment on any changes you observe.
(c) Test the joint significance of FULLMOON and NEWMOON. State the null and
alternative hypotheses and indicate the test statistic you use. What do you
conclude?
Emergency Room Cases Regression: Model 2

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C               94.0215       1.5458       60.8219   0.0000
T                0.0338       0.0111        3.0568   0.0025
HOLIDAY         13.6168       6.4511        2.1108   0.0359
FRIDAY           6.8491       2.1137        3.2404   0.0014
SATURDAY        10.3421       2.1153        4.8891   0.0000

R^2 = 0.1640   SSE = 27424.19
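
For part (c), the two tables supply the restricted and unrestricted sums of squared errors needed for an F-test of H0: the coefficients of FULLMOON and NEWMOON are both zero. The sketch below computes F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] with J = 2 and K = 7. The sample size is not stated in the tables above, so N_OBS is an assumption that must be replaced with the actual number of observations; the function name is also illustrative.

```python
def joint_f_statistic(sse_restricted, sse_unrestricted,
                      num_restrictions, n_obs, k_unrestricted):
    """F-statistic for J linear restrictions, from restricted and unrestricted SSE."""
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / (n_obs - k_unrestricted)
    return numerator / denominator


# ASSUMPTION: N is not given in the tables; replace with the actual sample size.
N_OBS = 229

f_stat = joint_f_statistic(
    sse_restricted=27424.19,    # Model 2 (FULLMOON and NEWMOON omitted)
    sse_unrestricted=27108.82,  # Model 1
    num_restrictions=2,
    n_obs=N_OBS,
    k_unrestricted=7,           # coefficients in Model 1, including the intercept
)
print(f"F = {f_stat:.4f}")
```

The statistic is then compared with the critical value of the F-distribution with (2, N − 7) degrees of freedom at the chosen significance level.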

7.3 Henry Saffer and Frank Chaloupka ("The Demand for Illicit Drugs," Economic
Inquiry, 37(3), 1999, 401–411) estimate demand equations for alcohol, marijuana,
cocaine, and heroin using a sample of size N = 44,889. The estimated equation for
alcohol use after omitting a few control variables is shown in the chart at the top of
page 289.
The variable definitions (sample means in parentheses) are as follows:
The dependent variable is the number of days alcohol was used in the past 31 days (3.49)
ALCOHOL PRICE: price of a liter of pure alcohol in 1983 dollars (24.78)
INCOME: total personal income in 1983 dollars (12,425)
GENDER: a binary variable = 1 if male (0.479)
MARITAL STATUS: a binary variable = 1 if married (0.569)
AGE 12–20: a binary variable = 1 if individual is 12–20 years of age (0.155)
AGE 21–30: a binary variable = 1 if individual is 21–30 years of age (0.197)
BLACK: a binary variable = 1 if individual is black (0.116)
HISPANIC: a binary variable = 1 if individual is Hispanic (0.078)
7.6 EXERCISES 289

Demand for Illicit Drugs

Variable          Coefficient   t-statistic

C                       4.099         17.98
ALCOHOL PRICE           0.045          5.93
INCOME               0.000057         17.45
GENDER                  1.637         29.23
MARITAL STATUS          0.807         12.13
AGE 12–20               1.531         17.97
AGE 21–30               0.035          0.51
BLACK                   0.580          8.84
HISPANIC                0.564          6.03

(a) Interpret the coefficient of alcohol price.


(b) Compute the price elasticity at the means of the variables.
(c) Compute the price elasticity at the means of alcohol price and income, for a
married black male, age 21–30.
(d) Interpret the coefficient of income. If we measured income in $1,000 units, what
would the estimated coefficient be?
(e) Interpret the coefficients of the indicator variables, as well as their significance.
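
Parts (b) and (d) reduce to two small calculations: an elasticity in a linear model is the slope times x/y at the chosen point, and changing the units of a regressor rescales its coefficient by the same factor. A minimal sketch follows, with illustrative function names. It assumes the sample mean of the dependent variable (3.49) is used as the level of y at the means, and it enters coefficient magnitudes as printed in the table above; the sign of each coefficient should be taken from the source table.

```python
def elasticity_at_point(slope, x, y):
    """Elasticity of y with respect to x in a linear model: slope * x / y."""
    return slope * x / y


def rescale_coefficient(slope, old_units_per_new_unit):
    """Coefficient after measuring the regressor in larger units (e.g. $1,000s)."""
    return slope * old_units_per_new_unit


# Magnitudes as printed in the table; apply the sign from the source table.
price_coef = 0.045
mean_price = 24.78   # sample mean of ALCOHOL PRICE
mean_days = 3.49     # sample mean of the dependent variable (assumed level of y)

price_elasticity = elasticity_at_point(price_coef, mean_price, mean_days)
income_coef_thousands = rescale_coefficient(0.000057, 1000)

print(f"price elasticity magnitude at the means: {price_elasticity:.4f}")
print(f"income coefficient per $1,000: {income_coef_thousands:.3f}")
```

For part (c), y would instead be the value predicted by the estimated equation for the stated individual, so the same function applies with a different y.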
7.4 In the file stockton.dat we have data from January 1991 to December 1996 on house
prices, square footage, and other characteristics of 4682 houses that were sold in
Stockton, California. One of the key problems regarding housing prices in a region
concerns construction of "house price indexes," as discussed in Section 7.2.4b. To
illustrate, we estimate a regression model for house price, including as explanatory
variables the size of the house (SQFT), the age of the house (AGE), and annual
indicator variables, omitting the indicator variable for the year 1991.

PRICE = b1 + b2 SQFT + b3 AGE + d1 D92 + d2 D93 + d3 D94 + d4 D95 + d5 D96 + e

The results are as follows:

Stockton House Price Index Model

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C            21456.2000    1839.0400       11.6671   0.0000
SQFT            72.7878       1.0001       72.7773   0.0000
AGE            179.4623      17.0112       10.5496   0.0000
D92           4392.8460    1270.9300        3.4564   0.0006
D93          10435.4700    1231.8000        8.4717   0.0000
D94          13173.5100    1211.4770       10.8739   0.0000
D95          19040.8300    1232.8080       15.4451   0.0000
D96          23663.5100    1194.9280       19.8033   0.0000

price (RELPRICE) and an indicator variable for the repair period (REPAIR).
That is, let

MOTEL_PCT_t = b1 + b2 COMP_PCT_t + b3 RELPRICE_t + b4 REPAIR_t + e_t

Obtain the least squares estimates of the parameters. Interpret the estimated
coefficients, as well as their signs and significance.
(e) Using the least squares estimate of the coefficient of REPAIR from part (d),
compute an estimate of the revenue lost by the damaged motel during the repair
period (215 days @ $56.61 × b4). Compare this value to the "simple" estimate in
part (b). Construct a 95% interval estimate for the estimated loss. Is the estimated
loss from part (b) within the interval estimate?
(f) Carry out the regression specification test RESET. Is there any evidence of model
misspecification?
(g) Plot the least squares residuals against TIME. Are there any obvious patterns?
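
For part (e), the loss estimate is a constant multiple of b4, so an interval estimate for the loss follows by scaling the endpoints of the interval for b4 by the same constant. The sketch below applies the formula as stated in the exercise (215 days at $56.61 times b4). The function name is illustrative, and the values of b4, its standard error, and the t critical value are hypothetical placeholders, not estimates from the motel data.

```python
def revenue_loss_interval(b4, se_b4, t_crit, days=215, room_rate=56.61):
    """Scale b4 and its interval endpoints by days * room_rate."""
    scale = days * room_rate
    point = scale * b4
    # sorted() keeps (lower, upper) in order whatever the sign of b4
    lower, upper = sorted((scale * (b4 - t_crit * se_b4),
                           scale * (b4 + t_crit * se_b4)))
    return point, lower, upper


# Hypothetical inputs for illustration only (NOT least squares estimates).
point, lower, upper = revenue_loss_interval(b4=-13.0, se_b4=3.0, t_crit=2.0)
print(f"estimated change in revenue: {point:.2f} [{lower:.2f}, {upper:.2f}]")
```

A negative value indicates revenue lost during the repair period; the "simple" estimate from part (b) can then be checked against the two endpoints.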
7.9* In the STAR experiment (Section 7.5.3), children were randomly assigned within
schools into three types of classes: small classes with 13 to 17 students, regular-sized
classes with 22–25 students, and regular-sized classes with a full-time teacher aide to
assist the teacher. Student scores on achievement tests were recorded, as was some
information about the students, teachers, and schools. Data for the kindergarten
classes is contained in the data file star.dat.
(a) Calculate the average of TOTALSCORE for (i) students in regular-sized classrooms
with full-time teachers, but no aide; (ii) students in regular-sized classrooms
with full-time teachers, and an aide; and (iii) students in small classrooms. What do
you observe about test scores in these three types of learning environments?
(b) Estimate the regression model TOTALSCORE_i = b1 + b2 SMALL_i + b3 AIDE_i + e_i,
where AIDE is an indicator variable equaling one for classes taught by a
teacher and an aide and zero otherwise. What is the relation of the estimated
coefficients from this regression to the sample means in part (a)? Test the
statistical significance of b3 at the 5% level of significance.
(c) To the regression in (b) add the additional explanatory variable TCHEXPER. Is
this variable statistically significant? Does its addition to the model affect the
estimates of b2 and b3?
(d) To the regression in (c) add the additional explanatory variables BOY,
FREELUNCH, and WHITE_ASIAN. Are any of these variables statistically
significant? Does their addition to the model affect the estimates of b2 and b3?
(e) To the regression in (d) add the additional explanatory variables TCHWHITE,
TCHMASTERS, SCHURBAN, and SCHRURAL. Are any of these variables
statistically significant? Does their addition to the model affect the estimates
of b2 and b3?
(f) Discuss the importance of parts (c), (d), and (e) to our estimation of the
"treatment" effects in part (b).
(g) Add to the models in (b) through (e) indicator variables for each school:

SCHOOL_j = 1 if student is in school j, and 0 otherwise

Test the joint significance of these school "fixed effects." Does the inclusion
of these fixed effect indicator variables substantially alter the estimates of b2
and b3?
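
The relation asked about in part (b) can be checked numerically: when the only regressors are an intercept and indicator variables for mutually exclusive groups, the intercept equals the mean of the omitted group and each indicator coefficient equals the difference between that group's mean and the omitted group's mean. The sketch below illustrates this on synthetic data; the scores, group effects, and helper names are invented for the illustration and are not taken from star.dat.

```python
import random


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic data (NOT from star.dat): three class types with invented effects.
random.seed(0)
groups = [random.choice(("regular", "small", "aide")) for _ in range(300)]
effects = {"regular": 0.0, "small": 15.0, "aide": 2.0}
scores = [900 + effects[g] + random.gauss(0, 50) for g in groups]

# Columns: intercept, SMALL, AIDE (regular-sized classes are the omitted group)
X = [[1.0, float(g == "small"), float(g == "aide")] for g in groups]
b1, b2, b3 = ols(X, scores)


def group_mean(name):
    values = [s for s, g in zip(scores, groups) if g == name]
    return sum(values) / len(values)


print(f"b1 = {b1:.4f}, mean(regular) = {group_mean('regular'):.4f}")
print(f"b2 = {b2:.4f}, mean(small) - mean(regular) = "
      f"{group_mean('small') - group_mean('regular'):.4f}")
print(f"b3 = {b3:.4f}, mean(aide) - mean(regular) = "
      f"{group_mean('aide') - group_mean('regular'):.4f}")
```

Once additional regressors such as TCHEXPER are included, as in parts (c) through (e), this exact equality with the raw group means no longer holds.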
