The multiple regression model: I I1 I2 I3
1 1 0 1
2 1 1 2
3 1 2 1
1 1 2 0
0 1 1 1
1 1 2 1
2 1 0 1
1 1 1 1
2 1 1 0
The data in the file london.dat were used to estimate this model. See Exercise 4.10 for
more details about the data. Note that only households with one or two children are
being considered. Thus, NK takes only the values one or two. Output from estimating
this equation appears in Table 5.6.
(a) Fill in the following blank spaces that appear in this table.
(i) The t-statistic for b1
(ii) The standard error for b2
(iii) The estimate b3
(iv) $R^2$
(v) $\hat{\sigma}$
(b) Interpret each of the estimates b2, b3, and b4.
(c) Compute a 95% interval estimate for b3. What does this interval tell you?
(d) Test the hypothesis that the budget proportion for alcohol does not depend on the
number of children in the household. Can you suggest a reason for the test outcome?
5.4* The data set used in Exercise 5.3 is used again. This time it is used to estimate how the
proportion of the household budget spent on transportation WTRANS depends on
the log of total expenditure ln(TOTEXP), AGE, and number of children NK. The
output is reported in Table 5.7.
(a) Write out the estimated equation in the standard reporting format with standard
errors below the coefficient estimates.
(b) Interpret the estimates b2, b3, and b4. Do you think the results make sense from an
economic or logical point of view?
(c) Are there any variables that you might exclude from the equation? Why?
(d) What proportion of variation in the budget proportion allocated to transport is
explained by this equation?
(e) Predict the proportion of a budget that will be spent on transportation, for both
one- and two-children households, when total expenditure and age are set at their
sample means, which are 98.7 and 36, respectively.
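Part (e) amounts to evaluating the fitted equation at the stated sample means. The sketch below shows the arithmetic only; the coefficient values are placeholders invented for illustration (Table 5.7 is not reproduced here), and `predict_wtrans` is a hypothetical helper name.

```python
import math

def predict_wtrans(b1, b2, b3, b4, totexp, age, nk):
    """Fitted budget share: b1 + b2*ln(TOTEXP) + b3*AGE + b4*NK."""
    return b1 + b2 * math.log(totexp) + b3 * age + b4 * nk

# Placeholder coefficients for illustration only; substitute the estimates
# reported in Table 5.7.
coefs = dict(b1=0.05, b2=0.03, b3=-0.001, b4=-0.01)

# Total expenditure and age are held at their sample means (98.7 and 36).
for nk in (1, 2):
    share = predict_wtrans(**coefs, totexp=98.7, age=36, nk=nk)
    print(f"NK = {nk}: predicted WTRANS = {share:.4f}")
```

Note that total expenditure enters in logs, so the mean 98.7 is passed through `math.log` before it is multiplied by b2; the two predictions differ by exactly b4.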
5.5 This question is concerned with the value of houses in towns surrounding Boston. It
uses the data of Harrison, D., and D. L. Rubinfeld (1978), “Hedonic Prices and the
Demand for Clean Air,” Journal of Environmental Economics and Management, 5,
81–102. The output appears in Table 5.8. The variables are defined as follows:
VALUE = median value of owner-occupied homes in thousands of dollars
CRIME = per capita crime rate
NITOX = nitric oxide concentration (parts per million)
ROOMS = average number of rooms per dwelling
AGE = proportion of owner-occupied units built prior to 1940
A total of N = 27 observations were obtained using different test fields. The estimated
quadratic model, with an interaction term, is
$$\widehat{YIELD} = 1.385 + 8.011\,NITRO + 4.800\,PHOS - 1.944\,NITRO^2 - 0.778\,PHOS^2 - 0.567\,NITRO \times PHOS$$
(a) Find equations describing the marginal effect of nitrogen on yield and the
marginal effect of phosphorus on yield. What do these equations tell you?
(b) What are the marginal effects of nitrogen and of phosphorus when (i) NITRO = 1 and
PHOS = 1 and (ii) NITRO = 2 and PHOS = 2? Comment on your
findings.
(c) Test the hypothesis that the marginal effect of nitrogen is zero, when
(i) PHOS = 1 and NITRO = 1
(ii) PHOS = 1 and NITRO = 2
(iii) PHOS = 1 and NITRO = 3
Note: The following information may be useful:
$\widehat{var}(b_2 + 2b_4 + b_6) = 0.233$
$\widehat{var}(b_2 + 4b_4 + b_6) = 0.040$
$\widehat{var}(b_2 + 6b_4 + b_6) = 0.233$
(d) [This part requires the use of calculus.] For the function estimated, what levels
of nitrogen and phosphorus give maximum yield? Are these levels the optimal
fertilizer applications for the peanut producer?
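The calculations behind parts (a) to (d) can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive solution: the squared and interaction coefficients are taken to be negative (the yield function needs that shape to have a maximum), the tests are treated as two-tail at the 5% level, and the critical value 2.080 for 21 degrees of freedom is read from t-tables. The function names are hypothetical.

```python
import math

# Fitted coefficients, with the squared and interaction terms taken as negative
# (an assumption consistent with a yield function that has an interior maximum).
b2, b3 = 8.011, 4.800      # NITRO, PHOS
b4, b5 = -1.944, -0.778    # NITRO^2, PHOS^2
b6 = -0.567                # NITRO x PHOS

def me_nitro(nitro, phos):
    """Marginal effect of nitrogen: b2 + 2*b4*NITRO + b6*PHOS."""
    return b2 + 2 * b4 * nitro + b6 * phos

def me_phos(nitro, phos):
    """Marginal effect of phosphorus: b3 + 2*b5*PHOS + b6*NITRO."""
    return b3 + 2 * b5 * phos + b6 * nitro

# Part (c): estimated variances from the note, keyed by NITRO (PHOS = 1 throughout).
var_me = {1: 0.233, 2: 0.040, 3: 0.233}
T_CRIT = 2.080  # 5% two-tail critical value, t with 27 - 6 = 21 df (from tables)

t_stats = {}
for nitro, var in var_me.items():
    t_stats[nitro] = me_nitro(nitro, 1) / math.sqrt(var)
    verdict = "reject" if abs(t_stats[nitro]) > T_CRIT else "do not reject"
    print(f"NITRO={nitro}: ME={me_nitro(nitro, 1):.3f}, "
          f"t={t_stats[nitro]:.2f}, {verdict} H0")

# Part (d): both marginal effects are zero at the stationary point, which gives
# a 2x2 linear system solved here by Cramer's rule.
a11, a12, a21, a22 = -2 * b4, -b6, -b6, -2 * b5
det = a11 * a22 - a12 * a21
nitro_star = (b2 * a22 - a12 * b3) / det
phos_star = (a11 * b3 - a21 * b2) / det
print(f"stationary point: NITRO={nitro_star:.3f}, PHOS={phos_star:.3f}")
```

Under these sign assumptions the matrix of second derivatives is negative definite, so the stationary point is a maximum of estimated yield. Whether it is the best application rate for a producer is a separate question that the code does not address.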
5.9 When estimating wage equations, we expect that young, inexperienced workers will
have relatively low wages and that with additional experience their wages will rise,
but then begin to decline after middle age, as the worker nears retirement. This life-
cycle pattern of wages can be captured by introducing experience and experience
squared to explain the level of wages. If we also include years of education, we have
the equation
$$WAGE = b_1 + b_2\,EDUC + b_3\,EXPER + b_4\,EXPER^2 + e$$
5.10 Use a computer to verify your answers to Exercise 5.1, parts (c), (e), (f), (g), and (h).
5.9 EXERCISES 205
(e) Test the hypothesis that the quality of cocaine has no influence on price against
the alternative that a premium is paid for better-quality cocaine.
(f) What is the average annual change in the cocaine price? Can you suggest why
price might be changing in this direction?
5.13 The file br2.dat contains data on 1,080 houses sold in Baton Rouge, Louisiana,
during mid-2005. We will be concerned with the selling price (PRICE), the size of the
house in square feet (SQFT), and the age of the house in years (AGE).
(a) Use all observations to estimate the following regression model and report the
results
$$H_0: \frac{\partial PRICE}{\partial AGE} \ge -1000 \quad \text{against} \quad H_1: \frac{\partial PRICE}{\partial AGE} < -1000$$
(c) Add the interaction variable SQFT × AGE to the model in part (b) and re-estimate
the equation. Report the results. Repeat parts (i), (ii), (iii), and (iv) from
part (b) for this new model. Use SQFT = 2300 and AGE = 20.
(d) From your answers to parts (a), (b), and (c), comment on the sensitivity of the
results to the model specification.
5.14 The file br2.dat contains data on 1,080 houses sold in Baton Rouge, Louisiana,
during mid-2005. We will be concerned with the selling price (PRICE), the size of the
house in square feet (SQFT), and the age of the house in years (AGE). Define a new
variable that measures house size in terms of hundreds of square feet,
SQFT100 = SQFT/100.
(a) Estimate the following equation and report the results:
$$H_0: \frac{\partial PRICE}{\partial AGE} \ge -1000 \quad \text{against} \quad H_1: \frac{\partial PRICE}{\partial AGE} < -1000$$
5.15* Reconsider the presidential voting data (fair4.dat) introduced in Exercise 2.14.
(a) Estimate the regression model
Report the results in standard format. Are the estimates for b2 and b3 significantly
different from zero at a 10% significance level? Did you use one-tail tests or two-
tail tests? Why?
(b) Assume the inflation rate is 4%. Predict the percentage vote for the incumbent
party when the growth rate is (i) −3%, (ii) 0%, and (iii) 3%.
(c) Test, as an alternative hypothesis, that the incumbent party will get the majority
of the expected vote when the growth rate is (i) −3%, (ii) 0%, and (iii) 3%. Use a
1% level of significance. If you were the president seeking re-election, why
might you set up each of these hypotheses as an alternative rather than a null
hypothesis?
5.16 Data on the weekly sales of a major brand of canned tuna by a supermarket chain in a
large midwestern U.S. city during a mid-1990s calendar year are contained in the file
tuna.dat. There are 52 observations on the variables: SAL1 = unit sales of brand no.
1 canned tuna; APR1 = price per can of brand no. 1 canned tuna; APR2, APR3 =
price per can of brands no. 2 and 3 of canned tuna.
(a) The prices APR1, APR2, and APR3 are expressed in dollars. Multiply the
observations on each of these variables by 100 to express them in terms of
cents; call the new variables PR1, PR2, and PR3. Estimate the following
regression model and report the results:
SAL1 = b1 + b2 PR1 + b3 PR2 + b4 PR3 + e
(b) Interpret the estimates b2, b3, and b4. Do they have the expected signs?
(c) Using suitable one-tail tests and a 5% significance level, test whether each of the
coefficients b2, b3, and b4 is significantly different from zero.
(d) Using a 5% significance level, test the following hypotheses:
(i) A 1-cent increase in the price of brand one reduces its sales by 300 cans.
(ii) A 1-cent increase in the price of brand two increases the sales of brand one
by 300 cans.
(iii) A 1-cent increase in the price of brand three increases the sales of brand
one by 300 cans.
(iv) The effect of a price increase in brand two on sales of brand one is the
same as the effect of a price increase in brand three on sales of brand
one. Does the outcome of this test contradict your findings from parts (ii)
and (iii)?
(v) If prices of all 3 brands go up by 1 cent, there is no change in sales.
5.17 (a) Reconsider the model SAL1 = b1 + b2 PR1 + b3 PR2 + b4 PR3 + e from Exercise
5.16. Estimate this model if you have not already done so, and find a 95%
interval estimate for expected sales when PR1 = 90, PR2 = 75, and PR3 = 75.
What is wrong with this interval?
(b) Estimate the alternative model ln(SAL1) = a1 + a2 PR1 + a3 PR2 + a4 PR3 + e,
and find a 95% interval estimate for expected log of sales when
PR1 = 90, PR2 = 75, and PR3 = 75. Convert this interval into one for sales,
and compare it with what you got in part (a).
(c) How does the interpretation of the coefficients in the model with ln(SAL1) as the
dependent variable differ from that for the coefficients in the model with SAL1 as
the dependent variable?
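The conversion asked for in part (b) of Exercise 5.17 is an exponentiation of the two endpoints. A minimal sketch follows; the endpoints 8.0 and 9.0 are made-up numbers for illustration, and `log_interval_to_level` is a hypothetical helper, not something defined in the text.

```python
import math

def log_interval_to_level(lower_log, upper_log):
    """Map an interval for expected ln(SAL1) to the sales scale
    by exponentiating both endpoints."""
    return math.exp(lower_log), math.exp(upper_log)

# Hypothetical endpoints, for illustration only; use your own estimates.
lower, upper = log_interval_to_level(8.0, 9.0)
print(f"interval for sales: ({lower:.1f}, {upper:.1f})")
```

Because the exponential function is increasing, the ordering of the endpoints is preserved, and both converted endpoints are necessarily positive.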
5.18 What is the relationship between crime and punishment? This important question has
been examined by Cornwell and Trumbull [4] using a panel of data from North
Carolina. The cross sections are 90 counties, and the data are annual for the years
1981–1987. The data are in the file crime.dat.
Using the data from 1987, estimate a regression relating the log of the crime rate
LCRMRTE to the probability of an arrest PRBARR (the ratio of arrests to offenses),
the probability of conviction PRBCONV (the ratio of convictions to arrests), the
probability of a prison sentence PRBPRIS (the ratio of prison sentences to convictions),
the number of police per capita POLPC, and the weekly wage in construction
WCON. Write a report of your findings. In your report, explain what effect you would
expect each of the variables to have on the crime rate and note whether the estimated
coefficients have the expected signs and are significantly different from zero. What
variables appear to be the most important for crime deterrence? Can you explain the
sign for the coefficient of POLPC?
5.19 Use the data in cps4_small.dat to estimate the following wage equation
(a) Report the results. Interpret the estimates for b2, b3, and b4. Are these estimates
significantly different from zero?
(b) Test the hypothesis that an extra year of education increases the wage rate by at
least 10% against the alternative that it is less than 10%.
(c) Find a 90% interval estimate for the percentage increase in wage from working
an additional hour per week.
(d) Re-estimate the model with the additional variables $EDUC \times EXPER$, $EDUC^2$,
and $EXPER^2$. Report the results. Are the estimated coefficients significantly
different from zero?
(e) For the new model, find expressions for the marginal effects
$\partial \ln(WAGE)/\partial EDUC$ and $\partial \ln(WAGE)/\partial EXPER$.
[4] “Estimating the Economic Model of Crime with Panel Data,” Review of Economics and Statistics, 76, 1994,
360–366. The data was kindly provided by the authors.
(c) Find a 95% interval estimate for the elasticity of production with respect to
fertilizer. Has this elasticity been precisely measured?
(d) Using a 5% level of significance, test the hypothesis that the elasticity of
production with respect to labor is less than or equal to 0.3 against the alternative
that it is greater than 0.3. What happens if you reverse the null and
alternative hypotheses?
5.25 Consider the following aggregate production function for the U.S. manufacturing
sector:
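$$S(b_1, b_2, b_3) = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_{i2} - b_3 x_{i3})^2$$
The first step is to partially differentiate S with respect to b1, b2, and b3 and to set the
first-order partial derivatives to zero. This yields
$$\frac{\partial S}{\partial b_1} = 2N b_1 + 2 b_2 \sum x_{i2} + 2 b_3 \sum x_{i3} - 2 \sum y_i$$
$$\frac{\partial S}{\partial b_2} = 2 b_1 \sum x_{i2} + 2 b_2 \sum x_{i2}^2 + 2 b_3 \sum x_{i2} x_{i3} - 2 \sum x_{i2} y_i$$
$$\frac{\partial S}{\partial b_3} = 2 b_1 \sum x_{i3} + 2 b_2 \sum x_{i2} x_{i3} + 2 b_3 \sum x_{i3}^2 - 2 \sum x_{i3} y_i$$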
Setting these partial derivatives equal to zero, dividing by 2, and rearranging yields
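Once rearranged, the three first-order conditions form a linear system in b1, b2, and b3 (the normal equations), with the sums of the data as coefficients. The sketch below builds and solves that system using only the standard library. The data are invented so that y = 1 + 2·x2 + 3·x3 holds exactly, and the helper names `solve3` and `least_squares_3` are hypothetical.

```python
def solve3(a, c):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, c)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def least_squares_3(y, x2, x3):
    """Set up and solve the normal equations:
       N*b1      + b2*Sx2   + b3*Sx3   = Sy
       b1*Sx2    + b2*Sx2x2 + b3*Sx2x3 = Sx2y
       b1*Sx3    + b2*Sx2x3 + b3*Sx3x3 = Sx3y
    """
    n = len(y)
    a = [[n, sum(x2), sum(x3)],
         [sum(x2), dot(x2, x2), dot(x2, x3)],
         [sum(x3), dot(x2, x3), dot(x3, x3)]]
    c = [sum(y), dot(x2, y), dot(x3, y)]
    return solve3(a, c)

# Made-up data satisfying y = 1 + 2*x2 + 3*x3 exactly (illustration only).
x2 = [0.0, 1.0, 2.0, 3.0, 4.0]
x3 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x2, x3)]

b1, b2, b3 = least_squares_3(y, x2, x3)
print(b1, b2, b3)
```

If x2 and x3 were exactly collinear, the system would be singular and the division by the pivot would fail, which is the numerical face of the requirement that the explanatory variables not be exact linear functions of one another.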
With 95% confidence we estimate that average sales over many weeks will lie between
$75,144 and $78,804, but in any single week we forecast sales will be between $67,533 and
$86,415.
6.6 Exercises
Answers to exercises marked * appear at www.wiley.com/college/hill.
6.6.1 PROBLEMS
where the explanatory variables are years of education, years of experience and hours
worked per week. Estimation results for this equation, and for modified versions of it
obtained by dropping some of the variables, are displayed in Table 6.4. These results
are from the 1000 observations in the file cps4c_small.dat.
(a) Using an approximate 5% critical value of $t_c = 2$, what coefficient estimates are
not significantly different from zero?
(b) What restriction on the coefficients of Eqn (A) gives Eqn (B)? Use an F-test to
test this restriction. Show how the same result can be obtained using a t-test.
(c) What restrictions on the coefficients of Eqn (A) give Eqn (C)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(d) What restrictions on the coefficients of Eqn (B) give Eqn (D)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(e) What restrictions on the coefficients of Eqn (A) give Eqn (E)? Use an F-test to
test these restrictions. What question would you be trying to answer by
performing this test?
(f) Based on your answers to parts (a) to (e), which model would you prefer? Why?
(g) Compute the missing AIC value for Eqn (D) and the missing SC value for Eqn
(A). Which model is favored by the AIC? Which model is favored by the SC?
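For part (g), the two criteria can be computed directly from a model's sum of squared errors. The sketch assumes the definitions AIC = ln(SSE/N) + 2K/N and SC = ln(SSE/N) + K ln(N)/N; software packages often report differently scaled versions. The SSE and K values shown are placeholders, since Table 6.4 is not reproduced here.

```python
import math

def aic(sse, n, k):
    """Akaike information criterion, assuming AIC = ln(SSE/N) + 2K/N."""
    return math.log(sse / n) + 2 * k / n

def sc(sse, n, k):
    """Schwarz criterion, assuming SC = ln(SSE/N) + K*ln(N)/N."""
    return math.log(sse / n) + k * math.log(n) / n

# Placeholder inputs for illustration; take SSE and K from Table 6.4.
sse, n, k = 250.0, 1000, 4
print(f"AIC = {aic(sse, n, k):.4f}, SC = {sc(sse, n, k):.4f}")
```

In both cases the model with the smallest value is preferred. The SC penalty per coefficient, ln(N)/N, exceeds the AIC penalty, 2/N, whenever N is larger than about 7.4, so with N = 1000 the SC penalizes extra variables more heavily.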
6.5* Consider the wage equation
(a) Suppose you wish to test the hypothesis that a year of education has the same
effect on ln(WAGE) as a year of experience. What null and alternative hypotheses
would you set up?
(b) What is the restricted model, assuming that the null hypothesis is true?
(c) Given that the sum of squared errors from the restricted model is $SSE_R = 254.1726$,
test the hypothesis in (a). (For $SSE_U$ use the relevant value from
Table 6.4. The sample size is N = 1,000.)
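The F-tests in Exercises 6.4 and 6.5 all use the same statistic, built from the restricted and unrestricted sums of squared errors. A minimal sketch follows. SSE_R = 254.1726 and N = 1000 come from the exercise, while the values of SSE_U, J, and K are placeholders because Table 6.4 and the wage equation are not reproduced here.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K)),
    where J is the number of restrictions and K the number of
    coefficients in the unrestricted model."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))

# SSE_R and N are from the exercise; sse_u, j, and k are illustrative placeholders.
f_value = f_statistic(sse_r=254.1726, sse_u=250.0, j=1, n=1000, k=4)
print(f"F = {f_value:.3f}")
```

The statistic is compared with a critical value from the F distribution with J and N − K degrees of freedom. With a single restriction (J = 1), it equals the square of the corresponding t-statistic.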
Table 6.4 columns: Eqn (A), Eqn (B), Eqn (C), Eqn (D), Eqn (E)
6.6 RESET suggests augmenting an existing model with the squares of the predictions
$\hat{y}^2$, or with their squares and cubes ($\hat{y}^2$, $\hat{y}^3$). What would happen if you augmented
the model with the predictions themselves, $\hat{y}$?
6.7 Table 6.5 contains output for the two models
y = b1 + b2 x + b3 w + e
y = b1 + b2 x + e
obtained using N = 35 observations. RESET applied to the second model yields
F-values of 17.98 (for $\hat{y}^2$) and 8.72 (for $\hat{y}^2$ and $\hat{y}^3$). The correlation between x and w is
$r_{xw} = 0.975$. Discuss the following questions:
(a) Should w be included in the model?
(b) What can you say about omitted-variable bias?
(c) What can you say about the existence of collinearity and its possible effect?
6.8 In Section 6.1.5 we tested the joint null hypothesis
in the model
By substituting the restrictions into the model and rearranging variables, show how
the model can be written in a form in which least squares estimation will yield
restricted least squares estimates.
in terms of logarithms and estimated it using data in the file manuf.dat. Use the data
and results from Exercise 5.25 to test the following hypotheses:
(a) H0: b2 = 0 against H1: b2 ≠ 0.
(b) H0: b2 = 0, b3 = 0 against H1: b2 ≠ 0 and/or b3 ≠ 0.
(c) H0: b2 = 0, b4 = 0 against H1: b2 ≠ 0 and/or b4 ≠ 0.
(d) H0: b2 = 0, b3 = 0, b4 = 0 against H1: b2 ≠ 0 and/or b3 ≠ 0 and/or b4 ≠ 0.
(e) H0: b2 + b3 + b4 + b5 = 1 against H1: b2 + b3 + b4 + b5 ≠ 1.
(f) Analyze the impact of collinearity on this model.
6.10* Use the sample data for beer consumption in the file beer.dat to
(a) Estimate the coefficients of the demand relation (6.14) using only sample
information. Compare and contrast these results to the restricted coefficient
results given in (6.19).
(b) Does collinearity appear to be a problem?
(c) Test the validity of the restriction that implies that demand will not change if
prices and income go up in the same proportion.
(d) Use model (6.19) to construct a 95% prediction interval for Q when
PB = 3.00, PL = 10, PR = 2.00, and I = 50000. (Hint: Construct the interval
for ln(Q) and then take antilogs.)
(e) Repeat part (d) using the unconstrained model from part (a). Comment.
6.11 Consider production functions of the form Q = f(L, K), where Q is the output measure
and L and K are labor and capital inputs, respectively. A popular functional form is the
Cobb–Douglas equation
ln(Q) = b1 + b2 ln(L) + b3 ln(K) + e
(a) Use the data in the file cobb.dat to estimate the Cobb–Douglas production
function. Is there evidence of collinearity?
(b) Re-estimate the model with the restriction of constant returns to scale, that is,
b2 + b3 = 1, and comment on the results.
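One way to impose the constant-returns restriction is to substitute it into the model and rearrange, so that ordinary least squares on transformed variables gives the restricted estimates. The steps below are a sketch of that substitution, using b3 = 1 − b2; it is one possible route, not necessarily the one the text has in mind.

```latex
\begin{aligned}
\ln(Q) &= b_1 + b_2 \ln(L) + (1 - b_2)\ln(K) + e \\
\ln(Q) - \ln(K) &= b_1 + b_2\,[\ln(L) - \ln(K)] + e \\
\ln(Q/K) &= b_1 + b_2 \ln(L/K) + e
\end{aligned}
```

Regressing ln(Q/K) on ln(L/K) yields the restricted estimates of b1 and b2, and the restricted estimate of b3 is then recovered as 1 − b2.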
6.12* Using data in the file beer.dat, apply RESET to the two alternative models
ln(Q) = b1 + b2 ln(PB) + b3 ln(PL) + b4 ln(PR) + b5 ln(I) + e
Q = b1 + b2 PB + b3 PL + b4 PR + b5 I + e
Which model seems to better reflect the demand for beer?
6.13 The file toodyay.dat contains 48 annual observations on a number of variables related
to wheat yield in the Toodyay Shire of Western Australia, for the period 1950–1997.
Those variables are
Y = wheat yield in tonnes per hectare,
t = trend term to allow for technological change,
RG = rainfall at germination (May–June),
RD = rainfall at development stage (July–August), and
RF = rainfall at flowering (September–October).
The unit of measurement for rainfall is centimeters. A model that allows for the yield
response to rainfall to be different for the three different periods is
Y = b1 + b2 t + b3 RG + b4 RD + b5 RF + e
(a) Estimate this model. Report the results and comment on the signs and signifi-
cance of the estimated coefficients.
(b) Test the hypothesis that the response of yield to rainfall is the same irrespective of
whether the rain falls during germination, development, or flowering.
(c) Estimate the model under the restriction that the three responses to rainfall are the
same. Comment on the results.
6.14 Following on from the example in Section 6.3, the file hwage.dat contains another
subset of the data used by labor economist Tom Mroz. The variables with which we
are concerned are
252 FURTHER INFERENCE IN THE MULTIPLE REGRESSION MODEL
6.20* Reconsider the production function for rice estimated in Exercise 5.24 using data in
the file rice.dat:
6.22* In Chapter 5.7 we used the data in file pizza4.dat to estimate the model
(a) Test the hypothesis that age does not affect pizza expenditure—that is, test the
joint hypothesis H0: b2 = 0, b4 = 0. What do you conclude?
(b) Construct point estimates and 95% interval estimates of the marginal propensity
to spend on pizza for individuals of ages 20, 30, 40, 50, and 55. Comment on
these estimates.
(c) Modify the equation to permit a “life-cycle” effect in which the marginal effect
of income on pizza expenditure increases with age, up to a point, and then falls.
Do so by adding the term ($AGE^2 \times INC$) to the model. What sign do you
anticipate on this term? Estimate the model and test the significance of the
coefficient for this variable. Did the estimate have the expected sign?
(d) Using the model in (c), construct point estimates and 95% interval estimates of
the marginal propensity to spend on pizza for individuals of ages 20, 30, 40, 50
and 55. Comment on these estimates. In light of these values, and of the range of
age in the sample data, what can you say about the quadratic function of age that
describes the marginal propensity to spend on pizza?
(e) For the model in part (c), are each of the coefficient estimates for AGE, ($AGE \times INC$),
and ($AGE^2 \times INC$) significantly different from zero at a 5% significance level? Carry
out a joint test for the significance of these variables. Comment on your results.
(f) Check the model used in part (c) for collinearity. Add the term ($AGE^3 \times INC$) to
the model in (c) and check the resulting model for collinearity.
6.23 Use the data in cps4_small.dat to estimate the following wage equation:
(a) Interpret these regression results. When should emergency rooms expect more
calls?
(b) The model was reestimated omitting the variables FULLMOON and NEWMOON,
as shown below. Comment on any changes you observe.
(c) Test the joint significance of FULLMOON and NEWMOON. State the null and
alternative hypotheses and indicate the test statistic you use. What do you
conclude?
Emergency Room Cases Regression Model 2
7.3 Henry Saffer and Frank Chaloupka (“The Demand for Illicit Drugs,” Economic
Inquiry, 37(3), 1999, 401–411) estimate demand equations for alcohol, marijuana,
cocaine, and heroin using a sample of size N = 44,889. The estimated equation for
alcohol use after omitting a few control variables is shown in the chart at the top of
page 289.
The variable definitions (sample means in parentheses) are as follows:
The dependent variable is the number of days alcohol was used in the past 31 days (3.49)
ALCOHOL PRICE: price of a liter of pure alcohol in 1983 dollars (24.78)
INCOME: total personal income in 1983 dollars (12,425)
GENDER: a binary variable = 1 if male (0.479)
MARITAL STATUS: a binary variable = 1 if married (0.569)
AGE 12–20: a binary variable = 1 if individual is 12–20 years of age (0.155)
AGE 21–30: a binary variable = 1 if individual is 21–30 years of age (0.197)
BLACK: a binary variable = 1 if individual is black (0.116)
HISPANIC: a binary variable = 1 if individual is Hispanic (0.078)
7.6 EXERCISES 289
Variable          Coefficient   t-statistic
C                 4.099         17.98
ALCOHOL PRICE     0.045         5.93
INCOME            0.000057      17.45
GENDER            1.637         29.23
MARITAL STATUS    0.807         12.13
AGE 12–20         1.531         17.97
AGE 21–30         0.035         0.51
BLACK             0.580         8.84
HISPANIC          0.564         6.03
price (RELPRICE) and an indicator variable for the repair period (REPAIR).
That is, let
Obtain the least squares estimates of the parameters. Interpret the estimated
coefficients, as well as their signs and significance.
(e) Using the least squares estimate of the coefficient of REPAIR from part (d),
compute an estimate of the revenue lost by the damaged motel during the repair
period (215 days @ $56.61 × b4). Compare this value to the “simple” estimate in
part (b). Construct a 95% interval estimate for the estimated loss. Is the estimated
loss from part (b) within the interval estimate?
(f) Carry out the regression specification test RESET. Is there any evidence of model
misspecification?
(g) Plot the least squares residuals against TIME. Are there any obvious patterns?
7.9* In the STAR experiment (Section 7.5.3), children were randomly assigned within
schools into three types of classes: small classes with 13 to 17 students, regular-sized
classes with 22–25 students, and regular-sized classes with a full-time teacher aide to
assist the teacher. Student scores on achievement tests were recorded, as was some
information about the students, teachers, and schools. Data for the kindergarten
classes is contained in the data file star.dat.
(a) Calculate the average of TOTALSCORE for (i) students in regular-sized class-
rooms with full time teachers, but no aide; (ii) students in regular-sized classrooms
with full time teachers, and an aide; and (iii) students in small classrooms. What do
you observe about test scores in these three types of learning environments?
(b) Estimate the regression model $TOTALSCORE_i = b_1 + b_2 SMALL_i + b_3 AIDE_i + e_i$,
where AIDE is an indicator variable equaling one for classes taught by a
teacher and an aide and zero otherwise. What is the relation of the estimated
coefficients from this regression to the sample means in part (a)? Test the
statistical significance of b3 at the 5% level of significance.
(c) To the regression in (b) add the additional explanatory variable TCHEXPER. Is
this variable statistically significant? Does its addition to the model affect the
estimates of b2 and b3?
(d) To the regression in (c) add the additional explanatory variables BOY,
FREELUNCH, and WHITE_ASIAN. Are any of these variables statistically
significant? Does their addition to the model affect the estimates of b2 and b3?
(e) To the regression in (d) add the additional explanatory variables TCHWHITE,
TCHMASTERS, SCHURBAN, and SCHRURAL. Are any of these variables
statistically significant? Does their addition to the model affect the estimates
of b2 and b3?
(f) Discuss the importance of parts (c), (d), and (e) to our estimation of the
“treatment” effects in part (b).
(g) Add to the models in (b) through (e) indicator variables for each school
$$SCHOOL_j = \begin{cases} 1 & \text{if student is in school } j \\ 0 & \text{otherwise} \end{cases}$$
Test the joint significance of these school “fixed effects.” Does the inclusion
of these fixed effect indicator variables substantially alter the estimates of b2
and b3?
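For parts (a) and (b) of Exercise 7.9, it helps to see how a regression on an intercept and two group indicators relates to the three group means. The sketch below uses a few invented scores (not taken from star.dat) and checks that coefficients built from the group means satisfy the least squares normal equations, a standard property of regressions on a full set of group indicators. The variable names are hypothetical.

```python
from statistics import mean

# Hypothetical scores by class type, for illustration only (not from star.dat).
scores = {
    "regular": [900.0, 910.0, 920.0],
    "aide": [905.0, 915.0, 925.0],
    "small": [920.0, 930.0, 940.0],
}
means = {kind: mean(values) for kind, values in scores.items()}

# Candidate coefficients built from the group means.
b1 = means["regular"]                   # intercept: base group mean
b2 = means["small"] - means["regular"]  # coefficient on SMALL
b3 = means["aide"] - means["regular"]   # coefficient on AIDE

# Check the normal equations: residuals must be orthogonal to the constant,
# SMALL, and AIDE. If all three sums are zero, these are the OLS estimates.
rows = [(y, int(kind == "small"), int(kind == "aide"))
        for kind, values in scores.items() for y in values]
resid = [y - (b1 + b2 * small + b3 * aide) for y, small, aide in rows]
normal_eqs = [
    sum(resid),
    sum(e * small for e, (_, small, _) in zip(resid, rows)),
    sum(e * aide for e, (_, _, aide) in zip(resid, rows)),
]
print(b1, b2, b3, normal_eqs)
```

Once further regressors such as TCHEXPER are added, as in parts (c) to (e), the coefficients no longer reduce to simple differences in group means.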