SM Sbe13e Chapter 14
Learning Objectives
1. Understand how regression analysis can be used to develop an equation that estimates
mathematically how two variables are related.
2. Understand the differences between the regression model, the regression equation, and the estimated
regression equation.
3. Know how to fit an estimated regression equation to a set of sample data based upon the
least-squares method.
4. Be able to determine how good a fit is provided by the estimated regression equation and compute
the sample correlation coefficient from the regression analysis output.
5. Understand the assumptions necessary for statistical inference and be able to test for a significant
relationship.
6. Know how to develop confidence interval estimates of y given a specific value of x in both the case
of a mean value of y and an individual value of y.
7. Learn how to use a residual plot to make a judgement as to the validity of the regression
assumptions.
© 2017 Cengage Learning. All Rights Reserved.
May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 14
Solutions:
1. a. [Scatter diagram: y versus x]
c. Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
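d. \( \bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{40}{5} = 8 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 26 \qquad \sum (x_i - \bar{x})^2 = 10 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{26}{10} = 2.6 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 8 - (2.6)(3) = 0.2 \)
\( \hat{y} = 0.2 + 2.6x \)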
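The least squares computation in part (d) can be sketched in Python. The data values below are an assumption: the exercise data are not reproduced in this manual, and these values were chosen only because they reproduce the sums reported above (sum of x = 15, sum of y = 40, sum of cross-products = 26, sum of squared x deviations = 10). The function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Assumed data, consistent with the sums reported in part (d)
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")
```

With these assumed values the slope and intercept agree with the hand computation, b1 = 2.6 and b0 = 0.2.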
Simple Linear Regression
2. a. [Scatter diagram: y versus x]
c. Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
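d. \( \sum (x_i - \bar{x})(y_i - \bar{y}) = -540 \qquad \sum (x_i - \bar{x})^2 = 180 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-540}{180} = -3 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 35 - (-3)(11) = 68 \)
\( \hat{y} = 68 - 3x \)
e. \( \hat{y} = 68 - 3(10) = 38 \)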
3. a. [Scatter diagram: y versus x]
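b. \( \bar{x} = \frac{\sum x_i}{n} = \frac{50}{5} = 10 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{83}{5} = 16.6 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 171 \qquad \sum (x_i - \bar{x})^2 = 190 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{171}{190} = 0.9 \)
\( \hat{y} = 7.6 + 0.9x \)
c. \( \hat{y} = 7.6 + 0.9(6) = 13 \)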
4. a. [Scatter diagram: % Management versus % Working]
b. There appears to be a positive linear relationship between the percentage of women working in the
five companies (x) and the percentage of management jobs held by women in those companies (y).
c. Many different straight lines can be drawn to provide a linear approximation of the
relationship between x and y; in part (d) we will determine the equation of a straight line
that “best” represents the relationship according to the least squares criterion.
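d. \( \sum (x_i - \bar{x})(y_i - \bar{y}) = 624 \qquad \sum (x_i - \bar{x})^2 = 480 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{624}{480} = 1.3 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 43 - 1.3(60) = -35 \)
\( \hat{y} = -35 + 1.3x \)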
5. a. [Scatter diagram; horizontal axis: Line Speed (feet per minute)]
b. There appears to be a negative relationship between line speed (feet per minute) and the number of
defective parts.
c. Let x = line speed (feet per minute) and y = number of defective parts.
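\( \sum (x_i - \bar{x})(y_i - \bar{y}) = -300 \qquad \sum (x_i - \bar{x})^2 = 1000 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-300}{1000} = -.3 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 17 - (-.3)(35) = 27.5 \)
\( \hat{y} = 27.5 - .3x \)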
6. a. [Scatter diagram: Win% versus Yds/Att]
b. The scatter diagram indicates a positive linear relationship between x = average number of passing
yards per attempt and y = the percentage of games won by the team.
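c. \( \sum (x_i - \bar{x})(y_i - \bar{y}) = 121.6 \qquad \sum (x_i - \bar{x})^2 = 7.08 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{121.6}{7.08} = 17.1751 \)
\( \hat{y} = -70.391 + 17.1751x \)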
d. The slope of the estimated regression line is approximately 17.2. So, for every increase of one yard
in the average number of passing yards per attempt, the percentage of games won by the team increases by
approximately 17.2 percentage points.
e. With an average number of passing yards per attempt of 6.2, the predicted percentage of games won
is ŷ = -70.391 + 17.175(6.2) = 36%. With a record of 7 wins and 9 losses, the Kansas City Chiefs won
43.8%, or approximately 44%, of their games. Considering the small data size, the
prediction made using the estimated regression equation is not too bad.
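The comparison in part (e) can be checked with a short Python sketch. The coefficients and the 7-9 record are taken from the solution above; the function name `predict` is only illustrative.

```python
def predict(b0, b1, x):
    """Evaluate the estimated regression equation y-hat = b0 + b1*x."""
    return b0 + b1 * x


# Coefficients from part (c), rounded as in part (e)
predicted_win_pct = predict(-70.391, 17.175, 6.2)
# Actual record: 7 wins and 9 losses
actual_win_pct = 100 * 7 / (7 + 9)
print(round(predicted_win_pct, 1), round(actual_win_pct, 1))
```

The gap between the two values is the residual for this team, expressed in percentage points.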
7. a. [Scatter diagram: Annual Sales ($1000s) versus Years of Experience]
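\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 568 \qquad \sum (x_i - \bar{x})^2 = 142 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{568}{142} = 4 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 108 - (4)(7) = 80 \)
\( \hat{y} = 80 + 4x \)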
8. a. [Scatter diagram: Satisfaction versus Speed of Execution]
b. The scatter diagram indicates a positive linear relationship between x = speed of execution rating and
y = overall satisfaction rating for electronic trades.
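c. \( \sum (x_i - \bar{x})(y_i - \bar{y}) = 2.4 \qquad \sum (x_i - \bar{x})^2 = 2.6 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = .9077 \)
\( \hat{y} = .2046 + .9077x \)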
d. The slope of the estimated regression line is approximately .9077. So, a one unit increase in the
speed of execution rating will increase the overall satisfaction rating by approximately .9 points.
e. The average speed of execution rating for the other brokerage firms is 3.4. Using this as the new
value of x for [Link], we can use the estimated regression equation developed in part (c) to
estimate the overall satisfaction rating corresponding to x = 3.4.
Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.
9. a. [Scatter diagram: annual revenue ($millions) versus cars in service (1000s)]
b. The scatter diagram indicates a positive linear relationship between x = cars in service (1000s) and y
= annual revenue ($millions).
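c. \( \sum (x_i - \bar{x})(y_i - \bar{y}) = 734.6 \qquad \sum (x_i - \bar{x})^2 = 56.655 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{734.6}{56.655} = 12.9662 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 77 - (12.9662)(7.25) = -17.005 \)
\( \hat{y} = -17.005 + 12.966x \)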
d. For every additional 1000 cars placed in service annual revenue will increase by 12.966 ($millions)
or $12,966,000. Therefore, every additional car placed in service will increase annual revenue by
$12,966.
A prediction of annual revenue for Fox Rent A Car is approximately $126 million.
10. a. [Scatter diagram: % Gain in Options Value versus % Increase in Stock Price]
b. The scatter diagram indicates a positive linear relationship between x = percentage increase in the
stock price and y = percentage gain in options value. In other words, options values increase as stock
prices increase.
c. \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{314,501.1}{115,842.9} = 2.7149 \)
yˆ 167.81 2.7149 x
d. The slope of the estimated regression line is approximately 2.7. So, for every one percentage point
increase in the price of the stock, the gain in options value increases by approximately 2.7 percentage points.
e. The rewards for the CEO do appear to be based upon performance increases in the stock value.
While the rewards may seem excessive, the executive is being rewarded for his/her role in increasing
the value of the company. This is why such compensation schemes are devised for CEOs by boards
of directors. A compensation scheme where an executive got a big salary increase when the
company stock went down would be bad. And, if the stock price for a company had gone down
during the periods in question, the value of the CEOs options would also go down.
11. a. [Scatter diagram: Overall Score versus Price ($)]
b. The scatter diagram indicates a positive linear relationship between x = price ($) and y = overall
score.
c. \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{11,900}{561,000} = .021212 \)
\( \hat{y} = 53.864 + .0212x \)
d. The slope of .0212 means that spending an additional $100 in price will increase the overall score by
approximately 2 points.
12. a. [Scatter diagram: Entertainment ($) versus Hotel Room Rate ($)]
b. The scatter diagram indicates a positive linear relationship between x = hotel room rate and the
amount spent on entertainment.
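\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 4237 \qquad \sum (x_i - \bar{x})^2 = 4100 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{4237}{4100} = 1.0334 \)
\( \hat{y} = 17.49 + 1.0334x \)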
Note: In The Wall Street Journal article the entertainment expense for Chicago was $146. Thus, the
estimated regression equation provided a good estimate of entertainment expenses for Chicago.
13. a. [Scatter diagram; horizontal axis: Adjusted Gross Income ($1000s)]
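\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 1233.7 \qquad \sum (x_i - \bar{x})^2 = 7648 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1233.7}{7648} = 0.1613 \)
\( \hat{y} = 4.68 + 0.16x \)
c. \( \hat{y} = 4.68 + 0.16x = 4.68 + 0.16(52.5) = 13.08 \) or approximately $13,080.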
14. a. [Scatter diagram: number of days absent versus distance to work]
The scatter diagram indicates a negative linear relationship between x = distance to work and y =
number of days absent.
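b. \( \bar{x} = \sum x_i / n = 90/10 = 9 \qquad \bar{y} = \sum y_i / n = 50/10 = 5 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = -95 \qquad \sum (x_i - \bar{x})^2 = 276 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-95}{276} = -.3442 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 5 - (-.3442)(9) = 8.0978 \)
\( \hat{y} = 8.0978 - .3442x \)
c. A prediction of the number of days absent is \( \hat{y} = 8.0978 - .3442(5) = 6.4 \) or approximately 6 days.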
15. a. The estimated regression equation and the mean for the dependent variable are:
The sum of squares due to error and the total sum of squares are
The least squares line provided a very good fit; 84.5% of the variability in y has been explained by
the least squares line.
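The sums of squares and the coefficient of determination referred to above can be sketched as follows. The data are an assumption (the exercise data are not reproduced in this manual); they were chosen so that the result matches the 84.5% quoted above. The function name `fit_summary` is illustrative.

```python
def fit_summary(x, y):
    """Return SSE, SST, SSR and r^2 for the least squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: squared deviations of y about the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: squared deviations of y about its mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sst - sse
    return {"SSE": sse, "SST": sst, "SSR": ssr, "r2": ssr / sst}


# Assumed data, not taken from the manual
summary = fit_summary([1, 2, 3, 4, 5], [3, 7, 5, 11, 14])
print({name: round(value, 3) for name, value in summary.items()})
```

Here r2 is SSR/SST, the proportion of the variability in y explained by the least squares line.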
16. a. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y}_i = 68 - 3x_i \qquad \bar{y} = 35 \)
The sum of squares due to error and the total sum of squares are
The least squares line provided an excellent fit; 87.6% of the variability in y has been explained by
the estimated regression equation.
Note: the sign for r is negative because the slope of the estimated regression equation is negative (b1 = -3).
17. The estimated regression equation and the mean for the dependent variable are:
The sum of squares due to error and the total sum of squares are
We see that 54.7% of the variability in y has been explained by the least squares line.
b. \( r^2 = \frac{SSR}{SST} = \frac{1512.376}{1800} = .84 \)
c. \( r = \sqrt{r^2} = \sqrt{.84} = .917 \)
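A minimal Python sketch of parts (b) and (c): the sample correlation coefficient is the square root of r-squared, carrying the sign of the slope b1. The function name `sample_correlation` and the placeholder slope value passed below are assumptions; only SSR and SST come from the solution.

```python
import math


def sample_correlation(ssr, sst, b1):
    """Return r = (sign of b1) * sqrt(SSR / SST)."""
    r_squared = ssr / sst
    return math.copysign(math.sqrt(r_squared), b1)


# SSR and SST from part (b); any positive slope gives the positive root
r_positive = sample_correlation(1512.376, 1800, b1=1.0)
# With a negative slope the same r^2 yields a negative correlation
r_negative = sample_correlation(1512.376, 1800, b1=-1.0)
print(round(r_positive, 3), round(r_negative, 3))
```

Taking the sign from the slope is what distinguishes exercise 16, where b1 = -3 makes r negative, from the positive root reported here.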
19. a. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y} = 80 + 4x \qquad \bar{y} = 108 \)
The sum of squares due to error and the total sum of squares are
We see that 93% of the variability in y has been explained by the least squares line.
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-31,284}{21.74} = -1439 \)
\( \hat{y} = 28,574 - 1439x \)
Thus, an estimate of the price for a bike that weighs 15 pounds is $6989.
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{712,500}{93,750} = 7.6 \)
\( \hat{y} = 1246.67 + 7.6x \)
b. $7.60
c. The sum of squares due to error and the total sum of squares are:
We see that 95.87% of the variability in y has been explained by the estimated regression equation.
\( r^2 = \frac{SSR}{SST} = \frac{9524.97}{10,568} = .9013 \)
b. The estimated regression equation provided a very good fit; approximately 90% of the variability in
the dependent variable was explained by the linear relationship between the two variables.
c. \( r = \sqrt{r^2} = \sqrt{.9013} = .95 \)
c. \( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{2.033}{\sqrt{10}} = 0.643 \)
d. \( t = \frac{b_1}{s_{b_1}} = \frac{2.6}{.643} = 4.044 \)
Using t table (3 degrees of freedom), area in tail is between .01 and .025
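The t test for the slope in parts (c) and (d) can be sketched in Python. The inputs are the values shown above; the critical value 3.182 (t with 3 degrees of freedom and .025 in the upper tail) is the table value used in the interval estimates later in these solutions, and the .05 significance level it implies is an assumption here. The function name is illustrative.

```python
import math


def slope_t_statistic(b1, s, sxx):
    """Return (s_b1, t), where s_b1 = s / sqrt(sum of squared x deviations)."""
    s_b1 = s / math.sqrt(sxx)
    return s_b1, b1 / s_b1


s_b1, t = slope_t_statistic(b1=2.6, s=2.033, sxx=10)

# Two-tailed test at an assumed alpha = .05 with n - 2 = 3 degrees of freedom
T_CRITICAL = 3.182
reject_h0 = abs(t) > T_CRITICAL
print(round(s_b1, 3), round(t, 3), reject_h0)
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with alpha.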
Using F table (1 degree of freedom numerator and 3 denominator), p-value is between .025 and .05
c. \( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{8.7560}{\sqrt{180}} = 0.6526 \)
d. \( t = \frac{b_1}{s_{b_1}} = \frac{-3}{.653} = -4.59 \)
Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less than .02
Using F table (1 degree of freedom numerator and 3 denominator), p-value is less than .025
b. \( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{6.5141}{\sqrt{190}} = 0.4726 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.9}{.4726} = 1.90 \)
Using t table (3 degrees of freedom), area in tail is between .05 and .10
Because p-value > \( \alpha \), we cannot reject \( H_0: \beta_1 = 0 \); x and y do not appear to be related.
Using F table (1 degree of freedom numerator and 3 denominator), p-value is greater than .10
Because p-value > \( \alpha \), we cannot reject \( H_0: \beta_1 = 0 \); x and y do not appear to be related.
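\( \sum (x_i - \bar{x})^2 = 14,950 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{8.4797}{\sqrt{14,950}} = .0694 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.318}{.0694} = 4.58 \)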
Using t table (4 degrees of freedom), area in tail is between .005 and .01
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \); there is a significant relationship between price and
overall score.
Using F table (1 degree of freedom numerator and 4 denominator), p-value is between .01 and .025
c.
Source          Sum          Degrees       Mean
of Variation    of Squares   of Freedom    Square      F       p-value
Regression      1512.376     1             1512.376    21.03   .010
Error            287.624     4               71.906
Total           1800         5
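The entries of the ANOVA table can be reproduced from SSR, SSE and the sample size with a short sketch. The sample size n = 6 is inferred from the 5 total degrees of freedom shown in the table; the function name `anova_table` is illustrative.

```python
def anova_table(ssr, sse, n):
    """Return the ANOVA quantities for simple linear regression."""
    df_regression = 1          # one independent variable
    df_error = n - 2
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "SST": ssr + sse,
        "df_total": n - 1,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }


table = anova_table(ssr=1512.376, sse=287.624, n=6)
print({name: round(value, 3) for name, value in table.items()})
```

The p-value in the last column requires the F distribution with 1 and 4 degrees of freedom, which this sketch does not compute.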
27. a. [Scatter diagram: Stress Tolerance versus Average Annual Salary ($1000s)]
The scatter diagram suggests a negative linear relationship between the two variables.
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = -367.2 \qquad \sum (x_i - \bar{x})^2 = 1742.4 \)
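\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-367.2}{1742.4} = -.2107 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 66 - (-.2107)(86.6) = 84.2466 \)
\( \hat{y} = 84.2466 - .2107x \)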
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01
The estimated regression equation provided a reasonably good fit; we should feel comfortable using
the estimated regression equation to estimate the stress level tolerance given the average annual
salary as long as the value of the average annual salary is within the range of the current data.
e. The relationship between the average annual salary and stress tolerance is counterintuitive because
one would think that jobs that pay more are most likely going to require more time and will likely
involve a more stressful environment. One possibility is that the limited size of the data set is
masking a much different relationship that might be more evident with a larger sample of
occupations. And, the stress tolerance rating used in this study may not necessarily be a good
indicator of the actual stress.
28. The sum of squares due to error and the total sum of squares are
We can use either the t test or F test to determine whether speed of execution and overall satisfaction
are related.
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{.3997}{\sqrt{2.6}} = .2479 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.9077}{.2479} = 3.66 \)
Using t table (9 degrees of freedom), area in tail is less than .005; p-value is less than .01
Because we can reject \( H_0: \beta_1 = 0 \), we conclude that speed of execution and overall satisfaction are
related.
Using F table (1 degree of freedom numerator and 9 denominator), p-value is less than .01
Because we can reject \( H_0: \beta_1 = 0 \), we conclude that speed of execution and overall satisfaction are
related.
Using F table (1 degree of freedom numerator and 4 denominator), p-value is less than .01
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \). Production volume and total cost are related.
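\( s = \sqrt{260.7575} = 16.1480 \)
\( \sum (x_i - \bar{x})^2 = 56.655 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{16.148}{\sqrt{56.655}} = 2.145 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{12.966}{2.145} = 6.045 \)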
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01
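32. a. s = 2.033
b. \( \bar{x} = 3 \qquad \sum (x_i - \bar{x})^2 = 10 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 2.033\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 1.11 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)
or 7.07 to 14.13
c. \( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 2.033\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}} = 2.32 \)
d. \( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)
or 3.22 to 17.98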
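33. a. s = 8.7560
b. \( \bar{x} = 11 \qquad \sum (x_i - \bar{x})^2 = 180 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 8.7560\sqrt{\frac{1}{5} + \frac{(8 - 11)^2}{180}} = 4.3780 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)
or 30.07 to 57.93
c. \( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 8.7560\sqrt{1 + \frac{1}{5} + \frac{(8 - 11)^2}{180}} = 9.7895 \)
d. \( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)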
\( 44 \pm 3.182(9.7895) = 44 \pm 31.15 \)
or 12.85 to 75.15
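34. s = 6.5141
\( \bar{x} = 10 \qquad \sum (x_i - \bar{x})^2 = 190 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 6.5141\sqrt{\frac{1}{5} + \frac{(12 - 10)^2}{190}} = 3.0627 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)
or 8.65 to 28.15
\( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 6.5141\sqrt{1 + \frac{1}{5} + \frac{(12 - 10)^2}{190}} = 7.1982 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)
or -4.50 to 41.30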
The two intervals are different because there is more variability associated with predicting an
individual value than with estimating a mean value.
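\( \bar{x} = 3.2 \qquad \sum (x_i - \bar{x})^2 = 0.74 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 145.89\sqrt{\frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 68.54 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)
or $3643.53 to $4024.07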
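c. \( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 145.89\sqrt{1 + \frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 161.19 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)
or $3386.34 to $4281.26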
d. As expected, the prediction interval is much wider than the confidence interval. This is due to the
fact that it is more difficult to predict the starting salary for one new student with a GPA of 3.0 than
it is to estimate the mean for all students with a GPA of 3.0.
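36. a. \( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 4.6098\sqrt{\frac{1}{10} + \frac{(9 - 7)^2}{142}} = 1.6503 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)
\( \hat{y}^* = 80 + 4x^* = 80 + 4(9) = 116 \)
b. \( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 4.6098\sqrt{1 + \frac{1}{10} + \frac{(9 - 7)^2}{142}} = 4.8963 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)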
c. As expected, the prediction interval is much wider than the confidence interval. This is due to the
fact that it is more difficult to predict annual sales for one new salesperson with 9 years of
experience than it is to estimate the mean annual sales for all salespersons with 9 years of
experience.
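37. a. \( \bar{x} = 57 \qquad \sum (x_i - \bar{x})^2 = 7648 \)
\( s^2 = 1.88 \qquad s = 1.37 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 1.37\sqrt{\frac{1}{7} + \frac{(52.5 - 57)^2}{7648}} = 0.52 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \)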
b. spred = 1.47
d. Any deductions exceeding the $16,860 upper limit could suggest an audit.
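b. \( \bar{x} = 575 \qquad \sum (x_i - \bar{x})^2 = 93,750 \)
\( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 241.52\sqrt{1 + \frac{1}{6} + \frac{(500 - 575)^2}{93,750}} = 267.50 \)
\( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)
or $3815.10 to $6278.24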
c. Based on one month, $6000 is not out of line since $3815.10 to $6278.24 is the prediction interval.
However, a sequence of five to seven months with consistently high costs should cause concern.
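\( s = \sqrt{220.2} = 14.8391 \)
\( s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 14.8391\sqrt{\frac{1}{9} + \frac{(89 - 105)^2}{4100}} = 6.1819 \)
or $94.84 to $124.08
\( s_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 14.8391\sqrt{1 + \frac{1}{9} + \frac{(128 - 105)^2}{4100}} = 16.525 \)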
\( \hat{y}^* \pm t_{\alpha/2}\, s_{pred} \)
or $110.69 to $188.85
40. a. 9
b. ŷ = 20.0 + 7.21x
c. 1.3626
Using F table (1 degree of freedom numerator and 7 denominator), p-value is less than .01
b. \( t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{.8951 - 0}{.149} = 6.01 \)
Using the t table (8 degrees of freedom), area in tail is less than .005
p-value is less than .01
42. a. ŷ = 80.0 + 50.0x
b. 30
Using F table (1 degree of freedom numerator and 28 denominator), p-value is less than .01
43. a. [Scatter diagram: 2012 Percentage versus 2011 Percentage]
Regression Statistics
Multiple R 0.8702
R Square 0.7572
Adjusted R Square 0.7456
Standard Error 11.5916
Observations 23
ANOVA
df SS MS F Significance F
Regression 1 8798.2391 8798.2391 65.4802 6.85277E-08
Residual 21 2821.6609 134.3648
Total 22 11619.9
[Scatter diagram: Price ($) versus Weight (oz)]
b. There appears to be a negative linear relationship between the two variables. The heavier helmets
tend to be less expensive.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
Obs   Price   Fit     Resid   Std Resid
7     900.0   655.2   244.8   3.03  R
R  Large residual
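45. a. \( \bar{x} = \frac{\sum x_i}{n} = \frac{70}{5} = 14 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{76}{5} = 15.2 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 200 \qquad \sum (x_i - \bar{x})^2 = 126 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{200}{126} = 1.5873 \)
\( \hat{y} = -7.02 + 1.59x \)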
c. [Residual plot: residuals versus x]
d. \( s^2 = 23.78 \)
\( h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} = \frac{1}{5} + \frac{(x_i - 14)^2}{126} \)
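The leverage formula in part (d) feeds directly into the standardized residuals. A minimal Python sketch follows. The data values are an assumption: they are not reproduced in this manual and were chosen only because they match the sums in part (a) (sum of x = 70, sum of y = 76, sum of cross-products = 200, sum of squared x deviations = 126). The function name is illustrative.

```python
import math


def residual_diagnostics(x, y):
    """Return (s^2, leverages, standardized residuals) for the least squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)      # mean square error
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Standardized residual: e_i / (s * sqrt(1 - h_i))
    std_resid = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(residuals, leverage)]
    return s2, leverage, std_resid


# Assumed data, consistent with the sums reported in part (a)
x = [6, 11, 15, 18, 20]
y = [6, 8, 12, 20, 30]
s2, leverage, std_resid = residual_diagnostics(x, y)
print(round(s2, 2), [round(h, 3) for h in leverage])
```

In simple linear regression the leverages always sum to 2, which gives a quick check on the computation.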
e. The standardized residual plot has the same shape as the original residual plot. The
curvature observed indicates that the assumptions regarding the error term may not be
satisfied.
b. [Residual plot: residuals versus x]
The assumption that the variance is the same for all values of x is questionable. The variance appears
to increase for larger values of x.
\( \hat{y} = 29.4 + 1.55x \)
Using F table (1 degree of freedom numerator and 5 denominator), p-value is between .01 and .025
Because p-value \( \le \alpha = .05 \), we conclude that the two variables are related.
c. [Residual plot: residuals versus predicted values]
d. The residual plot leads us to question the assumption of a linear relationship between x and y. Even
though the relationship is significant at the .05 level of significance, it would be extremely
dangerous to extrapolate beyond the range of the data.
48. a. \( \hat{y} = 80 + 4x \)
[Residual plot: residuals versus x]
Regression Statistics
Multiple R 0.8696
R Square 0.7561
Adjusted R Square 0.7257
Standard Error 78.7819
Observations 10
ANOVA
             df   SS            MS            F         Significance F
Regression   1    153961.6801   153961.6801   24.8062   0.0011
Residual     8    49652.7199    6206.5900
Total        9    203614.4

             Coefficients   Standard Error   t Stat    P-value
Intercept    -197.9583      187.6950         -1.0547   0.3224
Rent ($)     1.0699         0.2148           4.9806    0.0011
b. [Residual plot: residuals versus Rent ($)]
c. The residual plot leads us to question the assumption of a linear relationship between the average
asking rent and the monthly mortgage. Therefore, even though the relationship is very significant
(p-value = .0011), using the estimated regression equation to make predictions of the monthly mortgage
beyond the range of the data is not recommended.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
y = 66.1 + 0.402 x
Obs   y        Fit      Resid   Std Resid
1     145.00   120.42   24.58   2.11  R
R  Large residual
b. [Standardized residual plot]
The standardized residual plot indicates that the observation x = 135, y = 145 may be an outlier;
note that this observation has a standardized residual of 2.11.
[Scatter diagram: y versus x]
The scatter diagram also indicates that the observation x = 135, y = 145 may be an outlier; the
implication is that for simple linear regression an outlier can be identified by looking at the scatter
diagram.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
y = 13.00 + 0.425 x
R Large residual
X Unusual X
The standardized residuals are: -1.00, -.41, .01, -.48, .25, .65, 2.00, -2.16
The last two observations in the data set appear to be outliers since the standardized residuals for
these observations are 2.00 and -2.16, respectively.
Minitab identifies an observation as having high leverage if hi > 6/n; for these data, 6/n =
6/8 = .75. Since the leverage for the observation x = 22, y = 19 is .76, Minitab would identify
observation 8 as a high leverage point. Thus, we conclude that observation 8 is an influential
observation.
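The 6/n leverage rule described above can be sketched in Python. The x values are an assumption: the exercise data are not reproduced in this manual, and these values were chosen to be consistent with the solution (eight observations, with x = 22 for observation 8 and a leverage of about .76). The function name is illustrative.

```python
def high_leverage_points(x):
    """Return (observation number, leverage) pairs with h_i > 6/n."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    cutoff = 6 / n
    flagged = []
    for i, xi in enumerate(x, start=1):
        h = 1 / n + (xi - x_bar) ** 2 / sxx
        if h > cutoff:
            flagged.append((i, round(h, 2)))
    return flagged


# Assumed x values, consistent with the leverage reported for observation 8
x = [4, 5, 7, 8, 10, 12, 12, 22]
flagged = high_leverage_points(x)
print(flagged)
```

Only the x values enter the leverage, so an observation can have high leverage without having a large residual.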
c. [Scatter diagram: y versus x]
The scatter diagram indicates that the observation x = 22, y = 19 is an influential observation.
52. a. [Scatter diagram; horizontal axis: Fundraising Expenses (%)]
The scatter diagram does indicate potential influential observations. For example, the 22.2%
fundraising expense for the American Cancer Society and the 16.9% fundraising expense for the St.
Jude Children’s Research Hospital look like they may each have a large influence on the slope of the
estimated regression line. And, with a fundraising expense of only 2.6%, the percentage spent on
programs and services by the Smithsonian Institution (73.7%) seems to be somewhat lower than
would be expected; thus, this observation may need to be considered as a possible outlier.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
Obs   Program Expenses (%)   Fit     Resid    Std Resid
3     73.70                  88.60   -14.90   -2.13  R
5     71.60                  70.62   0.98     0.21   X
R  Large residual
X  Unusual X
c. The slope of the estimated regression equation is -0.917. Thus, for every 1% increase in the amount
spent on fundraising the percentage spent on program expenses will decrease by .917%; in other
words, just a little under 1%. The negative slope and value seem to make sense in the context of this
problem situation.
d. The Minitab output in part (b) indicates that there are two unusual observations:
Although fundraising expenses for the Smithsonian Institution are on the low side as compared to
most of the other super-sized charities, the percentage spent on program expenses appears to be
much lower than one would expect. It appears that the Smithsonian’s administrative expenses are too
high. But, thinking about the expenses of running a large museum like the Smithsonian, the
percentage spent on administrative expenses may not be unreasonable and is just due to the fact that
operating costs for a museum are in general higher than for some other types of organizations. The
very large value of fundraising expenses for the American Cancer Society suggests that this
observation has a large influence on the estimated regression equation. The following Minitab output
shows the results if this observation is deleted from the original data.
The y-intercept has changed slightly, but the slope has changed from -.917 to -1.00.
53. a. [Scatter diagram: Debt/GDP (%) versus Gold Value ($B)]
b. There appears to be a positive relationship between the two variables. But, observation 9 (U.S.)
appears to be an observation with high leverage and may be very influential in terms of fitting a
linear model to the data.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
X Unusual X
d. The Minitab output identifies observation 9 as an observation whose x value gives it large leverage.
e. Looking at the scatter diagram in part (a) it looks like observation 9 will have a lot of influence on
the estimated regression equation. To investigate this we can simply drop the observation from the
data set and fit a new estimated regression equation. The Minitab output we obtained follows.
Analysis of Variance
Model Summary
Coefficients
Regression Equation
Note that the slope of the estimated regression equation is now .342 as compared to a value of .123
when this observation is included. Thus, we see that this observation has a big impact on the value of
the slope of the fitted line and hence we would say that it is an influential observation.
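The drop-and-refit check described in part (e) can be sketched in Python. The data below are hypothetical values invented for illustration (they are not the data for this exercise); the last point has an extreme x value, and the function names are illustrative.

```python
def slope(x, y):
    """Least squares slope b1 for the points (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_without(x, y, index):
    """Slope after deleting the observation at the given 0-based index."""
    x_kept = x[:index] + x[index + 1:]
    y_kept = y[:index] + y[index + 1:]
    return slope(x_kept, y_kept)


# Hypothetical data: the last observation has an unusually large x value
x = [1, 2, 3, 4, 5, 20]
y = [2, 3, 5, 4, 6, 8]
b1_full = slope(x, y)
b1_dropped = slope_without(x, y, index=5)
print(round(b1_full, 3), round(b1_dropped, 3))
```

A large change in the slope after deleting a single point is the informal signal of an influential observation used in the solution above.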
54. a.
[Scatter diagram: Value ($ millions) versus Revenue ($ millions)]
The scatter diagram does indicate potential outliers and/or influential observations. For example, the
New York Yankees have both the highest revenue and the highest value, and appear to be an influential
observation. The Los Angeles Dodgers have the second highest value and appear to be an outlier.
Regression Statistics
Multiple R            0.9062
R Square              0.8211
Adjusted R Square     0.8148
Standard Error        165.6581
Observations          30

ANOVA
              df    SS             MS           F          Significance F
Regression    1     3527616.598    3527616.6    128.5453   5.616E-12
Residual      28    768392.7687    27442.599
Total         29    4296009.367

                       Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept              -601.4814      122.4288         -4.9129   3.519E-05   -852.2655   -350.6973
Revenue ($ millions)   5.9271         0.5228           11.3378   5.616E-12   4.8562      6.9979
Thus, the estimated regression equation that can be used to predict the team’s value given the value
of annual revenue is ŷ = -601.4814 + 5.9271 Revenue.
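The summary statistics in the regression output above are all tied together by simple arithmetic. As a quick check, using only the sums of squares, degrees of freedom, slope, and standard error reported in the output (the variable names are our own), we can recompute R Square, the F statistic, the standard error of the estimate, and the t statistic for the slope.

```python
import math

# Values copied from the ANOVA and coefficient tables in the output.
ssr = 3527616.598        # regression sum of squares
sse = 768392.7687        # residual (error) sum of squares
df_resid = 28            # residual degrees of freedom (n - 2)
b1, se_b1 = 5.9271, 0.5228

sst = ssr + sse                  # total sum of squares
r_square = ssr / sst             # coefficient of determination
mse = sse / df_resid             # mean square error
f_stat = (ssr / 1) / mse         # MSR / MSE with 1 regression df
std_error = math.sqrt(mse)       # standard error of the estimate
t_stat = b1 / se_b1              # t statistic for the slope

print(f"R Square       = {r_square:.4f}")
print(f"F              = {f_stat:.4f}")
print(f"Standard Error = {std_error:.4f}")
print(f"t Stat (slope) = {t_stat:.4f}")
```

Small differences in the last decimal place are expected because the inputs are rounded.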
c. The standardized residual for the Los Angeles Dodgers is 4.7, so this observation should be treated as an outlier.
To determine whether the New York Yankees observation is influential, we can remove the
observation and compute a new estimated regression equation. The results show that the estimated
regression equation is ŷ = -449.061 + 5.2122 Revenue. The following two scatter diagrams
illustrate the small change in the estimated regression equation after removing the observation for
the New York Yankees. These scatter diagrams show that the effect of the New York Yankees
observation on the regression results is not that dramatic.
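One way to see how much the two fitted lines differ is to compare their predictions side by side. The sketch below uses the two estimated regression equations reported in this solution; the revenue levels are illustrative choices of ours, not values from the data set.

```python
def value_with_yankees(revenue):
    """Predicted value ($ millions) from the equation fitted to all 30 teams."""
    return -601.4814 + 5.9271 * revenue


def value_without_yankees(revenue):
    """Predicted value ($ millions) from the equation fitted without the Yankees."""
    return -449.061 + 5.2122 * revenue


# Illustrative revenue levels in $ millions (assumption).
for revenue in (150, 200, 250, 300):
    full = value_with_yankees(revenue)
    reduced = value_without_yankees(revenue)
    print(f"revenue {revenue}: {full:8.1f} vs {reduced:8.1f} "
          f"(difference {full - reduced:6.1f})")
```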
55. No. Regression or correlation analysis can never prove that two variables are causally related.
56. The estimate of a mean value is an estimate of the average of all y values associated with the same x.
The estimate of an individual y value is an estimate of only one of the y values associated with a
particular x.
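The practical consequence of this distinction is that the interval for an individual y value is always wider than the interval for the mean value of y at the same x. The sketch below shows this under stated assumptions: the five data points are illustrative, x* = 4 is an arbitrary choice, and 3.182 is the t table value for an upper tail area of .025 with 3 degrees of freedom.

```python
import math

# Illustrative data (assumption), not taken from this exercise.
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
t_crit = 3.182    # t value for 95% confidence with n - 2 = 3 degrees of freedom
x_star = 4        # value of x at which we estimate y

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))          # standard error of the estimate

y_hat = b0 + b1 * x_star
h = 1 / n + (x_star - x_bar) ** 2 / sxx

margin_mean = t_crit * s * math.sqrt(h)            # confidence interval for the mean of y
margin_individual = t_crit * s * math.sqrt(1 + h)  # prediction interval for one y value

print(f"y_hat = {y_hat:.2f}")
print(f"mean value of y:    {y_hat:.2f} +/- {margin_mean:.2f}")
print(f"individual y value: {y_hat:.2f} +/- {margin_individual:.2f}")
```

The extra 1 under the square root is the variance of a single observation about its mean, which is why the second interval is wider.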
57. The purpose of testing whether $\beta_1 = 0$ is to determine whether or not there is a significant
relationship between x and y. However, rejecting $\beta_1 = 0$ does not necessarily imply a good fit. For
example, if $\beta_1 = 0$ is rejected and $r^2$ is low, there is a statistically significant relationship between x
and y but the fit is not very good.
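To see why significance and goodness of fit can disagree, note that in simple linear regression F = MSR/MSE can be written as (n - 2)r²/(1 - r²). The sketch below uses that identity; the function name is our own, the first check uses the values reported in exercise 54, and the sample sizes in the loop are hypothetical.

```python
def f_statistic(r_squared, n):
    """F = MSR / MSE expressed through r^2 for simple linear regression
    with n observations: F = (n - 2) * r^2 / (1 - r^2)."""
    return (n - 2) * r_squared / (1 - r_squared)


# Check against exercise 54: r^2 = .8211 with 30 observations
# (the output there reports F = 128.5453; r^2 is rounded, so expect a small gap).
print(round(f_statistic(0.8211, 30), 2))

# Hypothetical sample sizes (assumption), all with the same low r^2 = .10.
for n in (10, 30, 100, 1000):
    print(n, round(f_statistic(0.10, n), 2))
```

For a fixed r², F grows with n, so with enough data even a weak fit can lead to rejecting the null hypothesis.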
58. a.
[Scatter diagram: S&P 500 versus DJIA]
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation]
c. Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value $\le \alpha = .05$, we
reject $H_0: \beta_1 = 0$; there is a significant relationship.
d. With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.
f. The DJIA is not that far beyond the range of the data. With the excellent fit provided by the
estimated regression equation, we should not be too concerned about using the estimated regression
equation to predict the S&P500.
59. a.
[Scatter diagram: Selling Price ($1,000s) versus Size (1,000s sq. ft.)]
The scatter diagram suggests that there is a linear relationship between size and selling price and that
as size increases, selling price increases.
e. The estimated regression equation should provide a good estimate because $r^2$ = 0.897.
f. This estimated equation might not work well for other cities. Housing markets are also driven by
other factors that influence demand for housing, such as job market and quality-of-life factors. For
example, because of the existence of high tech jobs and its proximity to the ocean, the house prices
in Seattle, Washington might be very different from the house prices in Winston-Salem, North
Carolina.
60. a.
The scatter diagram indicates a positive linear relationship between the two variables. Online
universities with higher retention rates tend to have higher graduation rates.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; unusual observations flagged R (large residual) and X (unusual X)]
d. The estimated regression equation is able to explain 44.9% of the variability in the graduation rate
based upon the linear relationship with the retention rate. It is not a great fit, but given the type of
data, the fit is reasonably good.
e. In the Minitab output in part (b), South University is identified as an observation with a large
standardized residual. With a retention rate of 51% it does appear that the graduation rate of 25% is
low as compared to the results for other online universities. The president of South University should
be concerned after looking at the data. Using the estimated regression equation, we estimate that the
graduation rate at South University should be 25.4 + .285(51) = 40%.
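The arithmetic in part (e) can be reproduced directly. The sketch below uses the rounded coefficients quoted above and also computes the residual (actual minus predicted), which is the quantity that makes this observation stand out; the variable names are our own.

```python
# Rounded coefficients quoted in part (e).
b0, b1 = 25.4, 0.285

retention_rate = 51      # South University retention rate (%)
actual_grad_rate = 25    # South University graduation rate (%)

predicted_grad_rate = b0 + b1 * retention_rate
residual = actual_grad_rate - predicted_grad_rate

print(f"predicted graduation rate = {predicted_grad_rate:.1f}%")
print(f"residual = {residual:.1f} percentage points")
```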
f. In the Minitab output in part (b), the University of Phoenix is identified as an observation whose x
value gives it large influence. With a retention rate of only 4%, the president of the University of
Phoenix should be concerned after looking at the data.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Usage = 30]
b. Since the p-value corresponding to F = 47.62 is .000 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Speed = 50]
b. Since the p-value corresponding to F = 11.33 is .028 < $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .739; a good fit. The least squares line explained 73.9% of the variability in the number of
defects.
d. Using the Minitab output in part (a), the 95% confidence interval is 12.294 to 17.2712.
63. a.
[Scatter diagram: Days versus Distance]
There appears to be a negative linear relationship between distance to work and number of days
absent.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Distance = 5]
There is a significant relationship between the number of days absent and the distance to work.
d. $r^2$ = .711. The estimated regression equation explained 71.1% of the variability in y; this is a
reasonably good fit.
e. The 95% confidence interval is 5.19502 to 7.5586 or approximately 5.2 to 7.6 days.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Age = 4]
b. Since the p-value corresponding to F = 54.75 is .000 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Hours = 95]
b. Since the p-value corresponding to F = 57.42 is .000 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
c. 84.65 points
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation]
b. Since the p-value = 0.029 is less than $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .470. The least squares line does not provide a very good fit.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation; prediction setting: Adjusted_Gross Income = 35000]
b. Since the p-value = 0.038 is less than $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .217. The least squares line does not provide a very good fit.
68. a.
[Scatter diagram: Price ($1000s) versus Miles (1000s)]
b. There appears to be a negative relationship between the two variables that can be approximated by a
straight line. An argument could also be made that the relationship is perhaps curvilinear because at
some point a car has so many miles that its value becomes very small.
[Minitab output: Analysis of Variance, Model Summary, Coefficients, Regression Equation]
e. $r^2$ = .5387; a reasonably good fit considering that the condition of the car is also an important factor
in what the price is.
f. The slope of the estimated regression equation is -.0558. Thus, a one-unit increase in the value of x
coincides with a decrease in the value of y equal to .0558. Because the data were recorded in
thousands, every additional 1000 miles on the car’s odometer will result in a $55.80 decrease in the
predicted price.
g. The predicted price for a 2007 Camry with 60,000 miles is ŷ = 16.47 -.0588(60) = 12.942 or
$12,942. Because of other factors, such as condition and whether the seller is a private party or a
dealer, this is probably not the price you would offer for the car. But, it should be a good starting
point in figuring out what to offer the seller.
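The prediction in part (g) is a direct substitution into the estimated regression equation. As a small sketch, taking the intercept and slope exactly as they appear in the part (g) calculation and remembering that both miles and price are recorded in thousands (the function name is our own):

```python
def predicted_price(miles_thousands, b0=16.47, b1=-0.0588):
    """Predicted price in $1000s for a car with the given mileage in 1000s.
    The default coefficients are the ones used in the part (g) calculation."""
    return b0 + b1 * miles_thousands


price = predicted_price(60)            # 60,000 miles
print(f"y_hat = {price:.3f} (in $1000s)")
print(f"predicted price = ${price * 1000:,.0f}")
```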