Chapter 14
Slide
1
Overview
Simple Linear Regression Model
Coefficient of Determination
Model Assumptions
Slide
3
Simple Linear Regression
Slide
4
Simple Linear Regression Model
Slide
5
Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \epsilon$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.
Slide
6
Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$
Slide
7
Simple Linear Regression Equation
[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is positive. Vertical axis: $E(y)$.]
Slide
8
Simple Linear Regression Equation
[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is negative. Vertical axis: $E(y)$.]
Slide
9
Simple Linear Regression Equation
No Relationship
[Figure: regression line showing no relationship between x and $E(y)$. Vertical axis: $E(y)$.]
Slide
10
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation:
$\hat{y} = b_0 + b_1 x$
Slide
11
Estimation Process
[Diagram: the sample statistics $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$ and define the estimated regression equation $\hat{y} = b_0 + b_1 x$.]
Slide
12
Least Squares Method
Slide
13
Least Squares Method
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
Slide
14
Least Squares Method
Slope for the estimated regression equation:
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
Slide
15
Least Squares Method
Slide
16
Simple Linear Regression
Slide
17
Simple Linear Regression
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
$\sum x = 10$           $\sum y = 100$
$\bar{x} = 2$           $\bar{y} = 20$
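As a rough illustration, the least squares slope formula can be applied to the TV ads data above. This is a minimal sketch, not the deck's own worked solution: the function name `least_squares` is hypothetical, and the intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ is the standard least squares result, assumed here because it does not survive in the extracted slides.

```python
# TV ads (x) and cars sold (y) from the table above.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Assumption: standard intercept formula, not shown in the extracted slides.
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:g} + {b1:g}x")  # prints: y_hat = 10 + 5x
```

The result agrees with the equation $\hat{y} = 10 + 5x$ quoted later in the chapter.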
Slide
18
Estimated Regression Equation
Slide
19
Coefficient of Determination
Slide
20
Coefficient of Determination
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide
21
Coefficient of Determination
$r^2 = SSR/SST$
where:
SSR = sum of squares due to regression
SST = total sum of squares
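The decomposition and the ratio above can be checked numerically on the chapter's TV ads example. The sketch below assumes the estimated regression equation $\hat{y} = 10 + 5x$ from this chapter; the variable names are illustrative.

```python
# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error

r_squared = ssr / sst
print(f"SST={sst:g} SSR={ssr:g} SSE={sse:g} r^2={r_squared:.4f}")
```

For this data SST = 114, SSR = 100 and SSE = 14, so about 87.7% of the variation in cars sold is explained by the linear relationship with TV ads.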
Slide
22
Coefficient of Determination
Slide
23
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
Slide
24
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".
$r_{xy} = +\sqrt{.8772}$
$r_{xy} = +.9366$
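The calculation above can be reproduced in a few lines. As a cross-check, the sketch also computes the correlation directly from the data using the usual Pearson definition, which is an assumption here since that formula does not appear on the slides.

```python
import math

# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

b1 = 5              # slope of y_hat = 10 + 5x
r_squared = 0.8772  # coefficient of determination from the slide

# r_xy = (sign of b1) * sqrt(r^2)
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Cross-check (assumed Pearson definition, not shown on the slides).
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r_direct = sxy / math.sqrt(sxx * syy)

print(f"{r_from_r2:.4f} {r_direct:.4f}")
```

Both routes give approximately +.9366 for this data.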
Slide
25
Model Assumptions
Slide
26
Assumptions About the Error Term $\epsilon$
Slide
27
Testing for Significance
Slide
28
Testing for Significance
Slide
29
Testing for Significance
An Estimate of $\sigma^2$
$s^2 = MSE = SSE/(n - 2)$
where:
$SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
Slide
30
Testing for Significance
An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting s is called the standard error of the estimate.
$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$
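For the TV ads example, the standard error of the estimate can be computed as follows. This is a sketch that assumes the chapter's estimated regression equation $\hat{y} = 10 + 5x$; variable names are illustrative.

```python
import math

# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

n = len(x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate

print(f"MSE = {mse:.4f}, s = {s:.5f}")
```

The value of s, about 2.16025, is the one used in the interval calculations later in the chapter.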
Slide
31
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$t = \frac{b_1}{s_{b_1}}$ where $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Slide
32
Testing for Significance: t Test
Rejection Rule
Reject H0 if p-value < $\alpha$ or t < $-t_{\alpha/2}$ or t > $t_{\alpha/2}$
where:
$t_{\alpha/2}$ is based on a t distribution
with n - 2 degrees of freedom
Slide
33
Testing for Significance: t Test
3. Select the test statistic: $t = \frac{b_1}{s_{b_1}}$
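The t test can be sketched for the TV ads example as below. The inputs $b_1 = 5$ and $s = 2.16025$ come from the chapter; the critical value 3.182 (for $\alpha = .05$ and 3 degrees of freedom) is the one used in the chapter's confidence interval for $\beta_1$. The variable names are illustrative.

```python
import math

x = [1, 3, 2, 1, 3]  # TV ads from the chapter's example
b1 = 5               # slope of y_hat = 10 + 5x
s = 2.16025          # standard error of the estimate

x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

s_b1 = s / math.sqrt(sxx)  # estimated standard deviation of b1
t_stat = b1 / s_b1         # test statistic

t_crit = 3.182             # t_{.025} with n - 2 = 3 degrees of freedom
reject_h0 = abs(t_stat) > t_crit

print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Here t is roughly 4.63, which exceeds 3.182, so the sketch rejects H0 at the .05 level.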
Slide
34
Testing for Significance: t Test
Slide
35
Confidence Interval for $\beta_1$
Slide
36
Confidence Interval for $\beta_1$
Slide
37
Confidence Interval for $\beta_1$
Rejection Rule
Reject H0 if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$
or 1.56 to 8.44
Conclusion
0 is not included in the confidence interval. Reject H0.
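The interval arithmetic above can be written out as a short sketch. All three inputs are taken from the slide; the variable names are illustrative.

```python
b1 = 5          # point estimate of the slope
t_crit = 3.182  # t_{alpha/2} for alpha = .05 with 3 degrees of freedom
s_b1 = 1.08     # estimated standard deviation of b1

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# Reject H0: beta_1 = 0 when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```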
Slide
38
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
F = MSR/MSE
Slide
39
Testing for Significance: F Test
Rejection Rule
Reject H0 if
p-value < $\alpha$
or F > $F_\alpha$
where:
$F_\alpha$ is based on an F distribution with
1 degree of freedom in the numerator and
n - 2 degrees of freedom in the denominator
Slide
40
Testing for Significance: F Test
2. Specify the level of significance: $\alpha = .05$
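A sketch of the F test for the TV ads example follows. It assumes the chapter's equation $\hat{y} = 10 + 5x$. The critical value 10.13 is $F_{.05}$ with 1 and 3 degrees of freedom taken from a standard F table; it does not appear in the extracted slides, so treat it as an assumption.

```python
# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

n = len(x)
y_bar = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1        # one independent variable: 1 numerator degree of freedom
mse = sse / (n - 2)  # n - 2 denominator degrees of freedom
f_stat = msr / mse

f_crit = 10.13       # assumption: F_{.05} with (1, 3) df from a standard F table
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

With one independent variable the F statistic equals the square of the t statistic, so both tests lead to the same conclusion.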
Slide
41
Testing for Significance: F Test
Slide
43
Using the Estimated Regression Equation
for Estimation and Prediction
Slide
44
Using the Estimated Regression Equation
for Estimation and Prediction
A confidence interval is an interval estimate of the
mean value of y for a given value of x.
A prediction interval is used whenever we want to
predict an individual value of y for a new observation
corresponding to a given value of x.
The margin of error is larger for a prediction interval.
Slide
45
Using the Estimated Regression Equation
for Estimation and Prediction
Confidence Interval Estimate of E(y*)
$\hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*}$
Slide
46
Point Estimation
Slide
47
Confidence Interval for E(y*)
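Estimate of the Standard Deviation of $\hat{y}^*$
$s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\hat{y}^*} = 2.16025\sqrt{\frac{1}{5} + \frac{(3 - 2)^2}{(1 - 2)^2 + (3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2 + (3 - 2)^2}}$
$s_{\hat{y}^*} = 2.16025\sqrt{\frac{1}{5} + \frac{1}{4}} = 1.4491$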
Slide
48
Confidence Interval for E(y*)
$25 \pm 3.1824(1.4491)$
$25 \pm 4.61$
Slide
49
Prediction Interval for y*
$s_{pred} = 2.16025\sqrt{1 + \frac{1}{5} + \frac{1}{4}}$
$s_{pred} = 2.16025(1.20416) = 2.6013$
Slide
50
Prediction Interval for y*
$25 \pm 3.1824(2.6013)$
$25 \pm 8.28$
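To compare the two intervals side by side, the sketch below recomputes both margins of error at $x^* = 3$ using the values from the slides (s = 2.16025, t = 3.1824, $\hat{y} = 10 + 5x$). Variable names are illustrative.

```python
import math

x = [1, 3, 2, 1, 3]  # TV ads from the chapter's example
b0, b1 = 10, 5       # estimated regression equation y_hat = 10 + 5x
s = 2.16025          # standard error of the estimate
t_crit = 3.1824      # t_{.025} with n - 2 = 3 degrees of freedom
x_star = 3           # value of x at which to estimate and predict

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

y_star_hat = b0 + b1 * x_star             # point estimate
h = 1 / n + (x_star - x_bar) ** 2 / sxx   # term under the square root

s_mean = s * math.sqrt(h)      # std. deviation of y_hat*, for the mean value E(y*)
s_pred = s * math.sqrt(1 + h)  # std. deviation for an individual value y*

ci_margin = t_crit * s_mean
pi_margin = t_crit * s_pred

print(f"confidence interval: {y_star_hat} +/- {ci_margin:.2f}")
print(f"prediction interval: {y_star_hat} +/- {pi_margin:.2f}")
```

The prediction margin (about 8.28) is larger than the confidence margin (about 4.61) because an individual observation also carries the variability of the error term.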
Slide
51
Residual Analysis
Slide
52
Residual Analysis
Residual for observation i: $y_i - \hat{y}_i$
Slide
53
Residual Plot Against x
Slide
54
Residual Plot Against x
[Figure: residual plot of $y - \hat{y}$ against x. Panel: Good Pattern.]
Slide
55
Residual Plot Against x
[Figure: residual plot of $y - \hat{y}$ against x. Panel: Nonconstant Variance.]
Slide
56
Residual Plot Against x
[Figure: residual plot of $y - \hat{y}$ against x. Panel: Model Form Not Adequate.]
Slide
57
Residual Plot Against x
Residuals
Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2
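The table above can be reproduced from the data with a short sketch, assuming the chapter's estimated regression equation $\hat{y} = 10 + 5x$. Variable names are illustrative.

```python
# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(i, yh, e)
```

The residuals sum to zero, as expected for a least squares fit that includes an intercept.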
Slide
58
Residual Plot Against x
[Figure: residuals plotted against TV Ads (x).]
Slide
59
Standardized Residuals
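Standardized residual for observation i: $\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$
where $s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$ and $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$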
Slide
60
Standardized Residual Plot
Slide
61
Standardized Residual Plot
Standardized Residuals
Observation    Predicted y    Residual    Standardized Residual
1              15             -1          -0.5345
2              25             -1          -0.5345
3              20             -2          -1.0690
4              15              2           1.0690
5              25              2           1.0690
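The sketch below computes standardized residuals for the TV ads example in two ways. The first uses the leverage-based definition, dividing each residual by $s\sqrt{1 - h_i}$. The second divides each residual by $\sqrt{SSE/(n - 1)}$; by arithmetic this reproduces the values tabulated above, which suggests the table follows a spreadsheet-style convention rather than the leverage-based formula. That reading is an inference from the numbers, not something the slides state.

```python
import math

# TV ads (x) and cars sold (y) from the chapter's example.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))  # standard error of the estimate

# Leverage of each observation.
h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Leverage-based standardized residuals: e_i / (s * sqrt(1 - h_i)).
standardized = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(residuals, h)]

# Assumed spreadsheet-style convention that matches the table: e_i / sqrt(SSE / (n - 1)).
table_style = [e / math.sqrt(sse / (n - 1)) for e in residuals]

for i, (r1, r2) in enumerate(zip(standardized, table_style), start=1):
    print(f"{i}: leverage-based {r1:.4f}, table-style {r2:.4f}")
```

Under either convention no value falls outside the range -2 to +2 for this data.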
Slide
62
Standardized Residual Plot
[Figure: spreadsheet view showing a standardized residual plot (Standard Residuals against Cars Sold) alongside the RESIDUAL OUTPUT table.]
RESIDUAL OUTPUT
Observation    Predicted Y    Residuals    Standard Residuals
1              15             -1           -0.534522
2              25             -1           -0.534522
3              20             -2           -1.069045
4              15              2            1.069045
5              25              2            1.069045
Slide
63
Standardized Residual Plot
Slide
64
Outliers and Influential Observations
Detecting Outliers
• An outlier is an observation that is unusual in comparison with the other data.
• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
• This rule's shortcoming can be circumvented by using studentized deleted residuals.
• For an outlying observation, the |i th studentized deleted residual| will be larger than the |i th standardized residual|.
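The standardized residual rule in the list above amounts to a simple threshold check. The sketch below applies it to the standardized residuals tabulated earlier in the chapter; the function name `flag_outliers` and the `cutoff` parameter are hypothetical, and the sketch does not implement studentized deleted residuals.

```python
def flag_outliers(std_residuals, cutoff=2.0):
    """Return 1-based observation numbers whose standardized residual is < -cutoff or > +cutoff."""
    return [
        i
        for i, r in enumerate(std_residuals, start=1)
        if r < -cutoff or r > cutoff
    ]


# Standardized residuals from the chapter's table.
std_residuals = [-0.5345, -0.5345, -1.0690, 1.0690, 1.0690]
print(flag_outliers(std_residuals))  # prints: []
```

No observation in the TV ads example is flagged, since every standardized residual lies between -2 and +2.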
Slide
65