Linear Regression Analysis
JOHN LOUCKS
St. Edward’s University
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.
$$E(y) = \beta_0 + \beta_1 x$$
[Figure: three panels, each plotting E(y) against x and showing the regression line with intercept β₀.
Panel 1: slope β₁ is positive.
Panel 2: slope β₁ is negative.
Panel 3 (No Relationship): slope β₁ is 0.]
Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
Sample Statistics: $b_0$, $b_1$
$b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.
$$\min \sum (y_i - \hat{y}_i)^2$$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
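As a minimal sketch of the least squares criterion, the snippet below evaluates the sum of squared deviations for a few candidate lines. The five (x, y) pairs are the TV ads and cars sold sample from this deck's example; the function name `sum_squared_errors` and the comparison lines are illustrative assumptions, not part of the original slides.

```python
# Sample from the deck's example: number of TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def sum_squared_errors(b0, b1):
    """Return the sum of (y_i - y_hat_i)^2 for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# The first pair is the least squares line reported in the deck (b0 = 10, b1 = 5).
# The other pairs are arbitrary lines chosen only for comparison.
candidates = [(10, 5), (10, 6), (12, 4), (8, 5)]
for b0, b1 in candidates:
    print(f"b0={b0}, b1={b1}, sum of squares={sum_squared_errors(b0, b1)}")
```

Among these candidates, the deck's line gives the smallest sum of squares, which is what the criterion asks for.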
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Σx = 10                 Σy = 100
x̄ = 2                   ȳ = 20
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$$
■ y-Intercept for the Estimated Regression Equation
$$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$$
[Figure: scatter plot of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the estimated regression line ŷ = 5x + 10.]
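The slope and intercept of the estimated regression line can be reproduced with a short calculation. This is a sketch using only the five sample observations from the deck; the variable names (`sxy`, `sxx`) are illustrative choices.

```python
# Sample from the deck's example: number of TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n  # mean of the independent variable
y_bar = sum(y) / n  # mean of the dependent variable

# Numerator and denominator of the least squares slope formula.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sxy / sxx           # slope
b0 = y_bar - b1 * x_bar  # y-intercept

print(f"b1 = {b1}, b0 = {b0}")
```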
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
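A quick numerical check of this decomposition, sketched with the deck's sample data and its estimated line ŷ = 10 + 5x (the variable names are illustrative):

```python
# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # sum of squares due to error

print(f"SST = {sst}, SSR = {ssr}, SSE = {sse}")
```

For this sample the three quantities come out as 114, 100, and 14, so SST = SSR + SSE holds.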
$$r^2 = \text{SSR}/\text{SST}$$
where:
SSR = sum of squares due to regression
SST = total sum of squares
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is “+”.
$$r_{xy} = +\sqrt{.8772} = +.9366$$
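The coefficient of determination and the sample correlation coefficient can be sketched as follows, again assuming the deck's sample data and estimated line. `math.copysign` is used here simply to attach the sign of the slope to the square root.

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # sum of squares due to regression
sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares

r_squared = ssr / sst
# r_xy takes the sign of the slope b1 and the magnitude sqrt(r^2).
r_xy = math.copysign(math.sqrt(r_squared), b1)

print(f"r^2 = {r_squared:.4f}, r_xy = {r_xy:+.4f}")
```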
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
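To make the four assumptions concrete, the sketch below simulates data that satisfies them. The parameter values (β₀ = 10, β₁ = 5, σ = 2), the sample size, and the seed are hypothetical choices for illustration only; they are not taken from the slides.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical model parameters, chosen only for illustration.
beta0, beta1, sigma = 10.0, 5.0, 2.0
n = 10_000

xs = [random.choice([1, 2, 3]) for _ in range(n)]

# Each error is an independent draw from a normal distribution with mean 0
# and the same standard deviation sigma, whatever the value of x.
errors = [random.gauss(0.0, sigma) for _ in range(n)]
ys = [beta0 + beta1 * xi + e for xi, e in zip(xs, errors)]

mean_error = sum(errors) / n
var_error = sum((e - mean_error) ** 2 for e in errors) / (n - 1)

print(f"mean of errors     = {mean_error:.3f} (assumed mean: 0)")
print(f"variance of errors = {var_error:.3f} (assumed variance: {sigma ** 2})")
```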
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β₁ is zero.
Two tests are commonly used:
t Test and F Test
Both the t test and F test require an estimate of σ², the variance of ε in the regression model.
■ An Estimate of σ²
The mean square error (MSE) provides the estimate of σ²; the notation s² is also used.
$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$
where:
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$
■ An Estimate of σ
• To estimate σ we take the square root of s², the estimate of σ².
• The resulting s is called the standard error of the estimate.
$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$
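For the deck's example, MSE and the standard error of the estimate can be computed as in this sketch (sample data and estimated line ŷ = 10 + 5x as given in the deck; variable names are illustrative):

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

# Sum of squares due to error.
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate

print(f"SSE = {sse}, MSE = {mse:.4f}, s = {s:.5f}")
```

The result, s ≈ 2.16025, is the value used in the interval calculations later in the deck.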
■ Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
■ Test Statistic
$$t = \frac{b_1}{s_{b_1}} \quad \text{where} \quad s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
■ Rejection Rule
Reject $H_0$ if p-value < α, or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where:
$t_{\alpha/2}$ is based on a t distribution with n − 2 degrees of freedom
3. Select the test statistic: $t = b_1 / s_{b_1}$
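A sketch of the t test for the deck's example follows. The critical value 3.182 (α = .05, n − 2 = 3 degrees of freedom) is the one quoted in the deck's confidence interval; it is hard-coded here rather than looked up from a distribution function, and no p-value is computed.

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

s = math.sqrt(sse / (n - 2))  # standard error of the estimate
s_b1 = s / math.sqrt(sxx)     # estimated standard deviation of b1
t_stat = b1 / s_b1            # test statistic

# Critical value as quoted in the deck (alpha = .05, 3 degrees of freedom).
t_crit = 3.182
reject_h0 = abs(t_stat) > t_crit

print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```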
■ Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
■ 95% Confidence Interval for $\beta_1$
$$b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$$
or 1.56 to 8.44
■ Conclusion
0 is not included in the confidence interval. Reject $H_0$.
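The interval arithmetic can be checked with a few lines. This sketch takes b₁ = 5 and the critical value 3.182 from the deck and recomputes s_b1 from the sample; variable names are illustrative.

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

t_crit = 3.182  # t value for alpha/2 = .025 with 3 degrees of freedom, as quoted in the deck
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta1 = 0 is rejected when 0 lies outside the interval.
zero_in_interval = lower <= 0 <= upper

print(f"95% CI for beta1: {lower:.2f} to {upper:.2f}; contains 0: {zero_in_interval}")
```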
■ Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
■ Test Statistic
F = MSR/MSE
■ Rejection Rule
Reject H0 if
p-value < α
or F > Fα
where:
Fα is based on an F distribution with
1 degree of freedom in the numerator and
n - 2 degrees of freedom in the denominator
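A sketch of the F test for the deck's example is given below. With one independent variable, MSR = SSR/1. The slides do not quote the critical F value, so this sketch derives it from the identity that, with 1 numerator degree of freedom, the critical F value equals the square of the two-tailed critical t value (3.1824, as used in the deck's interval calculations). Treat that step as an assumption of the sketch rather than part of the slides.

```python
# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)

y_bar = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1        # one independent variable, so 1 numerator degree of freedom
mse = sse / (n - 2)  # n - 2 denominator degrees of freedom
f_stat = msr / mse

# Assumption: critical F (1, n - 2 df) obtained as the square of the t critical value.
f_crit = 3.1824 ** 2
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, critical F = {f_crit:.2f}, reject H0: {reject_h0}")
```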
2. Specify the level of significance: α = .05
$$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$$
where:
the confidence coefficient is 1 − α and $t_{\alpha/2}$ is based on a t distribution with n − 2 degrees of freedom
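■ Estimate of the Standard Deviation of $\hat{y}_p$
$$s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
$$s_{\hat{y}_p} = 2.16025\sqrt{\frac{1}{5} + \frac{(3 - 2)^2}{(1 - 2)^2 + (3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2 + (3 - 2)^2}}$$
$$s_{\hat{y}_p} = 2.16025\sqrt{\frac{1}{5} + \frac{1}{4}} = 1.4491$$
$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p} = 25 \pm 3.1824(1.4491) = 25 \pm 4.61$$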
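The confidence interval for the mean number of cars sold at x_p = 3 TV ads can be sketched as follows. The data, the estimated line, and the critical value 3.1824 come from the deck; the variable names are illustrative.

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)
x_p = 3  # value of the independent variable used in the deck's example

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

y_hat_p = b0 + b1 * x_p  # point estimate at x_p
# Estimated standard deviation of y_hat_p.
s_yhat_p = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824  # alpha/2 = .025, 3 degrees of freedom, as quoted in the deck
margin = t_crit * s_yhat_p

print(f"{y_hat_p:.0f} +/- {margin:.2f} "
      f"({y_hat_p - margin:.2f} to {y_hat_p + margin:.2f})")
```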
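$$s_{\text{ind}} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
$$s_{\text{ind}} = 2.16025\sqrt{1 + \frac{1}{5} + \frac{1}{4}} = 2.16025(1.20416) = 2.6013$$
$$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}} = 25 \pm 3.1824(2.6013) = 25 \pm 8.28$$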
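The prediction interval for an individual value at x_p = 3 differs from the confidence interval for the mean only by the extra 1 under the square root. A sketch under the same assumptions (deck data, estimated line, and critical value 3.1824; illustrative variable names):

```python
import math

# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(x)
x_p = 3  # value of the independent variable used in the deck's example

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

y_hat_p = b0 + b1 * x_p
# Standard deviation for an individual value: note the leading 1 under the root.
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824  # alpha/2 = .025, 3 degrees of freedom, as quoted in the deck
margin = t_crit * s_ind

print(f"{y_hat_p:.0f} +/- {margin:.2f} "
      f"({y_hat_p - margin:.2f} to {y_hat_p + margin:.2f})")
```

The prediction interval (±8.28) is wider than the confidence interval for the mean (±4.61) because it also accounts for the variability of a single observation around its mean.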
Residual for observation i: $y_i - \hat{y}_i$
[Figure: three residual plots, each with the residual y − ŷ on the vertical axis, titled “Good Pattern”, “Nonconstant Variance”, and “Model Form Not Adequate”.]
■ Residuals
Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2
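The residuals table can be regenerated from the sample data and the estimated line ŷ = 10 + 5x, as in this short sketch (variable names are illustrative):

```python
# Sample from the deck's example and its estimated regression line.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

predicted = [b0 + b1 * xi for xi in x]                   # predicted cars sold
residuals = [yi - yh for yi, yh in zip(y, predicted)]    # observed minus predicted

print("Observation  Predicted  Residual")
for i, (yh, r) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:<12} {yh:<10} {r}")
```

The residuals sum to zero, as they do for any least squares line that includes an intercept.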
[Figure: residual plot, with residuals on the vertical axis and TV Ads (0 to 4) on the horizontal axis.]