Simple Linear Regression 69
Simple Linear Regression (SLR)
Types of Correlation

[Figures: scatter plots of Y versus X showing positive and negative linear correlation, each with a fitted line Y = \beta_0 + \beta_1 X and residual scatter \varepsilon about the line.]
Fitting data to a linear model

Model:        Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
Model line:   \hat{Y} = b_0 + b_1 X
Residual:     \varepsilon = Y - \hat{Y}

Least squares minimizes the sum of squared residuals:   \min \sum (Y_i - \hat{Y}_i)^2

b_1 = \frac{S_{xy}}{S_{xx}}
b_0 = \bar{Y} - b_1 \bar{X}
Required Statistics

n = number of observations
\bar{X} = \frac{\sum X_i}{n}
\bar{Y} = \frac{\sum Y_i}{n}
Descriptive Statistics

Var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{S_{xx}}{n - 1}

Var(Y) = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{S_{yy}}{n - 1}    (S_{yy} = SST)

Covar(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{S_{xy}}{n - 1}
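The statistics above map directly to code. A minimal sketch in plain Python, using a small made-up dataset for illustration:

```python
# Made-up dataset for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of squares and cross-products about the means.
S_xx = sum((x - x_bar) ** 2 for x in X)
S_yy = sum((y - y_bar) ** 2 for y in Y)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

var_x = S_xx / (n - 1)          # sample variance of X
cov_xy = S_xy / (n - 1)         # sample covariance of X and Y

# Least-squares estimates from the fitting formulas above.
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar
```

For this dataset, S_xx = 10, S_xy = 6, so the fitted line is \hat{Y} = 2.2 + 0.6 X; the same toy numbers are reused in the later sketches.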
Regression Statistics

SST = \sum (Y - \bar{Y})^2    (total variance to be explained by the predictors)
SSR = \sum (\hat{Y} - \bar{Y})^2    (variance explained by X_1)
SSE = \sum (Y - \hat{Y})^2    (variance NOT explained by X_1)

[Figure: Venn diagram partitioning the variance of Y into SSR, the part explained by X_1, and SSE, the part not explained by X_1.]
Regression Statistics

R^2 = \frac{SSR}{SST}

Coefficient of Determination: used to judge the adequacy of the regression model.
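The variance decomposition can be verified numerically. A sketch continuing the made-up example (b_0 = 2.2, b_1 = 0.6 from the earlier fit):

```python
# Same made-up dataset and fitted coefficients as before.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(Y)
y_bar = sum(Y) / n
b0, b1 = 2.2, 0.6

Y_hat = [b0 + b1 * x for x in X]                     # fitted values

SST = sum((y - y_bar) ** 2 for y in Y)               # total
SSR = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained
SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # unexplained

R2 = SSR / SST
```

Here SST = 6, SSR = 3.6 and SSE = 2.4, so SST = SSR + SSE holds and R^2 = 0.6.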
Regression Statistics

R = \sqrt{R^2}

R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}

Correlation measures the strength of the linear association between two variables.
Regression Statistics

Standard error of the regression model:

S_e^2 = \hat{\sigma}^2 = \frac{SSE}{n - 2} = \frac{\sum (Y - \hat{Y})^2}{n - 2}

S_e = \sqrt{MSE}
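A one-line check with the SSE from the made-up example (n = 5, two estimated parameters):

```python
# SSE = 2.4 from the earlier toy fit; n = 5 observations.
SSE, n = 2.4, 5

Se2 = SSE / (n - 2)     # mean squared error: 2 parameters (b0, b1) estimated
Se = Se2 ** 0.5         # standard error of the regression
```

This gives S_e^2 = 0.8 and S_e \approx 0.894.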
ANOVA

H_0: \beta_1 = 0
H_A: \beta_1 \neq 0

[ANOVA table: df, SS, MS, F, P-value]

For an individual coefficient:

H_0: \beta_i = 0
H_1: \beta_i \neq 0

t = \frac{b_i - \beta_i}{S_{b_i}} \sim t_{(n - k - 1)}
Hypothesis Tests for Regression Coefficients

H_0: \beta_1 = 0
H_A: \beta_1 \neq 0

t = \frac{b_1 - \beta_1}{S_e(b_1)} \sim t_{(n - k - 1)},    where    S_e(b_1) = \sqrt{\frac{S_e^2}{S_{xx}}}
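The slope test can be sketched with the values carried over from the made-up example:

```python
# Toy values from the earlier sketches: b1 = 0.6, Se^2 = 0.8, S_xx = 10,
# n = 5 observations, k = 1 predictor.
b1, Se2, S_xx, n, k = 0.6, 0.8, 10.0, 5, 1

se_b1 = (Se2 / S_xx) ** 0.5      # standard error of the slope
t = (b1 - 0.0) / se_b1           # test statistic under H0: beta_1 = 0
df = n - k - 1                   # degrees of freedom for the t distribution
```

Here t \approx 2.12 on 3 degrees of freedom, which would then be compared with the tabulated t critical value.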
Confidence Interval on Regression Coefficients

b_1 - t_{\alpha/2, (n-k-1)} \sqrt{\frac{S_e^2}{S_{xx}}} \le \beta_1 \le b_1 + t_{\alpha/2, (n-k-1)} \sqrt{\frac{S_e^2}{S_{xx}}}

For the intercept:

t = \frac{b_0 - \beta_0}{S_e(b_0)} \sim t_{(n - k - 1)},    where    S_e(b_0) = \sqrt{S_e^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{S_{xx}} \right)}
Confidence Interval on Regression Coefficients

b_0 - t_{\alpha/2, (n-k-1)} S_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}} \le \beta_0 \le b_0 + t_{\alpha/2, (n-k-1)} S_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}}
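A sketch of both intervals for the made-up example. The critical value t_{0.025, 3} = 3.182 is hard-coded from a t table to avoid a scipy dependency:

```python
# Toy values carried over from the earlier sketches.
b0, b1 = 2.2, 0.6
Se2, S_xx, x_bar, n = 0.8, 10.0, 3.0, 5
t_crit = 3.182                  # t_{0.025, 3} from a t table

# 95% CI for the slope.
se_b1 = (Se2 / S_xx) ** 0.5
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# 95% CI for the intercept.
se_b0 = (Se2 * (1 / n + x_bar ** 2 / S_xx)) ** 0.5
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```

The slope interval is roughly (-0.30, 1.50); since it contains zero, this tiny sample would not reject H_0: \beta_1 = 0 at the 5% level.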
Hypothesis test for the correlation coefficient:

H_0: \rho = 0
H_A: \rho \neq 0

T_0 = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^2}}
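For the made-up example (R^2 = 0.6, n = 5), a quick sketch:

```python
# Toy values from the earlier sketches.
R2, n = 0.6, 5
R = R2 ** 0.5

# Test statistic for H0: rho = 0, on n - 2 degrees of freedom.
T0 = R * (n - 2) ** 0.5 / (1 - R2) ** 0.5
```

In simple linear regression T_0 equals the t statistic for the slope (about 2.12 here), so the two tests are equivalent.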
Diagnostic Tests for Regressions

[Figure: residuals \varepsilon_i versus fitted values \hat{Y}_i — residuals for a non-linear fit.]
[Figure: residuals \varepsilon_i versus fitted values \hat{Y}_i — residuals for a quadratic function or polynomial.]
[Figure: residuals \varepsilon_i versus fitted values \hat{Y}_i — residuals are not homogeneous (increasing in variance).]
Regression – important points

[Figures: scatter plots of Y versus X illustrating these points; the slide content is not recoverable from the text.]
Assumptions of Regression

[Figures: plots against X illustrating the model assumptions; the slide content is not recoverable from the text.]
Multiple Regression

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p

[Figure: Venn diagram of Y, X_1 and X_2 — unique variance explained by X_1, unique variance explained by X_2, common variance explained by X_1 and X_2, and variance NOT explained by X_1 and X_2. A "good" model covers as much of the variance of Y as possible.]
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p

\hat{\beta} = (X'X)^{-1} X'Y

Predicted values:   \hat{Y} = X \hat{\beta}
Residuals:          e = Y - \hat{Y}
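The matrix formula can be sketched with numpy. For the made-up dataset from the earlier slides (one predictor), the design matrix is a column of ones for the intercept plus the X column:

```python
import numpy as np

# Same made-up data as the earlier sketches.
x = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]

# beta-hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

Y_hat = X @ beta_hat        # predicted values
e = Y - Y_hat               # residuals
```

This recovers b_0 = 2.2, b_1 = 0.6, matching the S_{xy}/S_{xx} formulas. In practice `np.linalg.lstsq` (or a QR-based solver) is preferred over forming the explicit inverse, which is numerically fragile.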
Regression Statistics

How good is our model?

SST = \sum (Y - \bar{Y})^2
SSR = \sum (\hat{Y} - \bar{Y})^2
SSE = \sum (Y - \hat{Y})^2

R^2 = \frac{SSR}{SST}

Coefficient of Determination: used to judge the adequacy of the regression model.
Regression Statistics

R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}

n = sample size
k = number of independent variables
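Adjusted R^2 penalizes extra predictors. A one-line sketch using the toy values from the simple-regression example (R^2 = 0.6, n = 5, k = 1):

```python
# Toy values carried over from the earlier sketches.
R2, n, k = 0.6, 5, 1

R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
```

Here R^2_{adj} \approx 0.467 < R^2 = 0.6; adding a useless predictor would raise R^2 slightly but lower R^2_{adj}.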
S_e^2 = \hat{\sigma}^2 = \frac{SSE}{n - k - 1} = \frac{\sum (Y - \hat{Y})^2}{n - k - 1}

S_e = \sqrt{MSE}
ANOVA

H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0
H_A: \beta_i \neq 0 for at least one i

[ANOVA table: df, SS, MS, F, P-value]

For an individual coefficient:

H_0: \beta_i = 0
H_1: \beta_i \neq 0

t = \frac{b_i - \beta_i}{S_{b_i}} \sim t_{(n - k - 1)}
Hypothesis Tests for Regression Coefficients

H_0: \beta_i = 0
H_A: \beta_i \neq 0

t = \frac{b_i - \beta_i}{S_e(b_i)} \sim t_{(n - k - 1)},    where    S_e(b_i) = \sqrt{C_{ii} S_e^2}
Confidence Interval on Regression Coefficients

b_i - t_{\alpha/2, (n-k-1)} \sqrt{C_{ii} S_e^2} \le \beta_i \le b_i + t_{\alpha/2, (n-k-1)} \sqrt{C_{ii} S_e^2}

[Figure: X residual plot — residuals versus X, scattered about zero with no pattern.]
Standardized Residuals

d_i = \frac{e_i}{\sqrt{S_e^2}}

[Figure: standardized residuals versus observation number, scattered roughly between -2 and +2.5.]
Model Selection

- Forward selection: the 'best' predictor variables are entered, one by one.
- Backward elimination: the 'worst' predictor variables are eliminated, one by one.

[Figures: flow diagrams for forward selection and backward elimination.]
Model Selection: The General Case

H_0: \beta_{q+1} = \beta_{q+2} = \ldots = \beta_k = 0
H_1: at least one is not zero

Reject H_0 if   F > F_{\alpha, k-q, n-k-1}
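This is the partial F test comparing a reduced model with q predictors against the full model with k. A sketch with made-up SSE values (the data here are purely illustrative):

```python
# Made-up sums of squared errors for a reduced (q = 2 predictors) and a
# full (k = 4 predictors) model fitted to n = 30 observations.
SSE_reduced, SSE_full = 40.0, 25.0
n, k, q = 30, 4, 2

# Extra sum of squares per dropped predictor, over the full-model MSE.
F = ((SSE_reduced - SSE_full) / (k - q)) / (SSE_full / (n - k - 1))
```

Here F = 7.5 on (k - q, n - k - 1) = (2, 25) degrees of freedom, to be compared against F_{\alpha, 2, 25}.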
Multicollinearity

The degree of correlation between the X's.

VIF(\beta_i) = C_{ii} = \frac{1}{1 - R_i^2}

where R_i^2 is the R^2 from regressing X_i on the other predictors.
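With two predictors, R_1^2 comes from the auxiliary simple regression of X_1 on X_2. A sketch with made-up predictor columns:

```python
# Made-up predictor columns for illustration.
X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 5]
n = len(X1)
m1, m2 = sum(X1) / n, sum(X2) / n

# Sums of squares for the auxiliary regression X1 ~ X2.
S_11 = sum((a - m1) ** 2 for a in X1)
S_22 = sum((b - m2) ** 2 for b in X2)
S_21 = sum((b - m2) * (a - m1) for a, b in zip(X1, X2))

R1_sq = S_21 ** 2 / (S_22 * S_11)   # R^2 of X1 regressed on X2

VIF = 1 / (1 - R1_sq)
```

Here R_1^2 = 0.64 gives VIF \approx 2.78; a VIF above roughly 5-10 is a common rule of thumb for problematic multicollinearity.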
Model Evaluation

PRESS = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2

where \hat{y}_{(i)} is the prediction for observation i from the model fitted without observation i.
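PRESS can be computed literally: refit with each point held out, predict the held-out y_i, and sum the squared prediction errors. A sketch for the made-up simple-regression dataset used throughout:

```python
# Same made-up dataset as the earlier sketches.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def fit(xs, ys):
    """Least-squares (b0, b1) for a simple linear regression."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    sxx = sum((x - xb) ** 2 for x in xs)
    b1 = sxy / sxx
    return yb - b1 * xb, b1

PRESS = 0.0
for i in range(len(X)):
    xs = X[:i] + X[i + 1:]          # leave observation i out
    ys = Y[:i] + Y[i + 1:]
    b0, b1 = fit(xs, ys)
    PRESS += (Y[i] - (b0 + b1 * X[i])) ** 2
```

For this dataset PRESS \approx 7.28, noticeably larger than SSE = 2.4, since each point is predicted by a model that never saw it. (For OLS the same value can be obtained without refitting, via the hat-matrix identity e_i / (1 - h_{ii}).)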