Simple Linear Regression
Carlos Carvalho
The University of Texas McCombs School of Business
mccombs.utexas.edu/faculty/carlos.carvalho/teaching
Today’s Plan
Linear Prediction
Ŷi = b0 + b1 Xi
The values of b0 and b1 that minimize the least squares criterion are:

b1 = corr(X, Y) × sY/sX   and   b0 = Ȳ − b1 X̄

where

sY = √( Σᵢ₌₁ⁿ (Yi − Ȳ)² )   and   sX = √( Σᵢ₌₁ⁿ (Xi − X̄)² )
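As a quick illustration, here is a minimal NumPy sketch of these formulas (the data and variable names are my own, not the course's housing dataset):

```python
import numpy as np

# Toy data: house size (thousands of sq. ft.) vs. price ($1000s).
x = np.array([0.8, 1.1, 1.5, 1.9, 2.4])
y = np.array([70.0, 95.0, 110.0, 128.0, 155.0])

# sY and sX as defined above (a 1/(n-1) factor would cancel in the ratio)
s_y = np.sqrt(np.sum((y - y.mean()) ** 2))
s_x = np.sqrt(np.sum((x - x.mean()) ** 2))

r = np.corrcoef(x, y)[0, 1]      # corr(X, Y)
b1 = r * s_y / s_x               # slope
b0 = y.mean() - b1 * x.mean()    # intercept

print(b0, b1)                    # matches np.polyfit(x, y, 1)[::-1]
```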
Review: Covariance and Correlation
Covariance and correlation measure the direction and strength of the linear relationship between the variables Y and X. The direction is given by the sign of the covariance:

Cov(Y, X) = Σᵢ₌₁ⁿ (Yi − Ȳ)(Xi − X̄) / (n − 1)
Correlation and Covariance
corr(X, Y) = cov(X, Y) / √( var(X) var(Y) ) = cov(X, Y) / ( sd(X) sd(Y) )
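A one-line check of this identity with NumPy (toy arrays assumed):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

cov_xy = np.cov(x, y)[0, 1]   # sample covariance, n-1 in the denominator
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

assert np.isclose(corr_xy, np.corrcoef(x, y)[0, 1])
```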
Correlation
[Figure: four scatterplots of standardized data illustrating corr = 1, corr = .5, corr = .8, and corr = −.8]
Correlation
Correlation only measures linear relationships: corr(X, Y) = 0 does not mean the variables are not related!
[Figure: two scatterplots of strongly nonlinear relationships for which corr(X, Y) ≈ 0]
Back to Least Squares
1. Intercept:

   b0 = Ȳ − b1 X̄  ⇒  Ȳ = b0 + b1 X̄

2. Slope:

   b1 = corr(X, Y) × sY/sX
From now on, the terms "fitted values" (Ŷi) and "residuals" (ei) refer to those obtained from the least squares line.

The fitted values and residuals have some special properties. Let's look at the housing data analysis to figure out what these properties are...
The Fitted Values and X
[Figure: fitted values plotted against X for the housing data; corr(y.hat, x) = 1]
The Residuals and X
[Figure: residuals plotted against X for the housing data; corr(e, x) = 0 and mean(e) = 0]
Why?
What is the intuition for the relationship between Ŷ, e, and X? Let's consider a "crazy" alternative line:
[Figure: the housing data with the crazy line 10 + 50 X overlaid on Y vs. X]
Fitted Values and Residuals
This is a bad fit! We are underestimating the value of small houses
and overestimating the value of big houses.
[Figure: residuals from the crazy line plotted against X; corr(e, x) = −0.7 and mean(e) = 1.8]
Fitted Values and Residuals
In summary: Y = Ŷ + e, where:
- Ŷ is "made from X"; corr(X, Ŷ) = 1.
- e is unrelated to X; corr(X, e) = 0.
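These properties are easy to verify numerically. A minimal sketch with simulated data (names and numbers are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, size=200)            # house sizes (toy)
y = 40 + 45 * x + rng.normal(0, 10, size=200)  # prices (toy)

b1, b0 = np.polyfit(x, y, 1)   # least squares slope and intercept
y_hat = b0 + b1 * x
e = y - y_hat

print(np.corrcoef(x, y_hat)[0, 1])   # 1.0 (y_hat is a linear function of x)
print(np.corrcoef(x, e)[0, 1])       # ~0, up to floating point error
print(e.mean())                      # ~0
```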
Another way to derive things
The intercept: the residuals average to zero,

(1/n) Σᵢ₌₁ⁿ ei = 0  ⇒  (1/n) Σᵢ₌₁ⁿ (Yi − b0 − b1 Xi) = 0
                    ⇒  Ȳ − b0 − b1 X̄ = 0
                    ⇒  b0 = Ȳ − b1 X̄
Another way to derive things
The slope: the residuals are uncorrelated with X,

corr(e, X) = 0  ⇒  Σᵢ₌₁ⁿ ei (Xi − X̄) = 0

Σᵢ₌₁ⁿ (Yi − b0 − b1 Xi)(Xi − X̄) = 0

Σᵢ₌₁ⁿ (Yi − Ȳ − b1(Xi − X̄))(Xi − X̄) = 0   [substituting b0 = Ȳ − b1 X̄]

⇒  b1 = Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ) / Σᵢ₌₁ⁿ (Xi − X̄)²  =  rxy × sy/sx
Decomposing the Variance
This leads to

Σᵢ₌₁ⁿ (Yi − Ȳ)² = Σᵢ₌₁ⁿ (Ŷi − Ȳ)² + Σᵢ₌₁ⁿ ei²

(SST = SSR + SSE).
Decomposing the Variance – ANOVA Tables

[Figure: ANOVA table from the regression output, showing the SSR/SSE/SST decomposition]
A Goodness of Fit Measure: R²

R² = SSR/SST = 1 − SSE/SST

- 0 ≤ R² ≤ 1.
- The closer R² is to 1, the better the fit.
A Goodness of Fit Measure: R²

R² = Σᵢ₌₁ⁿ (Ŷi − Ȳ)² / Σᵢ₌₁ⁿ (Yi − Ȳ)²
   = Σᵢ₌₁ⁿ (b0 + b1 Xi − b0 − b1 X̄)² / Σᵢ₌₁ⁿ (Yi − Ȳ)²
   = b1² Σᵢ₌₁ⁿ (Xi − X̄)² / Σᵢ₌₁ⁿ (Yi − Ȳ)²  =  b1² sx²/sy²  =  rxy²
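A quick numerical check that R² = rxy² (toy simulation again; names assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 3.0, size=200)
y = 40 + 45 * x + rng.normal(0, 10, size=200)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares

r2 = ssr / sst
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
```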
Back to the House Data

[Figure: least squares output for the housing regression]
Prediction and the Modelling Goal
Ŷ = f(X) = b0 + b1 X
Ŷ is not going to be a perfect prediction.
We need to devise a notion of forecast accuracy.
The Simple Linear Regression Model
Y = β0 + β1 X + ε,   ε ∼ N(0, σ²)
Independent Normal Additive Error
- E[ε] = 0 ⇔ E[Y | X] = β0 + β1 X
  (E[Y | X] is the "conditional expectation of Y given X").
- Many things are close to Normal (central limit theorem).
- MLE estimates for the β's are the same as the LS b's.
- It works! This is a very robust model for the world.
The Regression Model and our House Data

[Figure: the regression model overlaid on the housing data]
Conditional Distributions
The conditional distribution for Y given X is Normal:

Y | X ∼ N(β0 + β1 X, σ²).

σ controls dispersion:

[Figure: conditional distributions of Y at several values of X, each with spread σ around the line]
Conditional vs Marginal Distributions
- Mean is E[Y | X] = E[β0 + β1 X + ε] = β0 + β1 X.
- Variance is var(Y | X) = var(β0 + β1 X + ε) = var(ε) = σ².
Prediction Intervals with the True Model
You are told (without looking at the data) that

β0 = 40;  β1 = 45;  σ = 10

and you are asked to predict the price of a 1500 square foot house:

Y = 40 + 45(1.5) + ε = 107.5 + ε,   so   Y ∼ N(107.5, 10²)
Prediction Intervals with the True Model

The model says that the mean value of a 1500 sq. ft. house is $107,500 and that, with roughly 95% probability, deviations from the mean stay within ≈ $20,000 (two standard deviations, 2σ = 20).
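As a sketch, the same interval computed with scipy.stats (the numbers come from the slide; the 95% level is my assumption):

```python
from scipy import stats

beta0, beta1, sigma = 40.0, 45.0, 10.0
x_f = 1.5                      # 1500 sq. ft., in thousands

mean = beta0 + beta1 * x_f     # 107.5, i.e. $107,500
lo, hi = stats.norm.interval(0.95, loc=mean, scale=sigma)
print(lo, hi)                  # roughly 87.9 to 127.1 ($1000s)
```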
Summary of Simple Linear Regression
Assume that all observations are drawn from our regression model
and that errors on those observations are independent.
The model is
Yi = β0 + β1 Xi + εi
Key Characteristics of Linear Regression Model
- Mean of Y is linear in X.
- Error terms (deviations from the line) are normally distributed (very few deviations are more than 2 sd away from the regression mean).
- Error terms have constant variance.
Break
Back in 15 minutes...
Recall: Estimation for the SLR Model
SLR assumes every observation in the dataset was generated by the model

Yi = β0 + β1 Xi + εi

and we estimate the unknown coefficients by least squares:

β̂0 = b0 = Ȳ − b1 X̄   and   β̂1 = b1 = rxy × sy/sx
Estimation of Error Variance
Recall that εi ∼ N(0, σ²) iid, and that σ drives the width of the prediction intervals. We estimate σ² with the sample variance of the residuals,

s² = (1/(n − 2)) Σᵢ₌₁ⁿ ei²
Degrees of Freedom
For example, consider SST = Σᵢ₌₁ⁿ (Yi − Ȳ)²:
- If n = 1, Ȳ = Y1 and SST = 0: since Y1 is "used up" estimating the mean, we haven't observed any variability!
- For n > 1, we've only had n − 1 chances for deviation from the mean, and we estimate sy² = SST/(n − 1).
Estimation of Error Variance

Where is s in the Excel output?

[Figure: Excel regression output; s appears as the "Standard Error" entry in the regression statistics]
Sampling Distribution of Least Squares Estimates

[Figures: how the estimates b0 and b1 vary across repeated samples from the same model]
Review: Sampling Distribution of Sample Mean
Step back for a moment and consider the mean for an iid sample of n observations of a random variable {X1, …, Xn}:

- E(X̄) = (1/n) Σ E(Xi) = µ
- var(X̄) = var( (1/n) Σ Xi ) = (1/n²) Σ var(Xi) = σ²/n

If X is normal, then X̄ ∼ N(µ, σ²/n).
Oracle vs SAP

[Figure: Oracle vs SAP example]
Central Limit Theorem
The simple CLT states that for iid random variables X with mean µ and variance σ², the distribution of the sample mean becomes normal as the number of observations, n, gets large. That is, X̄ → N(µ, σ²/n), so sample averages tend to be normally distributed in large samples.
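A minimal simulation sketch of this. The exponential distribution is my choice here; it is far from normal but has the E[X] = 1, var(X) = 1 of the figure below:

```python
import numpy as np

rng = np.random.default_rng(2)

# Exponential(1) has mean 1 and variance 1, but is heavily skewed.
n, reps = 100, 10_000
draws = rng.exponential(scale=1.0, size=(reps, n))
means = draws.mean(axis=1)

print(means.mean())   # ~1   (= mu)
print(means.std())    # ~0.1 (= sigma / sqrt(n) = 1/10)
# A histogram of `means` looks very close to N(1, 1/100).
```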
Central Limit Theorem

[Figure: a skewed density with E[X] = 1 and var(X) = 1]
[Figures: histograms of sample means from this distribution for increasing n, looking more and more normal]
Sampling Distribution of b1
var(b1) = σ² / Σᵢ₌₁ⁿ (Xi − X̄)² = σ² / ((n − 1) sx²)

Three factors drive this variance: sample size (n), error variance (σ² = σε²), and the spread of X (sx).
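A short simulation check of this formula (toy design and names assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 10.0
x = rng.uniform(0.5, 3.0, size=n)          # fixed design, reused every replication

slopes = []
for _ in range(20_000):
    y = 40 + 45 * x + rng.normal(0, sigma, size=n)
    slopes.append(np.polyfit(x, y, 1)[0])  # slope estimate for this sample

theory = sigma**2 / np.sum((x - x.mean()) ** 2)
print(np.var(slopes), theory)              # the two should be close
```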
Sampling Distribution of b0
σb0² = var(b0) = σ² × ( 1/n + X̄² / ((n − 1) sx²) )
The Importance of Understanding Variation

[Figures: how fitted lines vary from sample to sample]
Estimated Variance

In practice σ is unknown, so we plug in the estimate s; this yields the estimated standard errors sb0 and sb1 used below.
Normal and Student’s t
For example:
- Ȳ ∼ tn−1(µ, sy²/n).
- b0 ∼ tn−2(β0, sb0²) and b1 ∼ tn−2(β1, sb1²).
Standardized Normal and Student’s t
(bj − βj)/σbj ∼ N(0, 1)   ⟹   (bj − βj)/sbj ∼ tn−2(0, 1)
Testing and Confidence Intervals (in 3 slides)

Suppose Zn−p is distributed tn−p(0, 1). A centered interval is

P(−tn−p,α/2 < Zn−p < tn−p,α/2) = 1 − α
Confidence Intervals
1 − α = P( −tn−p,α/2 < (bj − βj)/sbj < tn−p,α/2 )
      = P( bj − tn−p,α/2 sbj < βj < bj + tn−p,α/2 sbj )

so bj ± tn−p,α/2 × sbj is a (1 − α)·100% confidence interval for βj.
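A sketch of this computation for the slope, using scipy's t quantile (toy data, names assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0.5, 3.0, size=n)
y = 40 + 45 * x + rng.normal(0, 10, size=n)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s = np.sqrt(np.sum(e**2) / (n - 2))             # regression standard error
s_b1 = s / np.sqrt(np.sum((x - x.mean())**2))   # standard error of the slope

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)    # alpha = .05
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% CI for beta_1
```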
Testing
Similarly, suppose that assuming bj ∼ tn−p(βj, sbj) for our sample bj leads to (recall Zn−p ∼ tn−p(0, 1))

P( Zn−p < −|bj − βj|/sbj ) + P( Zn−p > |bj − βj|/sbj ) = ϕ.
Hypothesis Testing
Our hypothesis test will either reject or fail to reject the null hypothesis (the default claim that holds if ours is not true).
Hypothesis Testing
The test statistic is

zbj = (bj − βj0)/sbj  =  bj/sbj   for βj0 = 0.
We assess the size of zbj with the p-value: the probability, under the null, of a draw beyond ±z.

[Figure: standard normal density p(Z) with the two tails beyond −z and z shaded]
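A sketch of the two-sided p-value computation with scipy (the statistic and degrees of freedom are hypothetical):

```python
from scipy import stats

z = 2.3          # hypothetical t-statistic b_j / s_bj
df = 28          # n - p, e.g. n = 30 observations, p = 2 coefficients

# Two-sided p-value: probability of a draw beyond |z| in either tail.
p_value = 2 * stats.t.sf(abs(z), df=df)
print(p_value)   # ~0.029: reject at alpha = .05
```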
Hypothesis Testing – Windsor Fund Example

Recall the Windsor fund regression.

[Figure: regression output showing b, sb, and b/sb for the intercept and slope]

It turns out that we reject the null at α = .05 (ϕ = .0105). Thus Windsor does have an "alpha" over the market.
Example: Hypothesis Testing

Looking at the slope, this is a very rare case where the null hypothesis is not zero:

[Figure: test of the slope against its non-zero null value]

Forecasting

Producing the point forecast Ŷf = b0 + b1 Xf is the easy bit. The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. The forecast error is

ef = Yf − Ŷf = Yf − b0 − b1 Xf
This can get quite complicated! A simple strategy is to build the following (1 − α)·100% prediction interval:

b0 + b1 Xf ± tn−2,α/2 × s
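A sketch of this simple interval on a toy fit (note it ignores the extra width a fully exact interval would add for estimation error in b0 and b1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0.5, 3.0, size=n)
y = 40 + 45 * x + rng.normal(0, 10, size=n)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s = np.sqrt(np.sum(e**2) / (n - 2))            # regression standard error

x_f = 1.5                                      # forecast point
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # 95% interval
center = b0 + b1 * x_f
print(center - t_crit * s, center + t_crit * s)
```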
Glossary and Equations

s² = (1/(n − 2)) Σᵢ ei²
zbj = (bj − βj0)/sbj   (= bj/sbj most often)