Introduction to Econometrics
Lecture 2
Bivariate regression models
Interpreting least squares regression results: goodness of fit and significance tests
Forecasting with a simple regression model
ECONMET [U13783] Guy Judge February 2010
Recommended reading
DOUGHERTY, Introduction to Econometrics, Chapter 2
OR
GUJARATI, D N and PORTER, D C, Basic Econometrics, Chapters 2 & 3
Interpreting basic regression output
Parameter estimates (constant intercept and X coefficient)
Degrees of freedom
The ANOVA table and Sums of Squares
R squared
Standard Error of the Y Estimate (SEE)
Standard error of the X-coefficient
t-values
P values and significance levels
Confidence intervals
Format of the simple linear regression model
We write the simple linear regression
model as
Yi = b0 + b1 Xi + ui for i = 1,2,...,n
where Y is the dependent variable
X is the independent variable
and u is the error or disturbance term
Least squares regression results
Computer regression software will generate sample values for the least
squares estimates $\hat{b}_0$ and $\hat{b}_1$, together with a lot of additional statistical
output.
Note: we use the term ESTIMATOR for the function, e.g.
$\hat{b}_1 = \sum x_i y_i / \sum x_i^2$
(where $x_i$ and $y_i$ are the deviations of $X_i$ and $Y_i$ from their sample means),
and ESTIMATE for the actual value that we get when we put sample
data on X and Y into this formula.
Some (fictitious) sales-advertising data
Observation   Sales (Y)   Advertising (X)
 1            36          56.7
 2            48          63.9
 3            45          62.7
 4            40          59.7
 5            30          55.9
 6            56          68.7
 7            63          69.2
 8            53          65.5
 9            61          69.4
10            68          73.4
11            66          74.1
12            65          74.4
• NOTE: Both variables are measured in thousands of dollars
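As a rough illustration of the estimator/estimate distinction, the sketch below applies the deviation-form formulas to the data in the table. The function name `ols_bivariate` and the variable names are hypothetical choices made for this example, not part of the lecture; the result should agree closely with the spreadsheet output (intercept about -75, slope about 1.929).

```python
# Fictitious sales-advertising data from the table (thousands of dollars).
sales = [36, 48, 45, 40, 30, 56, 63, 53, 61, 68, 66, 65]                        # Y
ads = [56.7, 63.9, 62.7, 59.7, 55.9, 68.7, 69.2, 65.5, 69.4, 73.4, 74.1, 74.4]  # X


def ols_bivariate(x, y):
    """Return (intercept, slope) least squares estimates for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the sample means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx                  # the ESTIMATOR applied to the sample
    intercept = y_bar - slope * x_bar    # the fitted line passes through the means
    return intercept, slope


b0_hat, b1_hat = ols_bivariate(ads, sales)   # the ESTIMATES for this sample
print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}")
```

The function plays the role of the estimator; the two numbers it returns for this particular sample are the estimates.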
The sales-advertising model on a
spreadsheet: regression output
Fictitious sales and advertising data - results from Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.983130884
R Square             0.966546336
Adjusted R Square    0.963200969
Standard Error       2.443603959
Observations         12

ANOVA
              df    SS             MS          F           Significance F
Regression    1     1725.204664    1725.205    288.9209    1.04583E-08
Residual      10    59.71200311    5.9712
Total         11    1784.916667

              Coefficients    Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept     -75.00001438    7.539004122       -9.94827    1.67E-06    -91.79796528    -58.2020635
X Variable 1  1.929183685     0.113496923       16.99767    1.05E-08    1.676296737     2.182070633
Are the coefficient estimates plausible?
The results here show an estimated
intercept of -75 and a slope (X) coefficient of
just under 2
What do you think about these values?
Are they significantly different from zero?
How good is the fit?
Spreadsheet graph for the sales-ads model
[Figure: scatter diagram of sales (Y) against ads (X) with fitted regression line; horizontal axis ads (X) from 50 to 80, vertical axis sales (Y) from 20 to 80]
Analysis of Variance (ANOVA) and Sums of Squares
As you can see from the ANOVA table we can decompose the Total Sum
of Squares (of the dependent variable Y around its mean) into two parts:
• the Explained (or Regression) Sum of Squares and
• the Residual Sum of Squares.
Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$
Or, in terms of deviations from the mean,
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$
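The decomposition can be checked numerically. The following sketch recomputes the three sums of squares from the sales-advertising data; the variable names (`tss`, `ess`, `rss`) are chosen for this illustration only, and the totals should be close to the SS column of the ANOVA table.

```python
# Fictitious sales-advertising data (thousands of dollars).
sales = [36, 48, 45, 40, 30, 56, 63, 53, 61, 68, 66, 65]                        # Y
ads = [56.7, 63.9, 62.7, 59.7, 55.9, 68.7, 69.2, 65.5, 69.4, 73.4, 74.1, 74.4]  # X

n = len(sales)
x_bar = sum(ads) / n
y_bar = sum(sales) / n

# Least squares estimates in deviation form.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, sales))
         / sum((x - x_bar) ** 2 for x in ads))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in ads]

tss = sum((y - y_bar) ** 2 for y in sales)               # Total Sum of Squares
ess = sum((f - y_bar) ** 2 for f in fitted)              # Explained Sum of Squares
rss = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # Residual Sum of Squares

print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}")
print(f"ESS + RSS = {ess + rss:.4f}")
```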
Goodness of fit: R squared (the Coefficient of determination)
We can now define the Coefficient of Determination or R squared
as the proportion of the Total Variation of the dependent variable (around its mean)
which can be explained by, or attributed to, the regression.
$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$ or, equivalently, $R^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2}$
The second equation expresses R squared as 1 minus the proportion of the variation that is not explained
by the regression.
R squared is taken as a measure of the "goodness of fit" of the regression.
$0 \le R^2 \le 1$
The closer R squared is to 1, the better the fit.
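As a quick check, both forms of the definition can be evaluated with the sums of squares reported in the ANOVA table of the spreadsheet output. The variable names are illustrative only.

```python
# Sums of squares copied from the ANOVA table of the spreadsheet output.
ess = 1725.204664   # Explained (Regression) Sum of Squares
rss = 59.71200311   # Residual Sum of Squares
tss = 1784.916667   # Total Sum of Squares

r2_from_explained = ess / tss       # proportion of variation explained
r2_from_residual = 1 - rss / tss    # 1 minus the proportion left unexplained

print(f"R squared (ESS/TSS)     = {r2_from_explained:.6f}")
print(f"R squared (1 - RSS/TSS) = {r2_from_residual:.6f}")
```

Both values should match the "R Square" figure in the regression statistics, about 0.9665, so roughly 97% of the variation in sales around its mean is attributed to the regression.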
The Standard Error of the Y Estimate
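The body of this slide did not survive extraction. As a hedged sketch, the usual definition (assumed here, not taken from the slide) is the square root of the Residual Sum of Squares divided by the residual degrees of freedom, n - 2 in the bivariate model. Using the ANOVA figures from the spreadsheet output:

```python
import math

# Figures copied from the ANOVA table of the spreadsheet output.
rss = 59.71200311    # Residual Sum of Squares
n = 12               # number of observations
df_residual = n - 2  # two parameters estimated: intercept and slope

# Assumed standard definition: SEE = sqrt(RSS / (n - 2)).
see = math.sqrt(rss / df_residual)
print(f"Standard Error of the Y Estimate = {see:.6f}")
```

The result should agree with the "Standard Error" entry in the regression statistics (about 2.4436), which is also the square root of the residual MS in the ANOVA table.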
The Standard Error of the X coefficient
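The body of this slide is also missing from the extracted text. The sketch below assumes the textbook formula for the bivariate model, standard error of the slope = SEE / sqrt(sum of squared deviations of X), and then forms the t-value and a 95% confidence interval. The critical value 2.228 is the t(n-2, 0.025) figure quoted in the forecasting slides; all variable names are illustrative.

```python
import math

# Advertising data (X) and summary figures from the spreadsheet output.
ads = [56.7, 63.9, 62.7, 59.7, 55.9, 68.7, 69.2, 65.5, 69.4, 73.4, 74.1, 74.4]
slope = 1.929183685   # X coefficient from the output
see = 2.443603959     # Standard Error of the Y Estimate from the output
t_crit = 2.228        # t(n-2, 0.025) with n - 2 = 10, as quoted in the lecture

x_bar = sum(ads) / len(ads)
s_xx = sum((x - x_bar) ** 2 for x in ads)   # in-sample variation in X

# Assumed textbook formula: se(slope) = SEE / sqrt(sum of squared deviations of X).
se_slope = see / math.sqrt(s_xx)
t_value = slope / se_slope                  # tests H0: slope = 0
ci_lower = slope - t_crit * se_slope
ci_upper = slope + t_crit * se_slope

print(f"se(slope) = {se_slope:.6f}, t = {t_value:.4f}")
print(f"95% confidence interval: [{ci_lower:.4f}, {ci_upper:.4f}]")
```

These figures should be close to the "Standard Error", "t Stat", "Lower 95%" and "Upper 95%" entries for X Variable 1; small differences in the interval arise because the critical value is rounded to 2.228.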
Forecasting using the simple regression model (1)
Once a model has been estimated (and carefully
validated using economic and statistical tests) it can be
used for prediction or forecasting.
For example our estimated relationship between sales
and ads is (approximately) sales = -75 + 1.929 ads +
residual
We can use this to predict sales for some particular level
of advertising, say ads = 70
The disturbance term is assumed to take its expected
value so we put the residual = 0.
Forecasting using the simple regression model (2)
sales(ads=70) = -75 + 1.9292 × 70 = 60.04
This is just a point forecast. We can create a
forecast confidence interval by taking
95% forecast interval = point forecast ± $s_F \times t_{n-2,\,0.025}$
Here that would give 60.04 ± 2.58097 × 2.228 =
60.04 ± 5.75, i.e. [54.29, 65.79]
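The arithmetic on this slide can be reproduced in a few lines. This is only a sketch using the rounded figures quoted in the lecture; the variable names are illustrative.

```python
# Rounded figures quoted in the lecture.
intercept = -75.0
slope = 1.9292
ads_forecast = 70.0   # chosen level of advertising
s_f = 2.58097         # standard error of the forecast
t_crit = 2.228        # t(n-2, 0.025) with n - 2 = 10

# Point forecast: the disturbance is set to its expected value of zero.
point_forecast = intercept + slope * ads_forecast

# 95% forecast interval: point forecast plus or minus s_F times the critical t value.
margin = s_f * t_crit
lower = point_forecast - margin
upper = point_forecast + margin

print(f"point forecast = {point_forecast:.2f}")
print(f"95% forecast interval = [{lower:.2f}, {upper:.2f}]")
```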
Forecasting using the simple regression model (3)
This interval is quite large because it is based on
a rather small sample. Hence both $s_F$ and
$t_{n-2,\,0.025}$ will be fairly large.
Forecasts based on larger samples will be more
precise.
More on the standard error of the forecast
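The formula for $s_F$ (for the simple bivariate model) is
$s_F = \hat{\sigma}_u \left[ 1 + n^{-1} + \frac{(X_F - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}$
where $\hat{\sigma}_u$ is the standard error of the regression (the Standard Error of the Y Estimate).
Notice that $s_F$ is smaller
the smaller is $\hat{\sigma}_u$
the closer is $X_F$ to the sample mean $\bar{X}$
the greater is the sample size n
the greater is the in-sample variation in X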
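To see how the forecast standard error behaves, the sketch below evaluates the formula for the sales-advertising data at several values of X_F. It assumes the square-root form of the formula and takes the regression standard error from the spreadsheet output; the function name `forecast_standard_error` is hypothetical.

```python
import math

# Advertising data (X) and the regression standard error from the spreadsheet output.
ads = [56.7, 63.9, 62.7, 59.7, 55.9, 68.7, 69.2, 65.5, 69.4, 73.4, 74.1, 74.4]
SEE = 2.443603959


def forecast_standard_error(x_f, x, see):
    """Standard error of the forecast at x_f for the simple bivariate model."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)   # in-sample variation in X
    return see * math.sqrt(1 + 1 / n + (x_f - x_bar) ** 2 / s_xx)


x_bar = sum(ads) / len(ads)
# s_F is smallest at the sample mean and grows as X_F moves away from it.
for x_f in (x_bar, 70.0, 80.0):
    print(f"X_F = {x_f:6.2f}   s_F = {forecast_standard_error(x_f, ads, SEE):.5f}")
```

At X_F = 70 the result should be close to the 2.58097 used in the forecast interval.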