
Chapter 1

Simple Regression
Introduction
 What is regression?
 Regression analysis is a statistical process used to study the
relationship between a set of independent variables (explanatory
variables) and a dependent variable (response variable). Through this
technique, it is possible to understand how the value of the response
variable changes when an explanatory variable is varied.
A regression analysis can have two objectives:
 Explanatory analysis: to understand and weigh the effects of the
independent variables on the dependent variable according to a particular
theoretical model
 Predictive analysis: to find a linear combination of the independent
variables that predicts the value of the dependent variable as well as possible
Type of data
 Economic data sets come in a variety of types. Whereas some
econometric methods can be applied with little or no
modification to many different kinds of data sets, the special
features of some data sets must be accounted for or should be
exploited. We next describe the most important data structures
encountered in applied work.
 Cross-sectional data

Each observation is a new individual, with information recorded at a point in time:


1 observation = information about 1 cross-sectional unit.
Cross-sectional units: individuals, households, firms, cities, states; the data are taken at a
given point in time.
Typical assumption: the units form a random sample from the whole population → the
units’ values can be treated as independent.
Type of data
Cross-sectional data

Observation wage educ exper female married

1 3.10 11 2 Yes No
2 3.24 12 22 Yes Yes
… … … … … …
525 11.56 16 5 No No
526 3.5 14 5 Yes No
Type of data
 Time series data
Observations on economic variables over time
 stock prices, money supply, CPI, GDP, annual homicide rates, automobile
sales
frequencies: daily, weekly, monthly, quarterly, annually
Unlike cross-sectional data, ordering is important here!
typically, observations cannot be considered independent across time →
require more complex econometric techniques

Year unemployment inflation population


2004 4.95 2.6 260,660
2005 5.21 2.8 263,034
… … … …
Type of data
 Panel (or Longitudinal) data
Panel data or longitudinal data are multi-dimensional data involving
measurements over time (cross section + time series).
Panel data contain observations of multiple phenomena obtained over
multiple time periods for the same individuals

unit year popul murders unemp police


1 2008 293,700 5 6.3 358
1 2010 299,500 7 7.4 396
2 2008 53,450 2 7.2 51
2 2010 51,970 1 8.1 51
…… …… …… …… …… ……
Linear correlation (Pearson correlation)

 Relation between 2 quantitative variables:

 Scatter plot of Y against X
 Description of the linear association: correlation, simple linear regression
 Explanation / prediction: simple linear regression
Linear correlation (Pearson correlation)

 Descriptive statistics of the relation between X and Y: the covariance

 For a sample:
$$\operatorname{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i \;-\; \bar{x}\,\bar{y}$$

 Estimation for the population:
$$\hat{\sigma}_{xy} = \widehat{\operatorname{cov}}(x,y)
 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
 = \frac{1}{n-1}\sum_{i=1}^{n} x_i y_i \;-\; \frac{n}{n-1}\,\bar{x}\,\bar{y}$$
Linear correlation (Pearson correlation)

Covariance and the scatter plot

Each point contributes (xi − x̄)(yi − ȳ) to the covariance: points for which
(xi − x̄) and (yi − ȳ) have the same sign (upper-right and lower-left quadrants
around (x̄, ȳ)) contribute positively; points where the signs differ contribute
negatively.
Linear correlation (Pearson correlation)

Coefficient of linear correlation (Pearson)

For a sample:
$$r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}}$$

Estimation for the population:
$$\hat{\rho}_{xy} = r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}}$$
Linear correlation (Pearson correlation)

 Coefficient of linear correlation

 −1 ≤ r ≤ 1

 (Scatter plots of X2 against X1 illustrating r = 0.9, 0.5, 0 and r = −0.9, −0.5, 0.)
Linear correlation (Pearson correlation)

 Conditions

 Linearity: the relation between X and Y must be linear
 (scatter plots: a linear versus a non-linear relation).
Linear correlation (Pearson correlation)

 Conditions
 Normality
 The probability distribution of the couple (X, Y) is a two-
dimensional normal distribution:
in particular, for each value of X, the values of Y are normally
distributed, and vice versa.
(Illustrative bivariate normal scatter plots with r = 0 and r = 0.8.)
Linear correlation (Pearson correlation)

When the conditions are not respected

(Scatter plots of fork length, FKLNGTH, against AGE and of log fork length,
LFKL, against log age, LAGE.)

Relation age – length: a log-log transformation linearizes the relation;

alternative: use a non-parametric correlation (Spearman).
Linear correlation (Pearson correlation)

 Pearson correlation test

 Test of ρ = 0

 H0: ρ = 0   no linear relationship (which is not the same as the absence of any
 relationship, and says nothing about causality)
 Ha: ρ ≠ 0

 Under H0:
$$t_{obs} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2,\alpha}$$

 If H0 is rejected, remember: correlation ≠ causality.
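
As an aside (not from the original slides), the test statistic above can be computed by hand in R and checked against the built-in cor.test(); the data below are simulated purely for illustration.

set.seed(1)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)              # simulated data, n = 30

n <- length(x)
r <- cor(x, y)                        # sample Pearson correlation
t_obs <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(abs(t_obs), df = n - 2, lower.tail = FALSE)
c(r = r, t_obs = t_obs, p_value = p_val)

cor.test(x, y)                        # same t statistic and p-value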


Simple linear regression

 Regression curves E(Y|X) and E(X|Y)

 Description of the relation: E(Y|X), based on the conditional distribution of Y given X:
$$f_{X=x_0}(y)\,dy = P(y \le Y \le y+dy \mid X = x_0)$$

 The regression curves are E(Y|X) (as a function of x) and E(X|Y) (as a function of y).
Simple linear regression
1. Model

We assume: y = f(x) = β0 + β1 x

Model: Yi = β0 + β1 xi + ℰi, with, for X = xi, Yi ~ N(β0 + β1 xi, σ)

X = independent variable, controlled
Y = dependent variable, random

 Error: ℰi ~ N(0, σ)
Simple linear regression
 Estimate E[y|x] (called the population regression function).
 In other words, we need to find a “good” mathematical expression
for f in E[y|x] = f(x): The simplest model is E[y|x] = β0 + β1 x
Simple linear regression
2. Parameter estimation

• Intercept: β0 = E[y|x = 0]
• Slope: β1 = the change in E[y|x] for a one-unit increase in x
Simple linear regression
2. Parameter estimation

 Estimation method for the parameters β0 and β1: the least squares method

 For each observation (xi, yi) the fitted value is ŷi = β0 + β1 xi, and the error is
 ℰi = yi − (β0 + β1 xi), so that Yi = β0 + β1 Xi + ℰi.

 The least-squares estimates minimize the sum of squared errors:
$$\min \sum_i (y_i - \hat{y}_i)^2$$
Simple linear regression
2. Parameter estimation
 Least squares method (see the sketch below)
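
As a minimal sketch (simulated data, with lm() used only as a cross-check, not part of the original slides), the least-squares estimates can be computed directly as b1 = Sxy/Sxx and b0 = ȳ − b1·x̄:

set.seed(2)
x <- runif(50, 0, 10)
y <- 2 + 0.8 * x + rnorm(50)

b1 <- cov(x, y) / var(x)              # slope: b1 = Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)          # intercept: b0 = ybar - b1 * xbar
c(b0 = b0, b1 = b1)

coef(lm(y ~ x))                       # lm() gives the same estimates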
Simple linear regression
2. Properties of the Least-Squares Estimators:

 The estimator β̂1 is linear, that is, a linear function of a random variable, such as
the dependent variable Y in the regression model.
 It is unbiased, that is, its average or expected value, E(β̂1), is equal
to the true value β1: E(β̂1) = β1.
 It has minimum variance in the class of all such linear unbiased
estimators; an unbiased estimator with the least variance is known as
an efficient estimator.
Simple linear regression

3. Checking model quality in linear regression

We assume that: Yi = β0 + β1 xi + ℰi

- normal distribution of the errors

- equality of variances (homoscedasticity)

- independence (cov(ℰi, ℰj) = 0 for i ≠ j)

- linear relation
Simple linear regression

3. Checking model quality in linear regression

 Normality of the errors: plot of the residuals against the predicted values.
Simple linear regression

3. Checking model quality in linear regression

 Homoscedasticity: plot of the residuals against the predicted values;
 if the spread changes with the predicted value, a transformation of the data may help.
Simple linear regression

3. Checking model quality in linear regression

 Independence between errors, and linearity: residual plots may reveal an
 error structure (dependence) or a non-linear relation.
Simple linear regression

4. Coefficient of determination

 Decomposition of variation

 How much of the variability of Y is explained by the linear relationship with X?

 Variability = total sum of squares:
$$SCT = \sum_{i=1}^{n}(y_i-\bar{y})^2 = n\,s_y^2$$
Simple linear regression

4. Coefficient of determination
 Decomposition of variation

 SCT (total) = SCR (due to the linear regression) + SCE (outside the linear regression: error)
Simple linear regression
4. Coefficient of determination
 Decomposition of variation
Simple linear regression

4. Coefficient of determination:
 Relation between r and r²

$$SCR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2
 = \sum_{i=1}^{n}\big((a+bx_i)-(a+b\bar{x})\big)^2
 = b^2\sum_{i=1}^{n}(x_i-\bar{x})^2 = b^2\, n\, s_x^2$$

so

$$R^2 = \frac{SCR}{SCT} = \frac{b^2\, n\, s_x^2}{n\, s_y^2}
 = \left(\frac{\operatorname{cov}(x,y)}{s_x^2}\right)^2 \frac{s_x^2}{s_y^2}
 = \frac{(\operatorname{cov}(x,y))^2}{s_x^2\, s_y^2} = r^2$$

If r = 0 then r² = 0, and conversely.
Simple linear regression
5. Analysis of variance table (ANOVA table)

 Decomposition of variation test, or analysis of variance
 (ANOVA): H0: ρ² = 0 (test of the overall significance of the model)
Simple linear regression
 Student test: significance test of the parameters

 Principle of the test for the slope b, with H0: b = 0:
$$\frac{\hat{b}}{s_{\hat{b}}} \sim T_{n-2},
\qquad s_{\hat{b}}^2 = \frac{(1-r^2)\, s_y^2}{(n-2)\, s_x^2}$$
Simple linear regression

Random variables: X controlled (fixed), Y random; or X and Y both random.

Questions
Is there a relation? → correlation
What is the relation? Explanation of Y by X → simple linear regression model

Model
Y = β0 + β1 x + ℰ
(X, Y) binormal ⇒ linearity of the regressions; for X = xi,
Yi ~ N(β0 + β1 xi, σ)
Chapter 2

Multiple Regression
Multiple linear regression

Multiple Regression Model


Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation
and Prediction
Qualitative Independent Variables
Residual Analysis
Logistic Regression
Multiple Regression Model
 Multiple Regression Model
The equation that describes how the dependent variable y is related to
the independent variables x1, x2, . . . , xp and an error term is:

y = β0 + β1x1 + β2x2 + . . . + βpxp + ℰ

where:
β0, β1, β2, . . . , βp are the parameters, and ℰ is a
random variable called the error term
Multiple linear regression

 Multiple Regression Equation


The equation that describes how the mean value of
y is related to x1, x2, . . . , xp is:

E(y) = β0 + β1x1 + β2x2 + . . . + βpxp


Estimated Multiple Regression Equation

 Estimated Multiple Regression Equation

ŷ = b0 + b1x1 + b2x2 + . . . + bpxp

A simple random sample is used to compute the sample
statistics b0, b1, b2, . . . , bp that are used as the point
estimators of the parameters β0, β1, β2, . . . , βp
Estimation Process

Multiple Regression Model:
 y = β0 + β1x1 + β2x2 + . . . + βpxp + ℰ
Multiple Regression Equation:
 E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Unknown parameters: β0, β1, β2, . . . , βp

Sample data: values of x1, x2, . . . , xp and y for each observation.

Estimated Multiple Regression Equation:
 ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
The sample statistics b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp.
Least Squares Method

 Least Squares Criterion

$$\min \sum_i (y_i - \hat{y}_i)^2$$

Computation of Coefficient Values


The formulas for the regression coefficients
b0, b1, b2, . . . bp involve the use of matrix algebra.
We will rely on computer software to perform the
calculations.
Multiple Regression Model
 Example: Programmer Salary Survey
The years of experience, score on the aptitude test, and corresponding annual
salary ($1000s) for a sample of 20 programmers is shown on the next slide.

Exper. Score Salary Exper. Score Salary


4 78 24.0 9 88 38.0
7 100 43.0 2 73 26.6
1 86 23.7 10 75 36.2
5 82 34.3 5 81 31.6
8 86 35.8 6 74 29.0
10 84 38.0 8 87 34.0
0 75 22.2 4 79 30.1
1 80 23.1 6 94 33.9
6 83 30.0 3 70 28.2
6 91 33.0 3 89 30.0
Multiple Regression Model

Suppose we believe that salary (y) is related to the years


of experience (x1) and the score on the programmer
aptitude test (x2) by the following regression model:

y = 0 + 1x1 + 2x2 + e

where
y = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
Solving for the Estimates of β0, β1, β2

Input data (x1, x2, y for the 20 programmers) are passed to least-squares
software (here SPSS) for solving multiple regression problems; the output
includes b0, b1, b2, R², etc.
Solving for the Estimates of β0, β1, β2

 Regression Equation Output

             Coeffic.   Std. Err.  t Stat   P-value
 Intercept   3.17394    6.15607    0.5156   0.61279
 Experience  1.4039     0.19857    7.0702   1.9E-06
 Test Score  0.25089    0.07735    3.2433   0.00478

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
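
The slides obtain this fit with SPSS; a sketch of the same fit in R is shown below, building a data frame from the 20 observations listed earlier (the object and column names are illustrative, not from the slides).

salary_df <- data.frame(
  exper  = c(4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3),
  score  = c(78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
             88, 73, 75, 81, 74, 87, 79, 94, 70, 89),
  salary = c(24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
             38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0)
)
fit <- lm(salary ~ exper + score, data = salary_df)
coef(fit)    # approximately: (Intercept) 3.174, exper 1.404, score 0.251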


Interpreting the Coefficients

 In multiple regression analysis, we interpret each


regression coefficient as follows:

bi represents an estimate of the change in y


corresponding to a 1-unit increase in xi when all
other independent variables are held constant.

 b1 = 1.404

 Salary is expected to increase by $1,404 for each additional
year of experience (when the variable score on the programmer
aptitude test is held constant).

b2 = 0.251

Salary is expected to increase by $251 for each


additional point scored on the programmer aptitude
test (when the variable years of experience is held
constant).
Multiple Coefficient of Determination

 Relationship Among SST, SSR, SSE

 SST = SSR + SSE

$$\sum_i (y_i-\bar{y})^2 = \sum_i (\hat{y}_i-\bar{y})^2 + \sum_i (y_i-\hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Multiple Coefficient of Determination

 ANOVA Output

 ANOVA
              df    SS          MS          F          Significance F
 Regression    2    500.3285    250.1643    42.76013   2.32774E-07
 Residual     17     99.45697     5.85041
 Total        19    599.7855

 (SSR = 500.3285, SST = 599.7855)
Multiple Coefficient of Determination

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$$R^2 = \frac{500.3285}{599.7855} = 0.83418$$

 Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

$$R_a^2 = 1 - (1 - 0.834179)\,\frac{20-1}{20-2-1} = 0.814671$$
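
These two numbers can be reproduced directly from the ANOVA sums of squares above (a quick check, not part of the original slides):

SSR <- 500.3285; SST <- 599.7855
n <- 20; p <- 2
R2  <- SSR / SST                             # 0.83418
R2a <- 1 - (1 - R2) * (n - 1) / (n - p - 1)  # 0.814671
c(R2 = R2, R2_adj = R2a)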
Assumptions About the Error Term ℰ

The error ℰ is a random variable with mean of zero.

The variance of ℰ, denoted by σ², is the same for all
values of the independent variables.

The values of ℰ are independent.

The error ℰ is a normally distributed random variable
reflecting the deviation between the y value and the
expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.
Testing for Significance

In simple linear regression, the F and t tests provide


the same conclusion.

The F test is referred to as the test for overall


significance.

The F test is used to determine whether a significant


relationship exists between the dependent variable
and the set of all the independent variables.

In multiple regression, the F and t tests have different


purposes.
Testing for Significance: t Test

If the F test shows an overall significance, the t test is


used to determine whether each of the individual
independent variables is significant.

A separate t test is conducted for each of the


independent variables in the model.

We refer to each of these t tests as a test for individual


significance.
Testing for Significance: F Test

Hypotheses   H0: β1 = β2 = . . . = βp = 0
             Ha: one or more of the parameters is not equal to zero.

Test statistic   F = MSR/MSE

Rejection rule   Reject H0 if p-value < α or if F > Fα,

where Fα is based on an F distribution
with p d.f. in the numerator and
n - p - 1 d.f. in the denominator.
Testing for Significance: t Test

Hypotheses   H0: βi = 0
             Ha: βi ≠ 0

Test statistic   t = bi / sbi

Rejection rule   Reject H0 if p-value < α, or
if t < -tα/2 or t > tα/2, where tα/2
is based on a t distribution
with n - p - 1 degrees of freedom.
Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation


among the independent variables.

When the independent variables are highly correlated


(say, |r | > 0.7), it is not possible to determine the
separate effect of any particular independent variable
on the dependent variable.
Testing for Significance: Multicollinearity

If the estimated regression equation is to be used only


for predictive purposes, multicollinearity is usually
not a serious problem.

Every attempt should be made to avoid including


independent variables that are highly correlated.
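
One common way to screen for multicollinearity (an illustrative sketch, not prescribed by the slides) is to inspect the pairwise correlations among the regressors and, if the car package is available, the variance inflation factors; salary_df is the data frame built in the earlier sketch.

round(cor(salary_df[, c("exper", "score")]), 3)     # |r| > 0.7 would be a warning sign

# install.packages("car")                           # if the package is not installed
library(car)
vif(lm(salary ~ exper + score, data = salary_df))   # large VIFs signal multicollinearity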
Qualitative Independent Variables

In many situations we must work with qualitative


independent variables such as gender (male, female),
method of payment (cash, check, credit card), etc.

For example, x2 might represent gender where x2 = 0


indicates male and x2 = 1 indicates female.

In this case, x2 is called a dummy or indicator variable.


Qualitative Independent Variables

 Example: Programmer Salary Survey

As an extension of the problem involving the computer


programmer salary survey, suppose that management also
believes that the annual salary is related to whether the
individual has a graduate degree in computer science or
information systems.

The years of experience, the score on the programmer aptitude test,


whether the individual has a relevant graduate degree, and the
annual salary ($1000) for each of the sampled 20 programmers are
shown on the next slide.
Qualitative Independent Variables

Exper. Score Degr. Salary Exper. Score Degr. Salary


4 78 No 24.0 9 88 Yes 38.0
7 100 Yes 43.0 2 73 No 26.6
1 86 No 23.7 10 75 Yes 36.2
5 82 Yes 34.3 5 81 No 31.6
8 86 Yes 35.8 6 74 No 29.0
10 84 Yes 38.0 8 87 Yes 34.0
0 75 No 22.2 4 79 No 30.1
1 80 No 23.1 6 94 Yes 33.9
6 83 No 30.0 3 70 No 28.2
6 91 Yes 33.0 3 89 No 30.0
Estimated Regression Equation

ŷ = b0 + b1x1 + b2x2 + b3x3

where:
ŷ = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if the individual does not have a graduate degree,
     1 if the individual does have a graduate degree

x3 is a dummy variable
Qualitative Independent Variables

Regression Equation Output


              Coeffic.   Std. Err.  t Stat   P-value
 Intercept    7.94485    7.3808     1.0764   0.2977
 Experience   1.14758    0.2976     3.8561   0.0014
 Test Score   0.19694    0.0899     2.1905   0.04364
 Grad. Degr.  2.28042    1.98661    1.1479   0.26789

Note: Columns F-I are not shown.

Not significant
More Complex Qualitative Variables

If a qualitative variable has k levels, k - 1 dummy


variables are required, with each dummy variable
being coded as 0 or 1.

For example, a variable with levels A, B, and C could


be represented by x1 and x2 values of (0, 0) for A, (1, 0)
for B, and (0,1) for C.

Care must be taken in defining and interpreting the


dummy variables.
More Complex Qualitative Variables

For example, a variable indicating level of education could be


represented by x1 and x2 values as follows:

Highest
Degree x1 x2
Bachelor’s 0 0
Master’s 1 0
Ph.D. 0 1
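
As a small illustration (hypothetical data, not from the slides), R builds exactly this k − 1 dummy coding automatically when a qualitative predictor is stored as a factor:

# a hypothetical three-level education variable
degree <- factor(c("Bachelor", "Master", "PhD", "Master", "Bachelor"),
                 levels = c("Bachelor", "Master", "PhD"))
model.matrix(~ degree)    # columns degreeMaster and degreePhD are the two dummies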
Residual Analysis

For simple linear regression, the residual plot against
ŷ and the residual plot against x provide the same
information.
In multiple regression analysis it is preferable to use the
residual plot against ŷ to determine whether the model
assumptions are satisfied.
Standardized Residual Plot Against ŷ

Standardized residuals are frequently used in residual plots


for purposes of:
• Identifying outliers (typically, standardized residuals
< -2 or > +2)
• Providing insight about the assumption that the error
term e has a normal distribution
The computation of the standardized residuals in multiple
regression analysis is too complex to be done by hand
Standardized Residual Plot Against ŷ

 (Standardized residual plot: standardized residuals against predicted salary;
 one point beyond +2 is flagged as an outlier.)
Chapter 3

Diagnostics Regression
L8: Heteroscedasticity

Feng Li
feng.li@cufe.edu.cn

School of Statistics and Mathematics


Central University of Finance and Economics
What is so-called heteroscedasticity
In a linear regression model, we assume the error term has a normal
distribution with mean zero and variance σ², i.e.
Var(ui) = σ²,
which is called homoscedasticity.
But when the error term does not have constant variance, i.e.
Var(ui) = σi²,
we call it heteroscedasticity.
See the differences between the two pictures for the model
Saving = α + β·Income + ui


An OLS example

Recall the model Yi = α1 + α2 Xi + ui.

If the error term ui is homoscedastic with variance σ², we have BLUE estimators and
$$\hat{\alpha}_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad
\operatorname{Var}(\hat{\alpha}_2) = \frac{\sigma^2}{\sum x_i^2}.$$

If the error term ui is heteroscedastic with variance σi², we have
$$\hat{\alpha}_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad
\operatorname{Var}(\hat{\alpha}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}
\;\neq\; \frac{\sigma^2}{\sum x_i^2}.$$
Why? See Appendix 11A.1.

§ α̂2 is still linear and unbiased. Why?
§ But it is not “best” anymore, i.e. it no longer has the minimum variance.


Use GLS to take heteroscedasticity into account I
The OLS method treats every observation equally and does not take
heteroscedasticity into account.
The generalized least squares (GLS) method will.
§ Consider the heteroscedastic model

Yi = β1 + β2 X1i + ui, where Var(ui) = σi².

§ Transform the model by multiplying both sides by √wi, where wi = 1/σi²
(assume σi is known):
$$\frac{Y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i},$$
which can be rewritten as
$$Y_i^{*} = \beta_1 X_{0i}^{*} + \beta_2 X_{1i}^{*} + u_i^{*},$$
where ui* = ui/σi is the new error term for the new model.
§ Var(ui/σi) = 1 is now a constant. Why?
§ We call β̂1*, β̂2* the GLS estimators, with fitted values
$$\hat{Y}_i^{*} = \hat{\beta}_1^{*} X_{0i}^{*} + \hat{\beta}_2^{*} X_{1i}^{*}.$$


Use GLS to take heteroscedasticity into account II
To obtain the GLS estimators, we minimize
$$\sum (\hat{u}_i^{*})^2
 = \sum \big(Y_i^{*} - \hat{\beta}_1^{*} X_{0i}^{*} - \hat{\beta}_2^{*} X_{1i}^{*}\big)^2
 = \sum w_i \big(Y_i - \hat{\beta}_1^{*} X_{0i} - \hat{\beta}_2^{*} X_{1i}\big)^2,$$
which is done in the usual way, as in OLS.

The GLS estimator of β2* is
$$\hat{\beta}_2^{*} = \frac{\sum w_i \sum w_i X_i Y_i - \sum w_i X_i \sum w_i Y_i}
{\sum w_i \sum w_i X_i^2 - \left(\sum w_i X_i\right)^2}$$
and its variance is
$$\operatorname{Var}\big(\hat{\beta}_2^{*}\big) = \frac{\sum w_i}
{\sum w_i \sum w_i X_i^2 - \left(\sum w_i X_i\right)^2},$$
where wi = 1/σi².
When wi = w = 1/σ², the GLS estimator reduces to the OLS estimator. Verify this!
$$\hat{\beta}_1^{*} = \bar{Y}^{*} - \hat{\beta}_2^{*} \bar{X}^{*}, \qquad
\bar{Y}^{*} = \frac{\sum w_i Y_i}{\sum w_i}, \quad
\bar{X}^{*} = \frac{\sum w_i X_i}{\sum w_i}.$$


Use GLS to take heteroscedasticity into account III

In this particular setting, wi = 1/σi², and we call this weighted least squares
(WLS), which is a special case of GLS.
β̂2* is unbiased and Var(β̂2*) < Var(β̂2).
It can be shown that
$$\hat{\beta}_2^{*} = \frac{\sum w_i x_i^{*} y_i^{*}}{\sum w_i x_i^{*2}}
\qquad\text{and}\qquad
\operatorname{Var}\big(\hat{\beta}_2^{*}\big) = \frac{1}{\sum w_i x_i^{*2}},$$
where
$$x_i^{*} = X_i - \bar{X}^{*}, \qquad y_i^{*} = Y_i - \bar{Y}^{*}.$$
See Exercise 11.5.


GLS in matrix form I
Suppose the conditional mean of Y given X is a linear function of X, and the
conditional variance of the error term given X is a known matrix Ω:
$$Y = X\beta + \varepsilon, \qquad E[\varepsilon\mid X] = 0, \qquad \operatorname{Var}[\varepsilon\mid X] = \Omega.$$

The generalized least squares method estimates β by minimizing the squared
Mahalanobis length of the residual vector:
$$\hat{\beta} = \arg\min_{\beta}\, (Y - X\beta)'\,\Omega^{-1}(Y - X\beta).$$

The estimator has an explicit formula:
$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.$$

The GLS estimator is unbiased, consistent, efficient, and asymptotically normal:
$$\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N\!\big(0,\ (X'\Omega^{-1}X)^{-1}\big).$$
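
A minimal sketch of this formula in R, assuming a known diagonal Ω (simulated data; with a diagonal Ω the GLS estimate coincides with weighted least squares):

set.seed(3)
n <- 100
x <- runif(n)
X <- cbind(1, x)                          # design matrix with intercept
sig2 <- 0.5 + 2 * x^2                     # known heteroscedastic variances
y <- drop(X %*% c(1, 2)) + rnorm(n, sd = sqrt(sig2))

Omega_inv <- diag(1 / sig2)               # Omega^{-1} for a diagonal Omega
beta_gls <- solve(t(X) %*% Omega_inv %*% X, t(X) %*% Omega_inv %*% y)
beta_gls

coef(lm(y ~ x, weights = 1 / sig2))       # same estimates via WLS in lm()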


GLS in matrix form II

If the covariance of the errors Ω is unknown, one can get a consistent
estimate of Ω, say Ω̂, and proceed in two stages:
The model is first estimated by OLS or another consistent (but inefficient)
estimator, and the residuals are used to build a consistent estimator of the
errors' covariance matrix:
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y, \qquad
\hat{u}_j = (Y - X\hat{\beta}_{OLS})_j, \qquad
\hat{\Omega}_{OLS} = \operatorname{diag}(\hat{\sigma}_1^2, \hat{\sigma}_2^2, \ldots, \hat{\sigma}_n^2).$$

Using this consistent estimator of the covariance matrix of the errors, we
implement the GLS idea:
$$\hat{\beta}_{GLS} = (X'\hat{\Omega}_{OLS}^{-1}X)^{-1}X'\hat{\Omega}_{OLS}^{-1}y.$$


GLS in matrix form III

The procedure can be repeated, and this estimation of Ω̂ can be iterated to
convergence:
$$\hat{u}_{GLS} = Y - X\hat{\beta}_{GLS}, \qquad
\hat{\Omega}_{GLS} = \operatorname{diag}(\hat{\sigma}_{GLS,1}^2, \hat{\sigma}_{GLS,2}^2, \ldots, \hat{\sigma}_{GLS,n}^2), \qquad
\hat{\beta}_{GLS} = (X'\hat{\Omega}_{GLS}^{-1}X)^{-1}X'\hat{\Omega}_{GLS}^{-1}y.$$

Question: How do you calculate the degrees of freedom of the model? [Hint:
think about the hat matrix.]


WLS example – Example 11.7 I

Assume we want to run a WLS regression with the given data. What can you do?

§ Option 1: Apply the general GLS formula on p. 5 to obtain the estimators.
§ Option 2: Use OLS to regress Y/σi on 1/σi and Xi/σi without an intercept.
Can you obtain the same results?
Compare the WLS results with the OLS results.


WLS example – Example 11.7 II

How do the standard errors and t statistics change?



Consequences of using OLS under heteroscedasticity

Suppose the errors are heteroscedastic but we insist on using OLS. What will go
wrong? – Whatever conclusions we draw may be misleading.
We cannot establish confidence intervals and test hypotheses with the usual t and
F tests.
The usual tests are likely to report a larger variance than the true variance.
The OLS estimator of the variance of β̂ is a biased estimator of the true variance.
The usual estimator of σ², namely Σûi²/(n − 2), is biased.


Detecting heteroscedasticity (1)
– Plot ûi² against Ŷi

Detecting heteroscedasticity (2)
– Plot ûi² against Xi


Detecting heteroscedasticity (3)
– QQ plot
If the residuals are normally distributed, a plot of the sample quantiles of the
residuals against the theoretical quantiles of the standard normal distribution
should lie on the 45-degree line.

(Figure: Normal Q–Q plot, sample quantiles against theoretical quantiles.)


Detecting heteroscedasticity (4)
– White's general heteroscedasticity test

H0: no heteroscedasticity.
Consider Yi = β1 + β2 X2i + β3 X3i + ui (other models are treated the same way).
step 1: Run OLS to obtain the residuals ûi.
step 2: Run the following auxiliary model with the covariates, their squares, and their cross products:
$$\hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i}X_{3i} + v_i$$
and obtain R².
step 3: nR² ~ χ²(k − 1), where k is the number of unknown parameters in step 2.
step 4: If χ²obs(k − 1) > χ²crit(k − 1), reject H0.
Question: How do you carry out White's test for the model
Yi = β1 + β2 X2i + β3 X3i + β4 X4i + ui?


Example of White's test

Consider the following regression model with 41 observations,
ln Yi = β1 + β2 ln X2i + β3 ln X3i + ui,
where Y = ratio of trade taxes (import and export taxes) to total government
revenue, X2 = ratio of the sum of exports plus imports to GNP, and X3 =
GNP per capita.
To apply White's heteroscedasticity test, we first obtain the residuals
from this regression.
Then we run the following auxiliary regression:
$$\hat{u}_i^2 = -5.8 + 2.5\ln X_{2i} + 0.69\ln X_{3i} - 0.4(\ln X_{2i})^2 - 0.04(\ln X_{3i})^2 + 0.002\ln X_{2i}\ln X_{3i},$$
with R² = 0.1148.
Can you compute White's test statistic?
What is your conclusion about heteroscedasticity? (The 5% critical value of
χ² with 5 d.f. is 11.07, and the 10% critical value is 9.24.)
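
A worked answer, as a sketch: the statistic is nR² from the auxiliary regression, compared with a chi-square with k − 1 = 5 degrees of freedom, so there is no evidence of heteroscedasticity here.

n <- 41; R2_aux <- 0.1148
white_stat <- n * R2_aux      # 4.7068
white_stat
qchisq(0.95, df = 5)          # 11.07: 4.71 < 11.07, do not reject H0 at 5%
qchisq(0.90, df = 5)          # 9.24:  4.71 < 9.24, do not reject H0 at 10% either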


Detecting heteroscedasticity (5)
– Goldfeld–Quandt test
It is common to assume σi² is positively related to one of the covariates, e.g.
σi² = σ² X2i² in a model with three covariates.
The bigger Xi is, the bigger σi² is.
H0: homoscedasticity.
step 1: Sort the observations by the values of X2i.
step 2: Delete the c central observations and divide the remaining observations into
two groups.
step 3: Fit the two groups separately with OLS and obtain RSS1 (for the
small-values group) and RSS2 (for the large-values group), each with
(n − c)/2 − k degrees of freedom. Why?
step 4: Compute the ratio
$$\lambda = \frac{RSS_2 / [(n-c)/2 - k]}{RSS_1 / [(n-c)/2 - k]}
\sim F\big((n-c)/2 - k,\ (n-c)/2 - k\big).$$
Reject H0 if λ > Fcrit((n − c)/2 − k, (n − c)/2 − k).


Testing for heteroskedasticity

In R: gqtest().

Default: Assume that the data are already ordered, split sample in the
middle without omitting any central observations.

Illustration: Order the observations with respect to price per citation.


R> gqtest(jour_lm, order.by = ~ citeprice, data = journals)
Goldfeld-Quandt test

data: jour_lm
GQ = 1.7, df1 = 88, df2 = 88, p-value = 0.007
alternative hypothesis: variance increases from segment 1 to 2

Christian Kleiber, Achim Zeileis © 2008–2017Applied Econometrics with R – 4 – Diagnostics and Alternative Methods of Regression – 32 / 86
Detecting heteroscedasticity (6)
– Breusch–Pagan–Godfrey test

Consider the model Yi = β1 + β2 X2i + ... + βk Xki + ui.

Assume that σi² = α1 + α2 Z2i + ... + αm Zmi, where the Zi are known variables,
which can be the Xi themselves.
If there is no heteroscedasticity, then α2 = ... = αm = 0 and σi² = α1.
step 1: Obtain û1, ..., ûn from the model.
step 2: Obtain σ̃² = Σûi²/n.
step 3: Construct the variable pi = ûi²/σ̃².
step 4: Regress pi = α1 + α2 Z2i + ... + αm Zmi + vi.
step 5: Obtain
$$\frac{ESS}{2} \sim \chi^2(m-1),$$
where ESS is the explained sum of squares of the regression in step 4.
There is evidence of heteroscedasticity when ESS/2 > χ²crit(m − 1).


Testing for heteroskedasticity

White test uses original regressors as well as their squares and


interactions in auxiliary regression.

Can use bptest():


R> bptest(jour_lm, ~ log(citeprice) + I(log(citeprice)^2),
+ data = journals)
studentized Breusch-Pagan test

data: jour_lm
BP = 11, df = 2, p-value = 0.004

Christian Kleiber, Achim Zeileis © 2008–2017Applied Econometrics with R – 4 – Diagnostics and Alternative Methods of Regression – 30 / 86
Detecting heteroscedasticity (7)
– Spearman's rank correlation test
1 The null hypothesis: no heteroscedasticity (zero rank correlation).
2 Obtain the residuals ûi from the regression.
3 Rank |ûi| and Xi (or Yi).
4 Compute Spearman's rank correlation coefficient
$$r_s = 1 - 6\,\frac{\sum d_i^2}{n(n^2-1)},$$
where the di are the differences between the ranks of |ûi| and Xi and n is the
number of individuals.
5 The significance of the sample rs can be tested with the t statistic
$$t_{obs} = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}}.$$
6 Decision rule: if tobs > tcritical, reject H0, i.e. there is evidence of
heteroscedasticity; otherwise there is not. With multiple regressors, repeat
this procedure for each regressor.
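
An illustrative sketch in R (simulated data, not from the slides): the built-in cor.test() with method = "spearman" carries out the rank-correlation test between the absolute residuals and the regressor.

set.seed(4)
x <- runif(60, 1, 10)
y <- 1 + 2 * x + rnorm(60, sd = x)              # error sd grows with x
u_hat <- resid(lm(y ~ x))
cor.test(abs(u_hat), x, method = "spearman")    # small p-value suggests heteroscedasticity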
How to obtain estimators
– for Yi = β1 + β2 Xi + ui when E(ui) = 0 and Var(ui) = σi²
When σi is known: use the WLS method to obtain BLUE estimators (pp. 4–5).
When σi is not known:
§ If Var(ui) = σ² Xi², do OLS on the transformed model
$$\frac{Y_i}{X_i} = \beta_1\frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i},$$
since Var(ui/Xi) = σ². Why?
§ If Var(ui) = σ² Xi (Xi > 0), do OLS on the transformed model
$$\frac{Y_i}{\sqrt{X_i}} = \beta_1\frac{1}{\sqrt{X_i}} + \beta_2\sqrt{X_i} + \frac{u_i}{\sqrt{X_i}},$$
since Var(ui/√Xi) = σ². Why?
§ If Var(ui) = σ² [E(Yi)]², do OLS on the transformed model
$$\frac{Y_i}{\hat{Y}_i} = \beta_1\frac{1}{\hat{Y}_i} + \beta_2\frac{X_i}{\hat{Y}_i} + \frac{u_i}{\hat{Y}_i},$$
since Var(ui/Ŷi) ≈ Var(ui)/[E(Yi)]² = σ².
§ Running OLS on log-transformed data, ln Yi = β1 + β2 ln Xi + vi, can also reduce
heteroscedasticity.
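
For the first case above, Var(ui) = σ²Xi², a small sketch (simulated data): OLS on the transformed model and WLS with weights 1/Xi² give the same estimates.

set.seed(5)
x <- runif(80, 1, 10)
y <- 1 + 2 * x + rnorm(80, sd = 0.5 * x)   # error sd proportional to x

# OLS on the transformed model Y/X = beta1 * (1/X) + beta2 + u/X
fit_transformed <- lm(I(y / x) ~ I(1 / x))
coef(fit_transformed)    # (Intercept) estimates beta2, the 1/x coefficient estimates beta1

# equivalent weighted least squares on the original model
fit_wls <- lm(y ~ x, weights = 1 / x^2)
coef(fit_wls)            # (Intercept) estimates beta1, the x coefficient estimates beta2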
L9: Autocorrelation

Feng Li
feng.li@cufe.edu.cn

School of Statistics and Mathematics


Central University of Finance and Economics
Introduction
In the classical regression model we assume cov(ui, uj | xi, xj) = E(ui uj) = 0 for i ≠ j.
What if we break this assumption? Look at these patterns.


Positive and negative autocorrelation



What happens to OLS/GLS when autocorrelation exists?

Consider a simple model Yt = β1 + β2 Xt + ut, where the error term is
autocorrelated as ut = ρ u_{t−1} + ǫt.
The OLS estimator does not change (it is still linear and unbiased),
but it is no longer efficient, since
$$\operatorname{Var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_t^2}
\left[1 + 2\rho\frac{\sum x_t x_{t-1}}{\sum x_t^2}
 + 2\rho^2\frac{\sum x_t x_{t-2}}{\sum x_t^2} + \ldots\right]
 > \frac{\sigma^2}{\sum x_t^2},$$
where σ²/Σxt² is the variance of β̂2 when no autocorrelation is present.

The GLS estimator under autocorrelation is BLUE.


Consequences of using OLS under autocorrelation

The confidence intervals are likely wider than those from GLS.
The usual t and F tests are not valid.
The residual variance σ̂² is likely to underestimate the true σ².
R² is likely to be overestimated.


Detection of autocorrelation
– The runs test (a nonparametric test)
step 1: Plot the residuals over time and record the sign of each residual.
 (Figure: residuals plotted against time.)
step 2: Count the runs, e.g. (− −)(+ + +)(−)(+)(− − −).
step 3: N1 = number of "+" residuals = 4,
N2 = number of "−" residuals = 6, N = N1 + N2 = 10,
R = number of runs = 5.
step 4: Under the null of randomness, the number of runs is approximately normally distributed:
$$R \sim N\!\left(\frac{2N_1N_2}{N} + 1,\;
\frac{2N_1N_2(2N_1N_2 - N)}{N^2(N-1)}\right).$$
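
A sketch of the normal-approximation runs test computed by hand for the residual signs above (the sign sequence is the one shown on the slide):

signs <- c(-1, -1, 1, 1, 1, -1, 1, -1, -1, -1)    # (- -)(+ + +)(-)(+)(- - -)
N1 <- sum(signs > 0); N2 <- sum(signs < 0); N <- N1 + N2
R  <- 1 + sum(diff(signs) != 0)                   # number of runs = 5
mu_R  <- 2 * N1 * N2 / N + 1
var_R <- 2 * N1 * N2 * (2 * N1 * N2 - N) / (N^2 * (N - 1))
z <- (R - mu_R) / sqrt(var_R)
c(runs = R, z = z, p_value = 2 * pnorm(-abs(z)))  # |z| small: no evidence against randomness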
Detection of autocorrelation
– The runs test: example

Carry out the runs test with the residuals given in the previous slides.
What critical value are you looking for?
Why is this test called a nonparametric test?


Detection of autocorrelation
– Durbin–Watson d test (1)

The Durbin–Watson d statistic:
$$d = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2}.$$

Assumptions underlying the d statistic:

§ The regression model includes an intercept. If your model does not have an
intercept, rerun it with an intercept to obtain the ûi.
§ It can only detect a first-order autoregressive scheme.
§ The error terms are assumed normally distributed.
§ The explanatory variables do not contain lagged values of the dependent variable
(we will talk about this more in the time series part).


Detection of autocorrelation
– Durbin–Watson d test (2)


Detection of autocorrelation
– Durbin–Watson d test (3)

Approximation of the d statistic:
$$d \approx 2(1 - \hat{\rho}), \qquad
\hat{\rho} = \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2},$$
the sample first-order autocorrelation coefficient of ût, i.e. the slope of ût on û_{t−1}.
When ρ → 1: positive autocorrelation;
when ρ → −1: negative autocorrelation;
when ρ → 0: no autocorrelation.


Detection of autocorrelation
– Durbin–Watson d test: example

Given a sample of 100 observations and 5 explanatory variables, what can
you say about autocorrelation if d = 1.2?
Can you handle this without looking at the table?


Testing for autocorrelation

In R: dwtest() implements an exact procedure for computing the


p value (for Gaussian data) and also provides a normal approximation
for sufficiently large samples (both depending on the regressor matrix
X ).

R> dwtest(consump1)
Durbin-Watson test

data: consump1
DW = 0.087, p-value <2e-16
alternative hypothesis: true autocorrelation is greater than 0

Interpretation: Highly significant positive autocorrelation, which


confirms the results from Chapter 3.

Christian Kleiber, Achim Zeileis © 2008–2017Applied Econometrics with R – 4 – Diagnostics and Alternative Methods of Regression – 41 / 86
The comparison between the Durbin–Watson test and the runs test

The runs test does not require any probability distribution for the error term.
Warning: the d test is not valid if ui is not iid.
When n is large,
$$\sqrt{n}\,(1 - d/2) \sim N(0, 1)\ \text{approximately},$$
so we can use the normal approximation when n is large, regardless of iid.
The Durbin–Watson statistic requires the covariates to be non-stochastic, which is
difficult to meet in econometrics.
In this case, try the test on the next slides.


Detection of autocorrelation
– The Breusch–Godfrey test (Lagrange multiplier test)

Consider the model Yt = β1 + β2 Xt + ut,
and assume ut = ρ1 u_{t−1} + ρ2 u_{t−2} + ... + ρp u_{t−p} + ǫt.
H0: ρ1 = ρ2 = ... = ρp = 0, i.e. no autocorrelation.
Run the auxiliary model
$$\hat{u}_t = \alpha_1 + \alpha_2 X_t + \rho_1\hat{u}_{t-1} + \rho_2\hat{u}_{t-2} + \ldots + \rho_p\hat{u}_{t-p} + \epsilon_t$$
and obtain R².
When n is large,
$$(n - p)\,R^2 \sim \chi^2(p).$$
Reject H0 if χ²obs(p) > χ²crit(p).
Question: Have you seen another test that is constructed in a similar way to this one?


Testing for autocorrelation

In R: bgtest() implements both versions.

Default: Use order p = 1.

R> bgtest(consump1)
Breusch-Godfrey test for serial correlation of order up
to 1

data: consump1
LM test = 190, df = 1, p-value <2e-16

Christian Kleiber, Achim Zeileis © 2008–2017Applied Econometrics with R – 4 – Diagnostics and Alternative Methods of Regression – 45 / 86
Model misspecification and pure autocorrelation

Some variables that were supposed to be in the model may have been left out.
This is the omitted-variable case, which is a type of model specification
bias (we will talk more about this in the next chapter).
It may also show up as patterns in the residual plot.
Try to find out whether it is pure autocorrelation or misspecification.
See the example on p. 441.


Use GLS to correct pure autocorrelation

Assume you have the model Yt = β1 + β2 Xt + ut and
there is pure autocorrelation ut = ρ u_{t−1} + ǫt.
When ρ is known:
§ From the model we also have Y_{t−1} = β1 + β2 X_{t−1} + u_{t−1}.
§ Then Yt − ρY_{t−1} = β1(1 − ρ) + β2(Xt − ρX_{t−1}) + (ut − ρu_{t−1}). Why?
§ The above model can be written as Yt* = β1* + β2* Xt* + ǫt, which removes the
autocorrelation. Why? (See the sketch below.)
When ρ is not known:
§ One may run the model Yt − Y_{t−1} = β2(Xt − X_{t−1}) + (ut − u_{t−1}).
§ We will talk more about this in the second part of this course:
time series (autoregressive models).
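
A sketch of the quasi-differencing transformation with ρ treated as known (simulated AR(1) errors; this is the idea behind Prais–Winsten / Cochrane–Orcutt estimation):

set.seed(6)
n <- 200; rho <- 0.7
u <- as.numeric(arima.sim(list(ar = rho), n = n))   # AR(1) errors
x <- runif(n)
y <- 1 + 2 * x + u

y_star <- y[-1] - rho * y[-n]     # Y_t - rho * Y_{t-1}
x_star <- x[-1] - rho * x[-n]     # X_t - rho * X_{t-1}
fit <- lm(y_star ~ x_star)
coef(fit)    # intercept estimates beta1 * (1 - rho) = 0.3; slope estimates beta2 = 2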


Lecture 6
Regression Diagnostics

STAT 512
Spring 2011

Background Reading
KNNL: 3.1-3.6

6-1
What Do We Need To Check?
 Main Assumptions: errors are independent,
normal random variables with common
variance σ²

 Does the assumption of linearity make


sense?

 Were any important predictors excluded


from the model?

6-3
What Do We Need To Check?
 Are there “outlying” values for the predictor
variables (X) that could unduly influence
the regression model?

 Are there outliers? (Generally the term


outlier refers to a response that is vastly
different from other responses (Y) – see
KNNL pg 108)

 How to Get Started? - Look at the Data!


6-4
Diagnostics for Predictors (X)
 We do not make any specific assumptions
about X. However, understanding what is
going on with X is necessary to
interpreting what is going on with the
response (Y).
 So, we can look at some basic summaries of
the X variables to get oriented.
 However, we are not checking our
assumptions at this point.

6-5
Diagnostics for Predictors (X)
 Dot plots, Stem-and-leaf plots, Box plots,
and Histograms can be useful in
identifying potential outlying observations
in X. Note that just because it is an
outlying observation does not mean it will
create a problem in the analysis. However
it is a data point that will probably have
higher influence over the regression
estimates.
 Sequence plots can be useful for identifying
potential problems with independence.
6-6
Reminder – Scatterplot

6-8
UNIVARIATE Procedure (2)
Stem Leaf # Boxplot
78 00000 5 |
76 0000 4 |
74 0 1 |
72 000 3 |
70 000 3 +-----+
68 000 3 | |
66 0 1 | |
64 0000 4 | |
62 000 3 | |
60 0000 4 *--+--*
58 000 3 | |
56 0000 4 | |
54 000 3 | |
52 000 3 | |
50 0 1 | |
48 00 2 +-----+
46 0000 4 |
44 00 2 |
42 0000 4 |
40 000
----+----+----+----+

6-10
UNIVARIATE Procedure (3)

6-11
Diagnostics for Residuals (1)
 Basic Distributional Assumptions on Errors

 Model: Yi = β0 + β1Xi + εi
o where εi ~ iid N(0, σ²) (i.e., the εi are
independent, normal, and have constant
variance).
 The ei (residuals) should be similar to the εi

 How do we check this? Plot the Residuals!


6-14
Diagnostics for Residuals (2)
 Basic Questions addressed by diagnostics
for residuals
o Is the relationship linear?
o Does the variance depend on X?
o Are the errors normal?
o Are the errors independent?
o Are there outliers?
o Are any important predictors omitted?

6-15
! " # $ %
! &
' ( # $ %
( &
' ( ) # $ %
* + ( &
, - ,) # $ %
&
. $' / . 0 1 $
- -
2 / + -

* % -- #0 3 $ -- % / )
$ 0 3 4!,(4&
Checking Linearity

 Plot Y vs. X (scatterplot)


 Plot e vs X (or Yˆ ) - residual plot
 Generally can see from a scatter plot when a
relationship is nonlinear
 Patterns in residual plots can emphasize
deviations from linear pattern

6-16
Checking Constant Variance
 Plot e vs X (or Yˆ ) - residual plot
 Patterns suggest issues!
 Megaphone shape indicates
increasing/decreasing variance with X
 Other shapes can indicate non-linearity
 Outliers show up in obvious way

6-17
Checking for Normality
 Plot residuals in a Normal Probability Plot
o Compare residuals to their expected value
under normality (normal quantiles)
o Should be linear IF normal
 Plot residuals in a Histogram
 PROC UNIVARIATE is used for both of
these
 Book shows method to do this by hand –
you do not need to worry about having to
do that.
6-23
Normality Plot
 Outliers show up in a quite obvious way.
 Non-normal distributions can look very
wacky.
 Symmetric / Heavy tailed distributions show
an “S” shape.
 Skewed distributions show exponential
looking curves (see figure 3.9)

6-27
(Figure: plot of the residuals against the normal quantiles.)
Checking Independence

 Sequence Plot: Residuals against time/order

 Patterns suggest non-independence

 See figure 3.8 in KNNL.

6-31
Additional Predictors

 Plot residuals against other potential


predictors (not predictors from the model)

 Patterns indicate an important predictor that


maybe should be in the model.

 Example: Suppose we use a muscle mass


dataset that includes both men and women.

6-32
6-33
Residuals vs Age
 Plot looks great, right?
 But what happens if we separate male and
female?

PROC GPLOT data=diag;


plot resid*age=gender /overlay;
RUN;

6-34
6-35
Summary of Diagnostic Plots
 You will have noticed that the same plots are
used for checking more than one assumption.
These are your basic tools.
o Plot Y vs. X (check for linearity, outliers)
o Plot Residuals vs. X (check for constant
variance, outliers, linearity)
o Normal Probability Plot and/or
Histogram of residuals (normality, outliers)
 If it makes sense, consider also doing a
sequence plot of the residuals (independence)

6-37
Plots vs. Significance Tests
 If you are uncertain what to conclude after
examining the plots, you may additionally wish
to perform hypothesis tests for model
assumptions (normality, homogeneity of
variance, independence).
 These tests are not a replacement for the plots,
but rather a supplement to them.
 Note of caution: Plots are more likely to
suggest a remedy and significance test results
are very dependent on sample size.

6-38
Significance Tests for
Model Assumptions

 Constancy of Variance:
o Brown-Forsythe (modified Levene)
o Breusch-Pagan
 Normality
o Kolmogorov-Smirnov, etc.
 Independence of Errors:
o Durbin-Watson Test
6-39
Tests for Normality
PROC UNIVARIATE data=diag normal;
var resid;
run;

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.979585 Pr < W 0.4112


Kolmogorov-Smirnov D 0.079433 Pr > D >0.1500
Cramer-von Mises W-Sq 0.057805 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.383556 Pr > A-Sq >0.2500

 Small p-values indicate non-normality

6-40