Multiple Regression

The document outlines the process and methodology of multivariate regression analysis, focusing on the effect of independent variables on a dependent variable. It details the steps involved in regression analysis, including determining the aim, estimating coefficients, specification testing, and checking assumptions. Additionally, it provides an example using data from Massachusetts schools to illustrate the application of regression analysis in measuring educational outcomes.

Uploaded by

abdou4334

• Faculty of Economics

• Institute of Economic Theory and Methodology

Multivariate regression analysis

Quantitative Statistical Methods


Levente Lengyel

Aim of regression analysis


• Measure the effect of the independent variable(s) (X) on the dependent variable (Y)
– How will the demand for tobacco change if the tax increases?
– How will a state's tax revenue change if the gasoline price decreases?
– What is the effect on students' grades if the class size increases?

Steps of regression analysis


1. Determine the aim of the analysis
2. Estimate the coefficients
3. Specification testing
4. Check assumptions
5. Validation
6. Draw conclusions

Data
Observations: Schools in Massachusetts (US)
Number of observations: 220
Source: Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers
Variables:
• District: Name of district (coded)
• Municipality: name
• Spending per pupil, regular: thousand $
• Spending per pupil, special needs: thousand $
• Spending per pupil, bilingual: thousand $
• Spending per pupil, occupational: thousand $
• Spending per pupil, total: thousand $
• Students per computer
• Share of special education students (%)
• Share of receiving lunch subsidy (%)
• Students per teacher
• Average district per capita income: thousand $
• 4th grade score (math + English + science)
• 8th grade score (math + English + science)
• Average teacher salary: thousand $
• Share of English learners (%)

SPSS
Exploratory data analysis
Getting to know the variables

Multivariate regression
Population equation:
𝒀 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝑿𝟏 + 𝜷𝟐 ∗ 𝑿𝟐 + ⋯ + 𝜷𝒑 ∗ 𝑿𝒑 + 𝜺
Where,
$\beta_0$: constant (intercept). If $X_1 = X_2 = \dots = X_p = 0$, then $Y = \beta_0 + \varepsilon$.
$\beta_j$ ($j = 1, \dots, p$): slope coefficient of $X_j$.
It shows the effect on $Y$ (the expected change in $Y$) of a one-unit change in $X_j$, if the other independent variables (X) remain constant (ceteris paribus).
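The population equation can be estimated by ordinary least squares. The following is a minimal sketch in plain Python, not the SPSS procedure used in these slides: the helper names `solve_linear_system` and `ols` and the small data set are hypothetical and serve only to show how the coefficients are obtained from the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs stay unchanged.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Return [b0, b1, ..., bp] solving the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 10 + 2*X1 - 3*X2 (no error term),
# so the estimates should recover these three numbers.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [10 + 2 * x1 - 3 * x2 for x1, x2 in rows]
b0, b1, b2 = ols(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Here `b1` is the expected change in Y for a one-unit change in X1 with X2 held constant, which is the ceteris paribus reading described above.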

1. Aim of analysis
• Measure the effect of variables on 8th grade score (Y = totsc8):
– Students per teacher (𝑋1 = tchratio)
– Spending per pupil, regular (𝑋2 = regday)
– Spending per pupil, total (𝑋3 = 𝑡𝑜𝑡𝑑𝑎𝑦)
– Average teacher salary (𝑋4 = 𝑎𝑣𝑔𝑠𝑎𝑙𝑎𝑟𝑦)
– Share of special education students (%) (𝑋5 = 𝑠𝑝𝑒𝑐𝑒𝑑)
– Average district per capita income (𝑋6 = percap)
• Defining the specification:
Population equation (2):

𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒕𝒐𝒕𝒅𝒂𝒚 + 𝜷𝟒 ∗ 𝒂𝒗𝒈𝒔𝒂𝒍𝒂𝒓𝒚 + 𝜷𝟓 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟔 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺



2. Estimation of coefficients
• To be able to validate the model, split the full sample into two parts before the estimation:
– 70%: training part, used for estimating the coefficients
– 30%: test part, used for validating the results
• The split is already prepared in this dataset: variable partition

Transform/Compute Variable
RV.BINOM(1,0.7) -> the new variable takes the value 1 with 70% probability

Value labels:
1 – Training (≈70%)
0 – Test (≈30%)
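Outside SPSS, the same kind of random partition can be sketched as follows. This is an illustration only: the function name `make_partition` and the fixed seed are assumptions, and the resulting split will not reproduce the `partition` variable stored in the dataset.

```python
import random


def make_partition(n_cases, train_share=0.7, seed=42):
    """Draw 1 (training) with probability train_share, else 0 (test)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [1 if rng.random() < train_share else 0 for _ in range(n_cases)]


partition = make_partition(220)  # 220 observations, as in the data description
n_train = sum(partition)
n_test = len(partition) - n_train
print(n_train, n_test)
```

Because every case is drawn independently, the shares are only approximately 70% and 30%, which is why the value labels use the ≈ sign.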

Analyze/Regression/Linear

3. Specification testing
R: There is a strong relationship between the dependent variable (Y) and the independent variables (X).

Adj. R2: The independent variables (X) explain 69.2% of the variance of the dependent variable (Y).

Standard Error of the Estimate: The estimated values differ from the measured values by 11.77 points on average.
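The two indicators can be recomputed by hand. The sketch below uses the usual textbook formulas; R = 0.842 and n = 113 are taken from column (2) of the summary table in the Conclusion, p = 6 is the number of independent variables in equation (2), and the error sum of squares `sse` is a hypothetical value chosen only to illustrate the formula.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def standard_error_of_estimate(sse, n, p):
    """Square root of the residual variance SSE / (n - p - 1)."""
    return math.sqrt(sse / (n - p - 1))


r, n, p = 0.842, 113, 6
adj = adjusted_r2(r ** 2, n, p)
print(round(adj, 3))  # close to the 69.2% reported above

sse = 14684.5  # hypothetical error sum of squares, not taken from the output
see = standard_error_of_estimate(sse, n, p)
print(round(see, 2))
```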

SSR: Sum of Squares of Regression

SSE: Sum of Squares of Errors

SST: Sum of Squares Total

$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
$H_1: \exists\, j: \beta_j \neq 0$

Sig < 5%, thus we can reject H0, which means that there is at least one $\beta$ that is not equal to 0. Therefore there is a relationship between the dependent variable and the independent variables.
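The F statistic behind this test follows directly from the sums of squares. A minimal sketch, assuming the standard formula with p independent variables and n observations; the values R = 0.842, n = 113 and p = 6 again come from column (2) of the summary table, and the total sum of squares of 100 is an arbitrary illustrative scale.

```python
def f_statistic(ssr, sse, n, p):
    """F = (SSR / p) / (SSE / (n - p - 1))."""
    return (ssr / p) / (sse / (n - p - 1))


def f_from_r2(r2, n, p):
    """Same statistic expressed with R2 = SSR / SST."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


n, p = 113, 6
r2 = 0.842 ** 2
f_value = f_from_r2(r2, n, p)
print(round(f_value, 2))  # close to the F reported for model (2)

# The two forms agree for any total sum of squares, here SST = 100.
sst = 100.0
ssr = r2 * sst
sse = sst - ssr
f_check = f_statistic(ssr, sse, n, p)
```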

Estimated values of coefficients

The effect of Students per teacher on Y decreased, and so did the standard error of the coefficient!

$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$
If Sig > 5%, we keep H0, which means the variable has no significant effect on Y. We should remove such variables from the model, but one by one!

Remove from the model: Average teacher salary (avgsalary)
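The decision for a single coefficient can be sketched with its t statistic. The example assumes that the values in parentheses in the summary table of the Conclusion are standard errors, and it replaces the exact Sig value from SPSS with an assumed two-sided 5% critical value of about 1.98 (appropriate for roughly 100 degrees of freedom).

```python
CRITICAL_T = 1.98  # assumed 5% two-sided critical value, about 100 df


def t_statistic(coef, std_error):
    """t = estimated coefficient / its standard error."""
    return coef / std_error


# Coefficient and standard error pairs read from column (2) of the table.
model_2 = {"tchratio": (-2.340, 0.626), "avgsalary": (0.132, 0.432)}

significant = {}
for name, (coef, std_error) in model_2.items():
    t = t_statistic(coef, std_error)
    significant[name] = abs(t) > CRITICAL_T
    print(name, round(t, 3), "keep" if significant[name] else "candidate for removal")
```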

Population equation (2):


𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒕𝒐𝒕𝒅𝒂𝒚 + 𝜷𝟒 ∗ 𝒂𝒗𝒈𝒔𝒂𝒍𝒂𝒓𝒚 + 𝜷𝟓 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟔 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺

New population equation (3):


𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒕𝒐𝒕𝒅𝒂𝒚 + 𝜷𝟒 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟓 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺

Adj. R2: Despite removing the variable, the explanatory power of the model did not decrease significantly.

Check the significance levels again!

Sig > 5%, therefore remove from the model: Spending per pupil, total (totday)
Population equation (3):
𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒕𝒐𝒕𝒅𝒂𝒚 + 𝜷𝟒 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟓 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺

New population equation (4):


𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟒 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺

Adj. R2: Despite removing the variable, the explanatory power of the model increased.

Every significance level is under 5%!

This iterative process can also be carried out with the Backward, Stepwise, or Forward method.

Population equation (4): 𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟒 ∗ 𝒑𝒆𝒓𝒄𝒂𝒑 + 𝜺



4. Check assumptions
• Assumptions of the error term
• Assumptions of the independent variables

Assumptions of Standard Multiple Linear Regression
• Assumptions of the error term:
– $E(\varepsilon \mid X = x) = 0$
The expected value of the error term is 0.
– $Var(\varepsilon) = \sigma^2$
Constant variance of the error term (homoskedasticity).
– $(X_i, Y_i),\ i = 1, \dots, n$, are i.i.d.
The error term is uncorrelated across observations (no autocorrelation).
– $\varepsilon \sim N(0, \sigma^2)$
The error term follows a normal distribution.

Assumptions of Standard Multiple Linear Regression
• Assumptions of the independent variables (X):
– Linear independence (no multicollinearity)
– Fixed values, which do not change from sample to sample
– There is no measurement (scale) error
– The independent variables are uncorrelated with the error term

Setups for testing assumptions
Analyze/Regression/Linear
• Checking multicollinearity
• Testing autocorrelation
• Inspection of homoskedasticity
• Checking the normal distribution


$E(\varepsilon \mid X = x) = 0$
• The conditional distribution of the error term given X has a mean of zero.
• In most cases it can be justified by logical reasoning.
• If we estimate the coefficients with OLS (and the model contains a constant), the average residual will be 0.
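The last point can be checked numerically. Below is a minimal sketch with one independent variable and hypothetical data (the numbers are invented and are not taken from the Massachusetts dataset); `fit_simple_ols` uses the closed-form OLS solution with a constant.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical students-per-teacher ratios and test scores.
x = [14.0, 16.5, 17.0, 18.2, 19.5, 21.0, 22.3]
y = [712.0, 705.0, 698.0, 701.0, 690.0, 684.0, 679.0]

b0, b1 = fit_simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)
print(abs(mean_residual) < 1e-9)
```

The mean is zero only up to floating-point rounding, and only because the model includes the constant b0.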


$Var(\varepsilon) = \sigma^2$

• The variance of the error term is constant; this is called homoskedasticity.
• Checking:
– Plots: scatter plot of the residuals $e_i = Y_i - \hat{Y}_i$ against $\hat{Y}_i$ or $X_{ji}$
– Statistical tests:
• Goldfeld-Quandt test
• White test
• Dealing with heteroskedasticity:
– Rebuild the model
– Add new variables to the model
– Use the logarithm of variables
– Calculate White’s standard errors (not available in SPSS)

Dealing with heteroskedasticity

• Take the logarithm of the $Y_i$ variable (totsc8)
– Log-linear model -> the interpretation of the coefficients changes
Exact: a one-unit change in $X_1$ is associated with a $[(e^{b_1} - 1) \cdot 100]\,\%$ change in $Y$ on average, if the other independent variables remain constant.
Approximation: a one-unit change in $X_1$ is associated with a $(b_1 \cdot 100)\,\%$ change in $Y$ on average, if the other independent variables remain constant.

• Use the logarithm of the $X_i$ variable
– Linear-log model -> the interpretation of the coefficients changes
Exact: a 1% change in $X_1$ is associated with a $b_1 \cdot [\ln(x + \Delta x) - \ln(x)]$ unit change in $Y$ on average, if the other independent variables remain constant.
Approximation: a 1% change in $X_1$ is associated with a $\dfrac{b_1}{100}$ unit change in $Y$ on average, if the other independent variables remain constant.
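The gap between the exact and the approximate interpretation can be computed directly. In this sketch the function names are illustrative; the log-linear coefficient 0.05 is a hypothetical value, while 59.55 is the coefficient of ln(percap) reported for model (7) in the Conclusion.

```python
import math


def log_linear_effect_pct(b1, exact=True):
    """Percent change in Y for a one-unit change in X1 when ln(Y) is modelled."""
    return (math.exp(b1) - 1) * 100 if exact else b1 * 100


def linear_log_effect(b1, pct_change=1.0, exact=True):
    """Unit change in Y when X1 grows by pct_change percent and ln(X1) is modelled."""
    if exact:
        # b1 * [ln(x + dx) - ln(x)] with dx = x * pct_change / 100
        return b1 * math.log(1 + pct_change / 100)
    return b1 * pct_change / 100


print(round(log_linear_effect_pct(0.05), 3), log_linear_effect_pct(0.05, exact=False))
print(round(linear_log_effect(59.55), 4), round(linear_log_effect(59.55, exact=False), 4))
```

For small coefficients and small percentage changes the two versions are close, which is why the slides use the simpler approximation.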

Transform/Compute Variable…
Variable label: Log of Average district per capita income

Analyze/Regression/Linear

New population equation (5):
𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟒 ∗ 𝐥𝐧(𝒑𝒆𝒓𝒄𝒂𝒑) + 𝜺


$(X_i, Y_i),\ i = 1, \dots, n$, are i.i.d.
• The error term is uncorrelated across observations.
• If the observations are independent and identically distributed, which is typically the case for cross-sectional data, then this assumption is automatically met.
• With time series data, autocorrelation can cause bigger problems.
• Test:
– Plots
• We plot the residuals against time or the order of observations on a scatter plot.
– Durbin-Watson test

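• This dataset contains cross-sectional data, so this assumption is automatically met.
• If not, use the Durbin-Watson test:

$H_0: \rho = 0$ (no autocorrelation)
$H_1: \rho \neq 0$ (first-order autocorrelation)

Decision ranges of the Durbin-Watson statistic $d$ (from 0 to 4), with $d_l = 1.461$ and $d_u = 1.625$:
– $0 \le d < d_l$ (0 to 1.461): positive autocorrelation
– $d_l \le d < d_u$ (1.461 to 1.625): no decision
– $d_u \le d \le 4 - d_u$ (1.625 to 2.375): $H_0$ is kept
– $4 - d_u < d \le 4 - d_l$ (2.375 to 2.539): no decision
– $4 - d_l < d \le 4$ (2.539 to 4): negative autocorrelation
The value marked on the axis, 1.756, falls in the range where $H_0$ is kept.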
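The Durbin-Watson statistic and its decision rule can be sketched in a few lines. The residual series below is hypothetical; the bounds dl = 1.461 and du = 1.625 are the ones shown on the slide, and the wording of the returned labels is an assumption of this sketch.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, dl, du):
    """Map the statistic d (between 0 and 4) to the five decision ranges."""
    if d < dl:
        return "positive autocorrelation"
    if d < du:
        return "no decision"
    if d <= 4 - du:
        return "keep H0"
    if d <= 4 - dl:
        return "no decision"
    return "negative autocorrelation"


# Hypothetical residuals that alternate in sign, i.e. negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0]
d = durbin_watson(alternating)
print(d, dw_decision(d, dl=1.461, du=1.625))
print(dw_decision(1.756, dl=1.461, du=1.625))
```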


$\varepsilon \sim N(0, \sigma^2)$
• The error term follows a normal distribution.
• Check:
– Graphs:
• Histogram
• P-P Plot
– Indicators:
• Skewness
• Kurtosis
– Significance test:
• Kolmogorov-Smirnov test
• Shapiro-Wilk test
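The two indicators can be computed from the residuals with simple moment formulas. This is a sketch using the population (moment-based) definitions; SPSS reports sample-adjusted versions, so its values will differ slightly. The residual values are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2^2 - 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical, perfectly symmetric residuals.
skew, kurt = skewness_and_excess_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, round(kurt, 2))
```

For a normal distribution both indicators are 0, so values far from 0 point to a violation of the assumption.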

Visual check

Significance test
Analyze/Regression/Linear

Analyze/Descriptive Statistics/Explore…

$H_0$: the variable is normally distributed
$H_1$: the variable is not normally distributed

Sig < 5%, we reject $H_0$!

This may be because there are outliers in the dataset.

Detecting outliers
• Graphs
• Mahalanobis distance
– A measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables.
• Checking Cook’s distance and leverage value
– Cook’s distance: A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. A large Cook's D indicates that excluding a case from computation of the regression statistics changes the coefficients substantially. It lets us detect the cases that strongly influence the estimated coefficients.
– Leverage values: Measure the influence of a point on the fit of the regression. 0 means no influence on the fit.

Save distances
Analyze/Regression/Linear

Analysing Mahalanobis distance

Transform/Compute Variable…
Prob_MD = 1-CDF.CHISQ(MAH_1,4)
4: number of independent variables (X)

If Prob_MD < 0.001, the case is an outlier!
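The expression 1 - CDF.CHISQ(MAH_1, 4) is the upper-tail probability of a chi-square distribution with 4 degrees of freedom. For an even number of degrees of freedom it has a closed form, which the sketch below uses; the function name `chi2_sf_even_df` and the two example distances are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2 > x) for an even number of degrees of freedom.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical Mahalanobis distances for two cases, 4 independent variables.
distances = {"A": 3.2, "B": 25.0}
prob_md = {name: chi2_sf_even_df(d, 4) for name, d in distances.items()}
outliers = [name for name, p in prob_md.items() if p < 0.001]
print(outliers)
```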

• Sort cases ascending by Prob_MD

Outliers: BOSTON, CAMBRIDGE, NAHANT

Checking Cook’s distance and leverage value


Graphs/Legacy Dialogs/Scatter/Dot
Simple Scatter

Callouts on the scatter plot of Cook's distance and leverage value:
• It adds a lot of variability to the regression estimates, but it likely did not affect the slope of the regression equation.
• It gives a high degree of variability to the estimate and greatly influences the fit of the model.
• It significantly influences the fit of the model, but the Cook's distance is small.
• It unduly influences the model.

Filtering out outliers

Data/Select Cases…
• If condition is satisfied: COO_1 < 0.075 & LEV_1 < 0.15
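The same selection rule can be written as a simple filter. The thresholds 0.075 and 0.15 are the ones used in the Select Cases condition; the case list and its values are hypothetical.

```python
def keep_case(case, max_cook=0.075, max_leverage=0.15):
    """Mirror the condition COO_1 < 0.075 & LEV_1 < 0.15."""
    return case["COO_1"] < max_cook and case["LEV_1"] < max_leverage


# Hypothetical saved Cook's distances (COO_1) and leverage values (LEV_1).
cases = [
    {"district": "A", "COO_1": 0.010, "LEV_1": 0.03},
    {"district": "B", "COO_1": 0.120, "LEV_1": 0.05},  # large Cook's distance
    {"district": "C", "COO_1": 0.020, "LEV_1": 0.22},  # large leverage
    {"district": "D", "COO_1": 0.004, "LEV_1": 0.01},
]

kept = [c["district"] for c in cases if keep_case(c)]
print(kept)
```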

After filtering out the extreme values, the Adj. R2 increased, the distribution of the residuals follows the normal distribution, and homoskedasticity is slightly better.


Multicollinearity
• It is an undesirable situation in which one independent variable (X) is a linear function of other independent variables (X).
• It is a kind of redundancy.
• Check:
– Multiple coefficient of determination (R2)
– F-test
– VIF-indicator
• Fixing:
– Principal component analysis
– Removing a variable

VIF-indicator
• Formula: $VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination from regressing $X_j$ on the other independent variables
• Limits: $1 \le VIF_j \le \infty$
– If $R_j^2 = 0 \rightarrow VIF_j = 1$
The jth independent variable doesn’t correlate with the others.
– If $R_j^2 = 1 \rightarrow VIF_j = \infty$
The jth independent variable is an exact linear combination of other independent variables.
• Rating:
$1 \le VIF \le 2$ → weak multicollinearity
$2 < VIF \le 5$ → strong, disturbing multicollinearity
$5 < VIF$ → very strong, harmful multicollinearity
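The formula and the rating scale translate into a short helper. This is a sketch: the function names and the example values of R_j squared are hypothetical, and the rating labels follow the scale given on the slide.

```python
import math


def vif(r2_j):
    """VIF_j = 1 / (1 - R_j^2); infinite when X_j is an exact linear combination."""
    if r2_j >= 1:
        return math.inf
    return 1 / (1 - r2_j)


def rate_vif(value):
    """Rating scale from the slide."""
    if value <= 2:
        return "weak"
    if value <= 5:
        return "strong, disturbing"
    return "very strong, harmful"


for r2_j in (0.0, 0.5, 0.75, 0.9):  # hypothetical auxiliary R2 values
    print(r2_j, round(vif(r2_j), 2), rate_vif(vif(r2_j)))
```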

Tolerance: The tolerance is the percentage of the variance in a given predictor that cannot be explained by the other predictors.

The VIF-indicators are under 2 for every variable, which indicates weak multicollinearity.

Special regression variables


• Polynomial
• Interactions
• Log of variables
• Binary variables (Dummy)

Special regression variables

• Polynomial
– We include powers of $X_j$ in the regression equation.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \dots + \beta_p X_p + \varepsilon$

– Disadvantage: complicated interpretation of coefficients

Special regression variables

• Interactions
– We can use interactions if we want to represent an effect more prominently in our model.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2) + \dots + \varepsilon$

– The effect on $Y$ of a change in $X_1$ depends on $X_2$:

$\dfrac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 \cdot X_2$
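The marginal effect formula can be verified against the model itself. In this sketch all coefficient values are hypothetical; the check compares the formula with the actual change in the fitted value when X1 increases by one unit.

```python
def predict(x1, x2, b0, b1, b2, b3):
    """Fitted value of Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def marginal_effect_x1(x2, b1, b3):
    """Change in Y for a one-unit change in X1 at a given X2."""
    return b1 + b3 * x2


# Hypothetical coefficients.
b0, b1, b2, b3 = 5.0, 2.0, -1.0, 0.5

for x2 in (0.0, 2.0, 4.0):
    by_formula = marginal_effect_x1(x2, b1, b3)
    by_difference = predict(3.0 + 1.0, x2, b0, b1, b2, b3) - predict(3.0, x2, b0, b1, b2, b3)
    print(x2, by_formula, by_difference)
```

The effect of X1 grows from 2.0 to 4.0 as X2 rises from 0 to 4, which is exactly what the interaction term is meant to capture.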

Special regression variables

• Log of variables (Linear-log model)
– We use the natural (ln) or base-10 (log) logarithm of the $X_j$ variable instead of the original.

$Y = \beta_0 + \beta_1 \ln(X_1) + \dots + \beta_p X_p + \varepsilon$

– $\beta_1$: with the natural logarithm, a 1% change in $X_1$ is associated with a change of approximately $\dfrac{\beta_1}{100}$ units in $Y$, if the other independent variables remain constant (ceteris paribus).

Special regression variables

• Binary variables (Dummy)
– We can include a limited number of nominal or ordinal scale variables (with just a few categories) in our model.

$Y = \beta_0 + \beta_1 D_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$

– $\beta_1$: if the condition is met ($D_1 = 1$), it shows how much higher (or lower) $Y$ is compared to the $D_1 = 0$ case, if the other independent variables (X) remain constant (ceteris paribus).

Dummy variable
Transform/Recode into Different Variables…
Share of receiving lunch subsidy (%):
0 → Less than 20%
1 → Higher than 20%
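The recoding can be sketched as a one-line rule. The slide does not say how a share of exactly 20% is coded, so this example assumes it is coded 0; the function name `recode_lunch` and the shares are hypothetical.

```python
def recode_lunch(share_pct, threshold=20.0):
    """Return 1 if the lunch subsidy share is higher than the threshold, else 0.

    Assumption: a share of exactly 20% is coded 0.
    """
    return 1 if share_pct > threshold else 0


lunch_share = [4.5, 19.9, 20.0, 20.1, 63.0]  # hypothetical shares in percent
lnch20 = [recode_lunch(s) for s in lunch_share]
print(lnch20)
```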

Analyze/Regression/Linear
New population equation (6):
𝒕𝒐𝒕𝒔𝒄𝟖 = 𝜷𝟎 + 𝜷𝟏 ∗ 𝒕𝒄𝒉𝒓𝒂𝒕𝒊𝒐 + 𝜷𝟐 ∗ 𝒓𝒆𝒈𝒅𝒂𝒚 + 𝜷𝟑 ∗ 𝒔𝒑𝒆𝒄𝒆𝒅 + 𝜷𝟒 ∗ 𝐥𝐧(𝒑𝒆𝒓𝒄𝒂𝒑) + 𝜷𝟓 ∗ 𝒍𝒏𝒄𝒉𝟐𝟎 + 𝜺

Adj. R2: Compared with the previous specification, it increased. This means that the new variable contributed to the explanatory power of the model.

Effect of Students per teacher decreased!

Sig < 5%: all variables have a significant effect.

VIF-indicators increased for some variables, but they are still acceptable.

The effect of the dummy variable is significant.

5. Validation

I. Save predicted values ($\hat{Y}$)
II. Activate test data
III. Compare the observed ($Y$) and estimated (predicted) ($\hat{Y}$) values


I. Save predicted values ($\hat{Y}$)
Analyze/Regression/Linear

II. Activate test data


Data/Split File

III. Compare the observed ($Y$) and estimated (predicted) ($\hat{Y}$) values
Graphs/Legacy Dialogs/Scatter/Dot
Simple scatter
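Besides the scatter plot, the comparison on the test part can be summarised numerically. The slides only use the plot, so the two measures below (root mean squared error and Pearson correlation) are an added illustration, and the case values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared difference between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def pearson_r(a, b):
    """Linear correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical cases: (partition, observed score, predicted score).
cases = [(1, 700, 698.2), (0, 685, 690.1), (0, 712, 707.5),
         (1, 694, 699.0), (0, 668, 673.4), (0, 721, 715.9)]

# Keep only the test part (partition = 0), which was not used for estimation.
observed = [y for part, y, y_hat in cases if part == 0]
predicted = [y_hat for part, y, y_hat in cases if part == 0]
print(round(rmse(observed, predicted), 2), round(pearson_r(observed, predicted), 3))
```

A low RMSE and a correlation close to 1 on the test part suggest that the model estimated on the training part also fits unseen cases.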
6. Conclusion
Y: 8th grade score (math + English + science)
Cells show the estimated coefficient with its standard error in parentheses; "-" means the variable is not in the model.

X                                                 (1)               (2)               (3)               (4)               (5)               (6)               (7)
Students per teacher                              -3.017 (0.678)    -2.340 (0.626)    -2.237 (0.571)    -2.342 (0.542)    -2.237 (0.476)    -2.219 (0.514)    -1.993 (0.478)
Spending per pupil, regular (in th $)             -                 -0.007 (0.006)    -0.006 (0.005)    -0.009 (0.002)    -0.008 (0.001)    -0.009 (0.001)    -0.007 (0.001)
Spending per pupil, total (in th $)               -                 -0.002 (0.005)    -0.003 (0.004)    -                 -                 -                 -
Average teacher salary (in th $)                  -                 0.132 (0.432)     -                 -                 -                 -                 -
Share of special education students (%)           -                 -0.764 (0.333)    -0.718 (0.318)    -0.772 (0.304)    -0.744 (0.267)    -0.644 (0.257)    -0.627 (0.237)
Average district per capita income (in th $)      -                 3.236 (0.263)     3.132 (0.223)     3.145 (0.221)     -                 -                 -
Log of Average district per capita income         -                 -                 -                 -                 69.508 (4.038)    71.677 (3.824)    59.550 (4.436)
Share of receiving lunch subsidy higher than 20%  -                 -                 -                 -                 -                 -                 -10.627 (2.347)
Constant                                          750.518 (11.803)  729.405 (17.584)  732.408 (14.93)   734.645 (14.405)  587.741 (15.706)  581.581 (15.891)  605.540 (15.618)

Outlier                                           Not filtered      Not filtered      Not filtered      Not filtered      Not filtered      Filtered          Filtered
Observations                                      180               113               124               124               124               120               120
R                                                 0.316             0.842             0.833             0.833             0.873             0.887             0.905
Adjusted R2                                       9.508%            69.237%           68.110%           68.284%           75.458%           77.936%           81.134%
F                                                 19.808            43.013            53.540            67.204            95.545            106.085           103.353
Sig.                                              0.000             0.000             0.000             0.000             0.000             0.000             0.000

Estimated model (7):

$\widehat{totsc8} = 605.54 - 1.99 \cdot tchratio - 0.007 \cdot regday - 0.63 \cdot speced + 59.55 \cdot \ln(percap) - 10.63 \cdot lnch20$

605.54: If the value of every explanatory term were 0 (for the log term this means $\ln(percap) = 0$), the 8th grade score ($Y$) would be 605.54 points on average.
-1.99: If the Students per teacher ratio ($X_1$) is higher by 1 student, the 8th grade score ($Y$) is 1.99 points lower on average, if every other independent variable remains constant (ceteris paribus).
-0.007: If Spending per pupil, regular ($X_2$) is higher by 1 thousand $, the 8th grade score ($Y$) is 0.007 points lower on average, if every other independent variable remains constant (ceteris paribus).
-0.63: If the Share of special education students ($X_3$) is 1 percentage point higher, the 8th grade score ($Y$) is 0.63 points lower on average, if every other independent variable remains constant (ceteris paribus).
59.55: If the Average district per capita income ($X_4$) is 1% higher, the 8th grade score ($Y$) is ≈0.5955 points higher on average, if every other independent variable remains constant (ceteris paribus).
-10.63: In schools where the Share of receiving lunch subsidy is higher than 20% ($X_5 = 1$), the 8th grade score ($Y$) is 10.63 points lower on average, if every other independent variable remains constant (ceteris paribus).
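The interpretations can be reproduced by plugging values into the estimated model (7). The coefficients below are the rounded ones from the equation; the district values (17 students per teacher, and so on) are hypothetical and serve only to show the ceteris paribus comparisons.

```python
import math


def predict_totsc8(tchratio, regday, speced, percap, lnch20):
    """Fitted 8th grade score from the rounded coefficients of model (7)."""
    return (605.54 - 1.99 * tchratio - 0.007 * regday - 0.63 * speced
            + 59.55 * math.log(percap) - 10.63 * lnch20)


# Hypothetical district used as the baseline.
base = dict(tchratio=17.0, regday=4.5, speced=15.0, percap=18.0, lnch20=0)
y_base = predict_totsc8(**base)

# One more student per teacher, everything else unchanged.
effect_tchratio = predict_totsc8(**{**base, "tchratio": 18.0}) - y_base
# Per capita income 1% higher, everything else unchanged.
effect_income = predict_totsc8(**{**base, "percap": 18.0 * 1.01}) - y_base
# Lunch subsidy share above 20%, everything else unchanged.
effect_lunch = predict_totsc8(**{**base, "lnch20": 1}) - y_base

print(round(effect_tchratio, 2), round(effect_income, 4), round(effect_lunch, 2))
```

The income effect comes out slightly below 0.5955 because the division by 100 is an approximation of the exact logarithmic change.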

Assumptions of the error term
• $E(\varepsilon \mid X = x) = 0$: automatically met because of OLS estimation
• $Var(\varepsilon) = \sigma^2$: fixed by using the log of Average district per capita income
• $(X_i, Y_i)$ i.i.d.: met, cross-sectional data
• $\varepsilon \sim N(0, \sigma^2)$: fixed by filtering out outliers

Assumptions of the independent variables
• Multicollinearity: VIF indicators around or less than 2
• Fixed values: met
• No scale error: the data did not contain any
• Uncorrelated with error term: met

Thank you for your attention!

