
Econometrics II

Revision Class: Introduction to Econometrics
Dr. Rabia Ikram
Assistant Professor, Lahore School of Economics

Lahore School of Economics

Winter 2024
What is Econometrics?
Econometrics: the use of economic theory and statistical methods to analyze economic data

Typical goals of econometric analysis:


– Estimating relationships between economic
variables
– Testing economic theories and hypotheses
– Forecasting economic variables, such as a firm's sales, the overall growth of the economy, or stock prices
– Evaluating and implementing government and
business policy
What is Econometrics?
Empirical analysis uses data to test a theory or to estimate a relationship

Steps in empirical analysis:


1. Economic model (this step is often skipped)
2. Econometric model

How can we estimate a population parameter from a sample?
What makes one estimator
better than another?
1) Unbiased

 The estimated coefficients may be smaller or larger, depending on the sample, which is the result of a random draw.
 However, on average, they will be equal to the values that characterize the true relationship between Y and X in the population.
What makes one estimator
better than another?
2) Efficient
 Depending on the sample, the estimates will be nearer to or farther away from the true population values
 How far can we expect our estimates to be from the true population values on average (= sampling variability)?

In the diagram, A and B are both unbiased estimators, but B is superior because it is more efficient.
What makes one estimator
better than another?
3) Consistent

As the sample size increases to infinity, in the limit, the variance of the distribution tends to zero. The distribution collapses to a spike at the true value. The plim of the sample mean is therefore the population mean.
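A minimal simulation sketch of consistency, using the sample mean of a normal population with an assumed mean of 5 (all numbers here are illustrative): as n grows, the sampling standard deviation of the estimator shrinks toward zero.

```python
import numpy as np

# The sample mean as an estimator of an assumed population mean mu = 5.
# As n grows, the sampling distribution collapses around the true value.
rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0

for n in [10, 100, 10_000]:
    # 1,000 replications of the sample mean at each sample size
    means = rng.normal(mu, sigma, size=(1_000, n)).mean(axis=1)
    print(f"n={n:>6}: mean of estimates={means.mean():.3f}, "
          f"sampling std={means.std():.4f}")
```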
The Nature of Econometrics
and Economic Data
Econometric analysis requires data

Different kinds of economic data sets


– Cross-sectional data
– Time series data
– Pooled cross sections
– Panel/Longitudinal data

Econometric methods depend on the nature of the


data used
– Use of inappropriate methods may lead to misleading results
The Simple Linear Regression Model

Definition of the simple linear regression model

"Explains variable y in terms of variable x":

y = β₀ + β₁x + u

– β₀ is the intercept and β₁ is the slope parameter
– u is the error term (disturbance, unobservables); it represents factors other than x that affect y

Names for y:               Names for x:
1. Dependent variable      1. Independent variable
2. Explained variable      2. Explanatory variable
3. Response variable       3. Control variable
4. Predicted variable      4. Predictor variable
5. Regressand              5. Regressor
The Simple Linear Regression Model

Interpretation of the simple linear regression model

"Studies how y varies with changes in x":

β₁ = Δy/Δx   as long as   Δu/Δx = 0

– By how much does the dependent variable change if the independent variable is increased by one unit?
– The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit
When is there a causal
interpretation?
Conditional mean independence assumption
 We need to make a crucial assumption about how u and x are related
 We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is, E(u|x) = E(u) = 0
 The explanatory variable must not contain information about the mean of the unobserved factors
 The average value of the error term does not depend on the value of x
For any given value of x, the distribution of y is centered
about E(Y|X).

Population regression function:

E(colGPA | hsGPA) = 2 + 0.5 hsGPA

For individuals with a given hsGPA, the average value of colGPA is 2 + 0.5 hsGPA; for example, if hsGPA = 3, the average colGPA is 2 + 0.5(3) = 3.5.
Ordinary Least Squares
(OLS)
Intuitively, OLS is fitting a line through the
sample points such that the sum of squared
residuals is as small as possible, hence the
term least squares.

– The residual, û, is an estimate of the error term, u, and is the difference between the fitted line (sample regression function) and the sample point
– Why do we square the residuals? Squaring keeps positive and negative residuals from cancelling out and penalizes large deviations more heavily
Sample regression line, sample data points and the
associated estimated error terms
Fit a regression line through the data points as well as possible:
– The fitted regression line, or sample regression function (SRF), is the estimated version of the PRF
– The SRF is obtained for a given sample of data; a new sample will generate a different slope and intercept
The Simple Linear Regression Model

• What does "as good as possible" mean?

• Regression residuals:   ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ

• Minimize the sum of squared regression residuals:   min Σ ûᵢ²

 The OLS estimators are used to produce the OLS estimates

• Ordinary Least Squares (OLS) estimates: β̂₀ and β̂₁
Algebraic Properties of
OLS
• Properties of OLS on any sample of data
• Fitted or predicted values:   ŷᵢ = β̂₀ + β̂₁xᵢ
• Deviations from the regression line (= OLS residuals):   ûᵢ = yᵢ − ŷᵢ

• Algebraic properties of OLS regression:
– Deviations from the regression line sum up to zero:   Σ ûᵢ = 0
– Covariance between residuals and regressors is zero:   Σ xᵢûᵢ = 0
– The sample averages of y and x lie on the regression line:   ȳ = β̂₀ + β̂₁x̄
Deriving OLS Estimates
for SLR
Population Regression Function (PRF):   Yᵢ = β₀ + β₁Xᵢ + uᵢ,   where uᵢ is the error term

Estimated residuals:   ûᵢ = Yᵢ − β̂₀ − β̂₁Xᵢ

Objective function:   RSS = Σ ûᵢ² = Σ (Yᵢ − β̂₀ − β̂₁Xᵢ)²
Deriving OLS Estimates for SLR
Minimizing the objective function:

RSS = Σ ûᵢ² = Σ (Yᵢ − β̂₀ − β̂₁Xᵢ)²

First Order Conditions (FOCs):

i. ∂RSS/∂β̂₀ = −2 Σ (Yᵢ − β̂₀ − β̂₁Xᵢ) = 0

ii. ∂RSS/∂β̂₁ = −2 Σ Xᵢ(Yᵢ − β̂₀ − β̂₁Xᵢ) = 0
Summary of OLS slope
estimate

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,   provided that Σ(xᵢ − x̄)² > 0

• The slope estimate is the sample covariance between x and y divided by the sample variance of x
• If x and y are positively correlated, the slope will be positive
• If x and y are negatively correlated, the slope will be negative
• We only need x to vary in our sample
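A minimal numpy sketch of these formulas on simulated data (the true values β₀ = 1 and β₁ = 0.5 are assumptions for illustration); it also checks the two first order conditions numerically.

```python
import numpy as np

# Simulated data: true beta0 = 1.0, true beta1 = 0.5 (illustrative values)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Slope = sample covariance(x, y) / sample variance(x); intercept from the means
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# The residuals satisfy the FOCs: they sum to zero and are orthogonal to x
residuals = y - beta0_hat - beta1_hat * x
print(beta0_hat, beta1_hat)                    # close to 1.0 and 0.5
print(residuals.sum(), (x * residuals).sum())  # both approximately 0
```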
Example- Simple Linear
Regression (SLR)

Where:
– wage = hourly wage in dollars
– edu = years of education

Interpret the coefficient on edu.
Multiple Linear Regression Analysis

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u

– β₀ is still the intercept
– β₁ to βₖ are all called slope parameters
– u is still the error term (or disturbance)
– We still need to make a zero conditional mean assumption, so now assume that E(u|x₁, x₂, …, xₖ) = 0
– We are still minimizing the sum of squared residuals, so we have k + 1 first order conditions
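A minimal sketch of how these k + 1 first order conditions are solved in practice, via the normal equations β̂ = (X′X)⁻¹X′y, on simulated data with assumed coefficients:

```python
import numpy as np

# Multiple regression via the normal equations beta_hat = (X'X)^(-1) X'y;
# the data and true coefficients are purely illustrative.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)  # true betas: 2.0, 1.5, -0.8

X = np.column_stack([np.ones(n), x1, x2])     # intercept column plus regressors
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves the k + 1 FOCs at once
print(beta_hat)                               # approximately [2.0, 1.5, -0.8]
```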
Interpreting Multiple
Regression
yˆ ˆ  ˆ x  ˆ x  ...  ˆ x , so
0 1 1 2 2 k k

yˆ ˆ1 x1  ˆ 2 x2  ...  ˆ k xk ,


so holding x2 ,..., xk fixed implies that
yˆ ˆ x , that is each  has
1 1

a ceteris pa ribus interpreta tion


21
“Partialling Out”
Interpretation
The regression coefficient of each X variable
provides an estimate of its influence on Y,
controlling for the effects of all the other X
variables.

One can show that the estimated coefficient of an


explanatory variable in a multiple regression can be
obtained in two steps:
1) Regress the explanatory variable on all other
explanatory variables
2) Regress y on the residuals from this regression
“Partialling Out”
Interpretation

Thus, β̂₁ measures the sample relationship between y and x₁ after x₂ has been partialled out.
“Partialling Out”
Interpretation
Why does this procedure work?
– The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables
– The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable
– This implies that regressing y on x₁ and x₂ gives the same effect of x₁ as regressing y on the residuals from a regression of x₁ on x₂
– This means only the part of xᵢ₁ that is uncorrelated with xᵢ₂ is being related to yᵢ, so we are estimating the effect of x₁ on y after x₂ has been "partialled out"
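A numerical sketch of this equivalence (the Frisch-Waugh-Lovell result) on simulated data; the data-generating process and coefficients are assumptions for illustration:

```python
import numpy as np

# The coefficient on x1 from the multiple regression equals the slope
# from regressing y on the residuals of x1 regressed on x2.
rng = np.random.default_rng(2)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b_multiple = np.linalg.solve(X.T @ X, X.T @ y)[1]   # coefficient on x1

# Step 1: regress x1 on x2 (with intercept) and keep the residuals r1
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1 = x1 - Z @ g

# Step 2: regress y on r1; the slope is sum(r1*y) / sum(r1^2)
b_two_step = (r1 @ y) / (r1 @ r1)
print(b_multiple, b_two_step)   # the two estimates coincide
```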
Example-Multiple Linear
Regression (MLR)

Where:
– colGPA = college grade point average
– hsGPA = high school grade point average
– ACT = achievement test score

Interpret the coefficient on hsGPA.
Example-Multiple Linear
Regression (MLR)

Where:
– colGPA = college grade point average
– hsGPA = high school grade point average
– ACT = achievement test score

Interpretation
• Holding ACT fixed, another point on high school grade point average is associated with an increase of .453 points in the college grade point average
• Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 points higher than that of student B
• Holding high school grade point average fixed, another 10 points on the ACT are associated with less than one point on college GPA
Assumptions for
Unbiasedness
i. Population model is linear in parameters: y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u

ii. We can use a random sample of size n, {(xᵢ₁, xᵢ₂, …, xᵢₖ, yᵢ): i = 1, 2, …, n}, from the population model, so that the sample model is yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₖxᵢₖ + uᵢ

iii. E(u|x₁, x₂, …, xₖ) = 0, implying that all of the explanatory variables are exogenous
iv. None of the x's is constant, and there are no exact linear relationships among the independent variables (no perfect collinearity)
The Gauss-Markov
Theorem

Under the Gauss-Markov assumptions, the OLS estimators are the best linear unbiased estimators (BLUE) of the regression coefficients.
Too Many or Too Few
Variables
What happens if we include variables
in our specification that don’t belong?

Too Many or Too Few
Variables
What happens if we include variables
in our specification that don’t belong?
– There is no effect on our parameter
estimate, and OLS remains unbiased

Too Many or Too Few
Variables
What if we exclude a variable from our
specification that does belong?

Too Many or Too Few
Variables
What if we exclude a variable from our
specification that does belong?
– OLS will usually be biased

Omitted Variable Bias
Omitting relevant variables: the simple
case
True model (contains x₁ and x₂):   y = β₀ + β₁x₁ + β₂x₂ + u

Estimated model (x₂ is omitted):   ỹ = β̃₀ + β̃₁x₁
Omitted Variable Bias
If x₁ and x₂ are correlated, assume a linear regression relationship between them:   x₂ = δ₀ + δ₁x₁ + v

Substituting into the true model gives   y = (β₀ + β₂δ₀) + (β₁ + β₂δ₁)x₁ + (u + β₂v)

If y is only regressed on x₁, then β₀ + β₂δ₀ will be the estimated intercept, β₁ + β₂δ₁ will be the estimated slope on x₁, and u + β₂v will be the error term.

As E(β̃₁) = β₁ + β₂δ₁, the omitted variable bias is β₂δ₁.

Conclusion: all estimated coefficients will be biased.
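A simulation sketch of the bias formula, with assumed values β₁ = 2, β₂ = 3, and δ₁ = 0.5, so the short-regression slope should approach β₁ + β₂δ₁ = 3.5:

```python
import numpy as np

# Omitting x2 when it is correlated with x1: the slope on x1 converges
# to beta1 + beta2 * delta1 = 2 + 3 * 0.5 = 3.5, not the true beta1 = 2.
rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)           # x2 correlated with x1 (delta1 = 0.5)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

slope_short = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)  # y regressed on x1 only
print(slope_short)   # approximately 3.5: biased upward by beta2 * delta1
```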
Summary of Direction of
Bias

                    corr(x₁, x₂) > 0    corr(x₁, x₂) < 0
β₂ > 0              positive bias       negative bias
β₂ < 0              negative bias       positive bias
Omitted Variable Bias
Summary

Analyzing the Variance of Y
Goodness-of-Fit
“How well does the explanatory variable explain the dependent variable?”

Measures of Variation

– Total sum of squares:   SST = Σ(yᵢ − ȳ)²,   the total variation in y
– Explained sum of squares:   SSE = Σ(ŷᵢ − ȳ)²,   the variation explained by the regression
– Residual sum of squares:   SSR = Σ ûᵢ²,   the variation not explained by the regression
Goodness-of-Fit
Decomposition of total variation in y:   SST = SSE + SSR

(total variation in y = explained part + unexplained part)

Goodness-of-fit measure (R-squared):   R² = SSE/SST = 1 − SSR/SST
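A numpy sketch of this decomposition on simulated data (the data-generating process is assumed for illustration):

```python
import numpy as np

# Computing SST = SSE + SSR and R^2 = SSE/SST = 1 - SSR/SST by hand.
rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 1.0 + 0.7 * x + rng.normal(size=300)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
sse = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ssr = np.sum((y - y_hat) ** 2)         # unexplained variation
print(sst, sse + ssr)                  # equal up to rounding
print(sse / sst, 1 - ssr / sst)        # two equivalent R^2 formulas
```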
Hypothesis Testing
• Statistical inference “... draws
conclusions from (or makes inferences
about) a population from a random
sample taken from that population.”

• A population is the ‘universe’ or the total


number of observations.

• A sample is a subset of a given


population.
Testing a Hypothesis Relating to
a Regression Coefficient

Hypothesis Testing

Testing

Testing when β̂ⱼ is 'statistically significant'
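For reference, the standard formulation this slide covers: for H₀: βⱼ = 0, the t statistic is t = β̂ⱼ / se(β̂ⱼ), and β̂ⱼ is said to be 'statistically significant' when |t| exceeds the critical value of the t distribution with n − k − 1 degrees of freedom.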
Significance of a Single Coefficient vs
Overall Significance of the Regression

Testing the Overall Significance for
Multivariate Linear Regression: F test

The F Statistic for Overall
Significance of a Regression

Decision Rule:
If F-stat > F-critical, reject the null hypothesis; this suggests that the independent variables jointly add explanatory value to the model and help predict the variation in the dependent variable.
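A minimal sketch of this test in its R-squared form, F = (R²/k) / ((1 − R²)/(n − k − 1)), with purely illustrative numbers for R², n, and k:

```python
from scipy.stats import f

# Overall-significance F test in its R-squared form, using made-up inputs.
r_squared = 0.25   # hypothetical R^2 from a fitted model
n, k = 100, 3      # hypothetical sample size and number of regressors

f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
f_crit = f.ppf(0.95, dfn=k, dfd=n - k - 1)   # 5% critical value
print(f_stat, f_crit, f_stat > f_crit)       # True -> reject the null
```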
Multiple Regression Analysis
with Qualitative Information-
Dummy Variables
• Qualitative Information
– Examples: gender, race, industry, region,
rating grade, …
– A way to incorporate qualitative information is
to use dummy variables
– They may appear as the dependent or as
independent variables
• A single dummy independent variable:   wage = β₀ + δ₀ female + β₁ educ + u

– Dummy variable: female = 1 if the person is a woman, female = 0 if the person is a man
– δ₀ = the wage gain/loss if the person is a woman rather than a man (holding other things fixed); the difference in intercepts between females and males
Multiple Regression Analysis
with Qualitative Information-
Dummy Variables
Graphical Illustration

Alternative interpretation of the coefficient:

δ₀ = E(wage | female = 1, educ) − E(wage | female = 0, educ)

i.e. the difference in mean wage between men and women with the same level of education.

Graphically, the dummy produces an intercept shift between the two groups' regression lines.
Multiple Regression Analysis with
Qualitative Information- Dummy Variables

• Comparing means of subpopulations described by dummies

– In a regression of wage on the dummy alone, the intercept is the average wage of men in the sample, the coefficient on female is the difference in the average wage between women and men, and their sum is the average wage of women in the sample
– Not holding other factors constant, women earn $2.51 per hour less than men, i.e. the difference between the mean wage of men and that of women is $2.51

• Discussion
– It can easily be tested whether the difference in means is significant
– The wage difference between men and women is larger if no other factors are controlled for; i.e. part of the difference is due to differences in education, experience, and tenure between men and women
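A simulation sketch of this point with an assumed wage process (all numbers illustrative): the dummy-only regression reproduces the two group means exactly.

```python
import numpy as np

# Regressing a wage-like outcome on a single dummy recovers group means.
rng = np.random.default_rng(5)
n = 1_000
female = rng.integers(0, 2, size=n)             # 1 = woman, 0 = man
wage = 7.0 - 2.5 * female + rng.normal(size=n)  # hypothetical wage process

X = np.column_stack([np.ones(n), female])
b0, d0 = np.linalg.solve(X.T @ X, X.T @ wage)
print(b0, wage[female == 0].mean())        # intercept = mean wage of men
print(b0 + d0, wage[female == 1].mean())   # intercept + delta0 = mean wage of women
```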
Summary of Functional Forms
Involving Logarithms

Model          Dependent variable   Independent variable   Interpretation of β₁
Level-level    y                    x                      Δy = β₁ Δx
Log-level      log(y)               x                      %Δy = (100·β₁) Δx
Level-log      y                    log(x)                 Δy = (β₁/100) %Δx
Log-log        log(y)               log(x)                 %Δy = β₁ %Δx
Interpretation of Functional
Forms Involving Logarithms
Example: GDP per capita and Labor
Productivity in Agriculture
Level-Level Model:

productivity = β₀ + β₁ gdppc + u

where productivity = labor productivity in agriculture ($) and gdppc = GDP per capita ($)

Interpretation: a one-dollar increase in GDP per capita is associated with a β₁-dollar change in labor productivity in agriculture.
Interpretation of Functional
Forms Involving Logarithms
Example: GDP per capita and Labor
Productivity in Agriculture
Log-Level Model:

log(productivity) = β₀ + β₁ gdppc + u

where productivity = labor productivity in agriculture ($) and gdppc = GDP per capita ($)

Interpretation: a one-dollar increase in GDP per capita is associated with a (100·β₁) percent change in labor productivity in agriculture.
Interpretation of Functional
Forms Involving Logarithms
Example: GDP per capita and Labor
Productivity in Agriculture
Level-Log Model:

productivity = β₀ + β₁ log(gdppc) + u

where productivity = labor productivity in agriculture ($) and gdppc = GDP per capita ($)

Interpretation: a one-percent increase in GDP per capita is associated with a (β₁/100)-dollar change in labor productivity in agriculture.
Interpretation of Functional
Forms Involving Logarithms
Example: GDP per capita and Labor
Productivity in Agriculture
Log-Log Model:

log(productivity) = β₀ + β₁ log(gdppc) + u

where productivity = labor productivity in agriculture ($) and gdppc = GDP per capita ($)

Interpretation: a one-percent increase in GDP per capita is associated with a β₁-percent change in labor productivity in agriculture; β₁ is an elasticity.
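A simulated log-log sketch with an assumed elasticity of 0.8 and hypothetical variable names (gdppc, productivity): the fitted slope recovers the elasticity.

```python
import numpy as np

# Log-log regression on simulated data: the slope estimates the elasticity.
rng = np.random.default_rng(6)
gdppc = np.exp(rng.normal(8, 1, size=500))   # simulated GDP per capita
productivity = 3.0 * gdppc ** 0.8 * np.exp(rng.normal(0, 0.1, size=500))

ly, lx = np.log(productivity), np.log(gdppc)
b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
print(b1)   # close to 0.8: a 1% rise in gdppc -> ~0.8% rise in productivity
```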
