
Simple Regression

This document describes the method of ordinary least squares (OLS) estimation. It explains that OLS chooses estimates that minimize the sum of squared residuals between the actual and predicted y-values. This leads to a set of normal equations that can be solved simultaneously to obtain the OLS estimates of the model parameters. Key properties of OLS are that it produces unbiased, efficient, and consistent estimates when the assumptions of the classical linear regression model are satisfied.

THE METHOD OF ORDINARY LEAST SQUARES

• To understand this method, we first explain the least squares principle.


• Recall the two-variable PRF:
Yi = β1 + β2Xi + ui (2.4.2)
• The PRF is not directly observable. We estimate it from the SRF:
Yi = β̂1 + β̂2Xi + ûi (2.6.2)
= Ŷi + ûi (2.6.3)
• where Ŷi is the estimated (conditional mean) value of Yi.
• But how is the SRF itself determined? First, express (2.6.3) as
ûi = Yi − Ŷi
= Yi − β̂1 − β̂2Xi (3.1.1)
• Now given n pairs of observations on Y and X, we would like to determine the
SRF in such a manner that it is as close as possible to the actual Y. To this
end, we may adopt the following criterion:
• Choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi), is as small as possible.
• But this is not a very good criterion. If we adopt the criterion of minimizing Σûi, Figure 3.1 shows that the residuals û2 and û3 receive the same weight in the sum (û1 + û2 + û3 + û4) as the residuals û1 and û4. A consequence of this is that it is quite possible for the algebraic sum of the ûi to be small (even zero) although the ûi are widely scattered about the SRF.
• To see this, let û1, û2, û3, and û4 in Figure 3.1 take the values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero, although û1 and û4 are scattered more widely around the SRF than û2 and û3.
• We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that
Σûi² = Σ(Yi − Ŷi)²
= Σ(Yi − β̂1 − β̂2Xi)² (3.1.2)
• is as small as possible, where ûi² are the squared residuals.
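• This contrast is easy to check numerically. Below is a minimal Python sketch; the first set of residuals is the hypothetical 10, −2, +2, −10 from Figure 3.1, while the tightly clustered second set is invented for comparison:

```python
import numpy as np

# Hypothetical residuals from Figure 3.1: widely scattered about the SRF.
u_scattered = np.array([10.0, -2.0, 2.0, -10.0])
# An invented comparison set that is tightly clustered about the SRF.
u_tight = np.array([1.0, -2.0, 2.0, -1.0])

# The algebraic sum cannot distinguish the two fits: both sums are zero.
print(u_scattered.sum(), u_tight.sum())            # 0.0 0.0

# The least-squares criterion can: squaring gives large residuals more weight.
print((u_scattered**2).sum(), (u_tight**2).sum())  # 208.0 10.0
```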
• Differentiating (3.1.2) partially with respect to β̂1 and β̂2 and setting the results equal to zero yields the following equations for estimating β1 and β2:
ΣYi = nβ̂1 + β̂2ΣXi (3.1.4)
ΣYiXi = β̂1ΣXi + β̂2ΣXi² (3.1.5)
• where n is the sample size. These simultaneous equations are known as the normal equations. Solving the normal equations simultaneously, we obtain
β̂2 = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²) = Σxiyi / Σxi² (3.1.6)
β̂1 = (ΣXi²ΣYi − ΣXiΣXiYi) / (nΣXi² − (ΣXi)²) = Ȳ − β̂2X̄ (3.1.7)
• where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth we adopt the convention of letting the lowercase letters denote deviations from mean values.
• The last step in (3.1.7) can be obtained directly from (3.1.4) by simple algebraic manipulations. Incidentally, note that, by making use of simple algebraic identities, formula (3.1.6) for estimating β2 can be alternatively expressed as
β̂2 = ΣxiYi / Σxi² (3.1.8)
• which follows because Σxi = 0 implies Σxiyi = Σxi(Yi − Ȳ) = ΣxiYi.
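• The closed-form expressions above translate directly into code. Here is a minimal NumPy sketch of (3.1.7) and (3.1.8); the function name ols_simple and the simulated data are illustrative, not from the text:

```python
import numpy as np

def ols_simple(X, Y):
    """Return (b1, b2), the OLS estimates in Y = b1 + b2*X."""
    x = X - X.mean()                     # xi = Xi - X-bar (deviation form)
    y = Y - Y.mean()                     # yi = Yi - Y-bar
    b2 = (x * y).sum() / (x ** 2).sum()  # (3.1.6)/(3.1.8): sum(xi*yi)/sum(xi^2)
    b1 = Y.mean() - b2 * X.mean()        # (3.1.7): Y-bar - b2*X-bar
    return b1, b2

# Illustrative data generated around the line Y = 2 + 0.5*X.
rng = np.random.default_rng(0)
X = np.arange(1.0, 11.0)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 0.3, size=X.size)

b1, b2 = ols_simple(X, Y)
print(b1, b2)  # estimates should land near 2 and 0.5
```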

• The estimators obtained previously are known as the least-squares


estimators.
• The regression line (Figure 3.2) thus obtained has the following properties:
– 1. It passes through the sample means of Y and X. This fact is obvious from (3.1.7), for the latter can be written as Ȳ = β̂1 + β̂2X̄, which is shown diagrammatically in Figure 3.2.
– 2. The mean value of the estimated Yi (= Ŷi) is equal to the mean value of the actual Y, for
Ŷi = β̂1 + β̂2Xi
= (Ȳ − β̂2X̄) + β̂2Xi
= Ȳ + β̂2(Xi − X̄) (3.1.9)
• Summing both sides of this last equality over the sample values and dividing through by the sample size n gives
Ŷ̄ = Ȳ (3.1.10)
• where use is made of the fact that Σ(Xi − X̄) = 0.
– 3. The mean value of the residuals ûi is zero.
– 4. The residuals ûi are uncorrelated with the predicted Ŷi. This statement can be verified as follows: using the deviation form, where ŷi = Ŷi − Ȳ = β̂2xi and ûi = yi − β̂2xi, we can write
Σŷiûi = β̂2Σxi(yi − β̂2xi)
= β̂2Σxiyi − β̂2²Σxi²
= β̂2²Σxi² − β̂2²Σxi² = 0
• where use is made of the fact that β̂2 = Σxiyi/Σxi², so that Σxiyi = β̂2Σxi².
– 5. The residuals ûi are uncorrelated with Xi; that is, ΣûiXi = 0.
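• All five properties can be verified numerically on any sample. A minimal sketch with simulated (illustrative) data:

```python
import numpy as np

# Simulated sample (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=50)
Y = 3.0 + 1.5 * X + rng.normal(0.0, 1.0, size=50)

# OLS estimates via the deviation-form formulas.
x, y = X - X.mean(), Y - Y.mean()
b2 = (x * y).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X          # fitted values
u_hat = Y - Y_hat            # residuals

print(np.isclose(Y.mean(), b1 + b2 * X.mean()))           # 1. line passes through (X-bar, Y-bar)
print(np.isclose(Y_hat.mean(), Y.mean()))                 # 2. mean of fitted Y equals mean of Y
print(np.isclose(u_hat.mean(), 0.0, atol=1e-10))          # 3. residuals have zero mean
print(np.isclose((u_hat * Y_hat).sum(), 0.0, atol=1e-8))  # 4. residuals uncorrelated with fitted Y
print(np.isclose((u_hat * X).sum(), 0.0, atol=1e-8))      # 5. residuals uncorrelated with X
```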


THE ASSUMPTIONS UNDERLYING THE METHOD OF LEAST SQUARES

• Zero mean value of the disturbance ui, i.e., E(ui | Xi) = 0. As shown in Figure 3.3, each Y population corresponding to a given X is distributed around its mean value, with some Y values above the mean and some below it. The mean value of these deviations corresponding to any given X should be zero.
• Note that the assumption E(ui | Xi) = 0 implies that E(Yi | Xi) = β1 + β2Xi.
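• The implication in the last bullet is a one-line calculation: taking conditional expectations of the PRF (2.4.2) gives
E(Yi | Xi) = E(β1 + β2Xi + ui | Xi) = β1 + β2Xi + E(ui | Xi) = β1 + β2Xi
• since β1 + β2Xi is nonstochastic once Xi is given.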
• Homoscedasticity, or equal variance of ui: var(ui | Xi) = σ² (3.2.2). Technically, (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Stated differently, (3.2.2) means that the Y populations corresponding to various X values have the same variance.
• Put simply, the variation around the regression line (which is the line of the average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies.
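• A small simulation makes the assumption concrete. The sketch below (illustrative parameter choices) draws homoscedastic errors with constant spread and, for contrast, heteroscedastic errors whose spread grows with X, violating (3.2.2):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1.0, 10.0, 200)

# Homoscedastic: var(u | X) is the same constant for every X.
u_homo = rng.normal(0.0, 1.0, size=X.size)
# Heteroscedastic (violates 3.2.2): the spread of u grows with X.
u_hetero = rng.normal(0.0, 0.3 * X)

# Compare the error spread in the low-X and high-X halves of the sample.
lo, hi = X < 5.5, X >= 5.5
print(u_homo[lo].std(), u_homo[hi].std())      # roughly equal
print(u_hetero[lo].std(), u_hetero[hi].std())  # clearly unequal
```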
• No autocorrelation between the disturbances: ui and uj (i ≠ j) are uncorrelated, i.e., there is no serial correlation. This means that, given Xi, the deviations of any two Y values from their mean value do not exhibit patterns. In Figure 3.6a the u's are positively correlated: a positive u is followed by a positive u, or a negative u by a negative u. In Figure 3.6b the u's are negatively correlated: a positive u is followed by a negative u, and vice versa. If the disturbances follow systematic patterns, as in Figures 3.6a and b, there is auto- or serial correlation. Figure 3.6c shows no systematic pattern in the u's, indicating zero correlation.
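• The three patterns in Figure 3.6 can be mimicked by generating disturbances from an AR(1) process u[t] = ρ·u[t−1] + ε[t]; ρ > 0, ρ < 0, and ρ = 0 correspond roughly to panels a, b, and c. A minimal, illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
eps = rng.normal(0.0, 1.0, size=n)

def ar1(rho):
    """Disturbances following u[t] = rho * u[t-1] + eps[t]."""
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

for rho in (0.9, -0.9, 0.0):  # positive, negative, and zero serial correlation
    u = ar1(rho)
    # Sample correlation between successive disturbances u[t] and u[t-1].
    print(rho, np.corrcoef(u[1:], u[:-1])[0, 1])
```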
• Zero correlation between the disturbance u and the explanatory variable X, i.e., cov(ui, Xi) = 0. The PRF assumes that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X increases when u increases and decreases when u decreases. Similarly, if X and u are negatively correlated, X increases when u decreases and decreases when u increases. In either case, it is difficult to isolate the influence of X and u on Y.
• In the hypothetical example of Table 3.1, imagine that we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, β1 and β2. We need at least two pairs of observations to estimate the two unknowns.
• Variability in X values: the X values in a given sample must not all be the same. This assumption too is not as innocuous as it looks. Look at Eq. (3.1.6): if all the X values are identical, then Xi = X̄ for every i and the denominator of that equation will be zero, making it impossible to estimate β2 and therefore β1. Looking at our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in consumption expenditure.
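• This failure mode is easy to exhibit in code. In the sketch below (invented data), the same Y values are paired once with varied X values and once with identical X values; in the latter case the denominator of (3.1.6) collapses to zero:

```python
import numpy as np

Y = np.array([4.0, 5.0, 7.0, 12.0])           # invented sample values
X_varied = np.array([1.0, 4.0, 5.0, 6.0])
X_constant = np.array([3.0, 3.0, 3.0, 3.0])   # Xi = X-bar for every i

for X in (X_varied, X_constant):
    x = X - X.mean()                          # deviations from the mean
    denom = (x ** 2).sum()                    # denominator of (3.1.6)
    if denom == 0.0:
        print("sum(xi^2) = 0: beta2-hat (and hence beta1-hat) is undefined")
    else:
        b2 = (x * (Y - Y.mean())).sum() / denom
        print("beta2-hat =", b2)
```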
• An econometric investigation begins with the specification of the
econometric model underlying the phenomenon of interest. Some important
questions that arise in the specification of the model include the following:
• (1) What variables should be included in the model?
• (2) What is the functional form of the model? Is it linear in the parameters, the variables, or both?
• (3) What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?
