This document describes the method of ordinary least squares (OLS) estimation. It explains that OLS chooses estimates that minimize the sum of squared residuals between the actual and predicted y-values. This leads to a set of normal equations that can be solved simultaneously to obtain the OLS estimates of the model parameters. Key properties of OLS are that it produces unbiased, efficient, and consistent estimates when the assumptions of the classical linear regression model are satisfied.
Simple Regression
THE METHOD OF ORDINARY LEAST SQUARES
• To understand this method, we first explain the least squares principle.
• Recall the two-variable PRF: Yi = β1 + β2Xi + ui (2.4.2)
• The PRF is not directly observable. We estimate it from the SRF:
  Yi = β̂1 + β̂2Xi + ûi (2.6.2)
     = Ŷi + ûi (2.6.3)
  where Ŷi is the estimated (conditional mean) value of Yi.
• But how is the SRF itself determined? First, express (2.6.3) as
  ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi (3.1.1)
• Now, given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we might adopt the following criterion: choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi), is as small as possible.
• But this is not a very good criterion. If we minimize Σûi, Figure 3.1 shows that the residuals û2 and û3 receive the same weight as û1 and û4 in the sum (û1 + û2 + û3 + û4). A consequence of this is that the algebraic sum of the ûi can be small (even zero) although the ûi are widely scattered about the SRF.
• To see this, let û1, û2, û3, and û4 in Figure 3.1 take the values 10, −2, +2, and −10, respectively. Their algebraic sum is zero although û1 and û4 are scattered more widely around the SRF than û2 and û3.
• We can avoid this problem by adopting the least-squares criterion, which states that the SRF should be fixed in such a way that
  Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)² (3.1.2)
  is as small as possible, where ûi² are the squared residuals.
• Differentiating (3.1.2) with respect to β̂1 and β̂2 and setting the derivatives to zero yields the following equations for estimating β1 and β2:
  ΣYiXi = β̂1ΣXi + β̂2ΣXi² (3.1.4)
  ΣYi = nβ̂1 + β̂2ΣXi (3.1.5)
  where n is the sample size. These simultaneous equations are known as the normal equations.
• Solving the normal equations simultaneously, we obtain
  β̂2 = Σxiyi / Σxi² (3.1.6)
  β̂1 = Ȳ − β̂2X̄ (3.1.7)
  where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄) and yi = (Yi − Ȳ).
• Henceforth we adopt the convention of letting lowercase letters denote deviations from mean values.
• The last step in (3.1.7) can be obtained directly from (3.1.4) by simple algebraic manipulations.
• Incidentally, note that, by making use of simple algebraic identities, formula (3.1.6) for estimating β2 can be alternatively expressed as
  β̂2 = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) (3.1.8)
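As a concrete illustration, formulas (3.1.6) and (3.1.7) can be computed directly. The data values below are hypothetical, chosen only to show the arithmetic of the deviation-form formulas:

```python
# A minimal sketch of the OLS formulas (3.1.6) and (3.1.7),
# using hypothetical data for illustration.
X = [1, 2, 3, 4, 5]
Y = [4, 5, 7, 12, 15]
n = len(X)
X_bar = sum(X) / n
Y_bar = sum(Y) / n

# Deviations from the means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]

# beta2_hat = sum(x_i * y_i) / sum(x_i^2)   -- formula (3.1.6)
beta2_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
# beta1_hat = Y_bar - beta2_hat * X_bar     -- formula (3.1.7)
beta1_hat = Y_bar - beta2_hat * X_bar

print(beta2_hat, beta1_hat)
```

For these data, Σxiyi = 29 and Σxi² = 10, so β̂2 = 2.9 and β̂1 = 8.6 − 2.9 × 3 = −0.1.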
• The estimators obtained previously are known as the least-squares estimators.
• The regression line (Figure 3.2) thus obtained has the following properties:
  1. It passes through the sample means of Y and X. This fact is obvious from (3.1.7), for the latter can be written as Ȳ = β̂1 + β̂2X̄, which is shown diagrammatically in Figure 3.2.
  2. The mean value of the estimated Ŷi is equal to the mean value of the actual Y, for
     Ŷi = β̂1 + β̂2Xi = (Ȳ − β̂2X̄) + β̂2Xi = Ȳ + β̂2(Xi − X̄) (3.1.9)
     Summing both sides of this last equality over the sample values and dividing through by the sample size n gives that the mean of the Ŷi equals Ȳ (3.1.10), where use is made of the fact that Σ(Xi − X̄) = 0.
  3. The mean value of the residuals ûi is zero.
  4. The residuals ûi are uncorrelated with the predicted Ŷi. Using the deviation form, Σŷiûi = β̂2Σxiûi = 0, since Σxiûi = 0.
  5. The residuals ûi are uncorrelated with Xi.
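Properties 2 through 5 can be checked numerically. The sketch below fits the least-squares line to hypothetical data and confirms that the residuals average to zero and are orthogonal to both Ŷi and Xi:

```python
# Numerical check of properties 2-5 of the least-squares line,
# using hypothetical data for illustration.
X = [1, 2, 3, 4, 5]
Y = [4, 5, 7, 12, 15]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n

# OLS estimates via (3.1.6) and (3.1.7)
b2 = (sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))
      / sum((Xi - X_bar) ** 2 for Xi in X))
b1 = Y_bar - b2 * X_bar

Y_hat = [b1 + b2 * Xi for Xi in X]              # fitted values
u_hat = [Yi - Yh for Yi, Yh in zip(Y, Y_hat)]   # residuals

# Property 2: mean of Y_hat equals mean of Y
mean_gap = sum(Y_hat) / n - Y_bar
# Property 3: mean of the residuals is zero
mean_resid = sum(u_hat) / n
# Property 4: residuals uncorrelated with the fitted values
dot_u_yhat = sum(ui * Yh for ui, Yh in zip(u_hat, Y_hat))
# Property 5: residuals uncorrelated with X
dot_u_x = sum(ui * Xi for ui, Xi in zip(u_hat, X))

print(mean_gap, mean_resid, dot_u_yhat, dot_u_x)
```

All four quantities are zero (up to floating-point rounding), exactly as the algebra predicts; these properties hold for any data set, not just this one.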
• As shown in Figure 3.3, each Y population corresponding to a given X is distributed around its mean value, with some Y values above the mean and some below it. The assumption is that the mean value of these deviations corresponding to any given X is zero.
• Note that the assumption E(ui | Xi) = 0 implies that E(Yi | Xi) = β1 + β2Xi.
• Technically, (3.2.2) represents the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Stated differently, (3.2.2) means that the Y populations corresponding to various X values have the same variance. Put simply, the variation around the regression line (which is the line of average relationship between Y and X) is the same across the X values; it neither increases nor decreases as X varies.
• The disturbances ui and uj are uncorrelated; that is, there is no serial correlation. This means that, given Xi, the deviations of any two Y values from their mean value do not exhibit patterns. In Figure 3.6a, the u's are positively correlated: a positive u is followed by a positive u, or a negative u by a negative u. In Figure 3.6b, the u's are negatively correlated: a positive u is followed by a negative u, and vice versa. If the disturbances follow such systematic patterns, as in Figures 3.6a and 3.6b, there is auto- or serial correlation. Figure 3.6c shows no systematic pattern in the u's, indicating zero correlation.
• The disturbance u and the explanatory variable X are uncorrelated. The PRF assumes that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influences on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. Thus, if X and u are positively correlated, X increases when u increases and decreases when u decreases. Similarly, if X and u are negatively correlated, X increases when u decreases and decreases when u increases. In either case, it is difficult to isolate the influence of X and u on Y.
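The payoff of these assumptions is the unbiasedness claimed at the outset. A small Monte Carlo sketch can illustrate it: the disturbances below are drawn i.i.d. normal, so they have zero mean, equal variance, no serial correlation, and are independent of X. The true parameter values (β1 = 2.0, β2 = 0.5) and the sample design are hypothetical choices for the simulation:

```python
import random

random.seed(0)  # reproducible draws

beta1_true, beta2_true = 2.0, 0.5   # hypothetical true parameters
X = list(range(1, 21))              # fixed X values with ample variation
n = len(X)
X_bar = sum(X) / n

estimates = []
for _ in range(2000):
    # u_i drawn i.i.d. N(0, 1): zero conditional mean, homoscedastic,
    # no serial correlation, and independent of X -- the assumptions above
    u = [random.gauss(0, 1.0) for _ in X]
    Y = [beta1_true + beta2_true * Xi + ui for Xi, ui in zip(X, u)]
    Y_bar = sum(Y) / n
    # slope estimate via (3.1.6)
    b2 = (sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))
          / sum((Xi - X_bar) ** 2 for Xi in X))
    estimates.append(b2)

mean_b2 = sum(estimates) / len(estimates)
print(mean_b2)  # close to the true beta2 of 0.5
```

Averaged over many samples, β̂2 clusters tightly around the true β2, consistent with the unbiasedness of OLS when the classical assumptions hold.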
• In the hypothetical example of Table 3.1, imagine that we had only the first pair of observations on Y and X (4 and 1). From this single observation there is no way to estimate the two unknowns, β1 and β2; we need at least two pairs of observations to estimate the two unknowns.
• This assumption, too, is not as innocuous as it looks. Look at Eq. (3.1.6): if all the X values are identical, then Xi = X̄ and the denominator of that equation is zero, making it impossible to estimate β2 and therefore β1. Looking at our family consumption expenditure example in Chapter 2, if there is very little variation in family income, we will not be able to explain much of the variation in consumption expenditure.
• An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest. Some important questions that arise in the specification of the model include the following:
  (1) What variables should be included in the model?
  (2) What is the functional form of the model? Is it linear in the parameters, the variables, or both?
  (3) What probabilistic assumptions are made about the Yi, the Xi, and the ui entering the model?
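The zero-denominator problem is easy to see numerically. In the sketch below the X values are all identical (hypothetical data), so the denominator Σxi² of (3.1.6) collapses to zero and β̂2 cannot be computed:

```python
# Illustration: with no variation in X, the denominator of (3.1.6) is zero.
X = [3, 3, 3, 3]   # every X value identical, so X_i = X_bar for all i
Y = [4, 6, 5, 7]
X_bar = sum(X) / len(X)

denom = sum((Xi - X_bar) ** 2 for Xi in X)   # this is sum of x_i^2 in (3.1.6)
print(denom)  # 0.0 -- computing beta2_hat would require dividing by zero
```

With denom equal to zero, the slope is not identified: no line through the data is distinguishable from any other, which is the algebraic expression of the variation-in-X assumption.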