Econometrics For MGT ppt-2

Chapter Two discusses regression analysis, focusing on the differences between sample and population regression functions, the nature of error terms, and the assumptions underlying linear regression models. It explains the estimation of parameters using methods like Ordinary Least Squares and introduces concepts such as covariance, correlation coefficients, and the coefficient of determination (R²). Additionally, it covers hypothesis testing related to regression coefficients and the significance of explanatory variables.
Chapter Two: Regression analysis
What is the difference between the sample regression function and the population regression function?
Regression analysis is used to model the relationship between a response variable and one or more predictor variables, and estimation is a key element of it. The population regression function (PRF) is a description of the model that is thought to be generating the actual data; it represents the true relationship between the variables and is also known as the data-generating process. The sample regression function (SRF) is the relation that has been estimated using the sample observations.
• The population regression function is a hypothetical conjecture about the form of the relationship between the response variable and the set of explanatory variables.
• The parameters in the model are the regression coefficients, which represent the weights given to each predictor in the linear combination of them.
• The sample regression equation contains estimated numerical values of those coefficients, chosen to best fit the data from a particular sample (best fit = minimizes the sum of squared residuals).
• The concise version: the population model specifies the form of the model; the sample regression equation comes from fitting that form to observed data.
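• In standard notation (added here for clarity), the two functions for a simple linear model can be written as:
PRF: Yi = β0 + β1Xi + Ui (unknown population parameters and the error term)
SRF: Ŷi = b0 + b1Xi (where b0 and b1 are the sample estimates of β0 and β1, and Ŷi are the fitted values)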
• Before you begin with regression analysis, you need to identify the population regression function (PRF).
• The PRF defines reality (or your perception of it) as it relates to your topic of interest.
• To identify it, you need to determine your dependent and independent variables (and how they will be measured) as well as the mathematical function describing how the variables are related.
Simple Linear Regression
Concept of Regression Analysis
Regression analysis is the process of estimating the relationship between two or more variables. In any regression there are dependent variables and explanatory (independent) variables; hence, regression is used to study the dependence of one variable (the dependent variable) on one or more explanatory (independent) variables, i.e., how the average value of the dependent variable (regressand) varies with the values of the explanatory variables (regressors). In regression analysis the primary objectives are to estimate the parameters of the population based on empirical data and to predict the average value of the dependent variable on the basis of the values of the explanatory variables.
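• In standard notation (added for clarity), this conditional average can be written E(Y | Xi) = β0 + β1Xi for the simple two-variable case.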
The nature of the error term
• An error term represents the margin of error within a statistical model; it refers to the sum of the deviations within the regression line, which provides an explanation for the difference between the theoretical value of the model and the actual observed results.
• An error term is a residual variable produced by a statistical or mathematical model, created when the model does not fully represent the actual relationship between the independent variables and the dependent variable.
• As a result of this incomplete relationship, the error term is the amount by which the equation may differ during empirical analysis.
• The error term is also known as the residual, disturbance, or remainder term, and is variously represented in models by the letters e, ε, or u.
Assumptions of the Classical Simple Linear Regression Model
1. The model is linear in parameters.
• The classical theory assumes that the model should be linear in the parameters, regardless of whether the explanatory and dependent variables themselves enter linearly or not.
• This is because if the parameters enter non-linearly it is difficult to estimate them, since their values are not known and you are only given data on the dependent and independent variables.
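• For example (an illustration added here): Y = β0 + β1 ln(X) + U is linear in the parameters even though it is non-linear in X, whereas Y = β0 + X^β1 + U is not linear in the parameters and cannot be estimated directly by the classical methods below.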
2. Ui is a random real variable.
• This means that the value which u may assume in any one period depends on chance; it may be positive, negative, or zero. Every value has a certain probability of being assumed by u in any particular instance.
3. The mean value of the random variable (U) in any particular period is zero.
• This means that for each value of X, the random variable (u) may assume various values, some greater than zero and some smaller than zero, but if we considered all the possible positive and negative values of u for any given value of X, they would have an average value equal to zero.
• In other words, the positive and negative values of u cancel each other out. Mathematically,
E(Ui) = 0 .............................. (2.3)
4. The variance of the random variable (U) is constant in each period (the assumption of homoscedasticity).
• For all values of X, the u's will show the same dispersion around their mean.
• In Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie within the same limits, irrespective of the value of X.
• For X = X1, u can assume any value within the range AB; for X = X2, u can assume any value within the range CD, which is equal to AB, and so on.
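• In symbols (standard notation, added for clarity): Var(Ui) = E(Ui²) = σ² for all i.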
7. The X values are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model.
• This means that, in taking a large number of samples on Y and X, the X values are the same in all samples, but the u values do differ from sample to sample, and so of course do the values of Y.
8. The random variable (U) is independent of the explanatory variables.
• This means there is no correlation between the random variable and the explanatory variable.
• If two variables are unrelated, their covariance is zero.
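• In symbols (standard notation, added for clarity): Cov(Xi, Ui) = E(Xi Ui) = 0.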
9. The explanatory variables are measured without error.
• U absorbs the influence of omitted variables and possibly errors of measurement in the y's, i.e., we will assume that the regressors are error-free, while the y values may or may not include errors of measurement.
The Multiple Linear Regression Model
• In the multiple linear regression model, a dependent variable Y can depend on a whole series of explanatory variables or regressors.
• For instance, in demand studies we study the relationship between the quantity demanded of a good and the price of the good, the price of substitute goods, and the consumer's income.
• The model we assume is of the general form:
Y = β0 + β1X1 + β2X2 + … + βkXk + U
Assumptions of Multiple Regression Model
Parameter Estimation: Least Squares
Methods of estimation
• Specifying the model and stating its underlying assumptions are
the first stage of any econometric application.
• The next step is the estimation of the numerical values of the parameters of economic relationships.
• The parameters of the simple linear regression model can be estimated by various methods. Three of the most commonly used methods are:
– Ordinary least squares method (OLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)
Ordinary Least Squares (OLS)
Assumptions of Classical Simple Regression Model (CLRM)
Deriving OLS Estimators
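For the simple model, the OLS estimators that result from this derivation are b1 = Σ(Xi − x̅)(Yi − ȳ) / Σ(Xi − x̅)² and b0 = ȳ − b1x̅. A minimal Python sketch of these formulas (data values are made up for illustration):

# Minimal OLS sketch for the simple model Y = b0 + b1*X + u (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: b1 = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar) ** 2)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)

# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")

Applied to real data, b0 and b1 give the intercept and slope of the fitted sample regression line.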
Covariance
What is covariance? In mathematics and statistics, covariance is a measure of the relationship between two random variables. The metric evaluates how much, and to what extent, the variables change together. In other words, it is essentially a measure of the variance between two variables. The covariance can take any positive or negative value. The values are interpreted as follows:
• Positive covariance: indicates that two variables tend to move in the same direction.
• Negative covariance: reveals that two variables tend to move in inverse directions.
• Covariance measures the total variation of two random variables from their expected values.
• Using covariance, we can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship).
• However, it does not indicate the strength of the relationship, nor the dependency between the variables.
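A minimal Python sketch of the sample covariance (made-up values; added here for illustration):

# Sample covariance: cov(x, y) = sum((xi - x_bar) * (yi - y_bar)) / (n - 1).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 7.0, 9.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
print(f"cov(x, y) = {cov_xy:.4f}")  # positive here: x and y move together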
Correlation coefficient
• The degree of association is measured by a correlation coefficient, denoted by r.
• It is sometimes called Pearson's correlation coefficient after its originator and is a measure of linear association.
• If a curved line is needed to express the relationship, other and more complicated measures of correlation must be used.
• Correlation measures the strength of the relationship between variables.
• Correlation is the scaled measure of covariance.
• It is dimensionless. In other words, the correlation coefficient is always a pure value and not measured in any units.
• The correlation coefficient is measured on a scale that varies from +1 through 0 to −1.
• Complete correlation between two variables is expressed by either +1 or −1.
• When one variable increases as the other increases, the correlation is positive; when one decreases as the other increases, it is negative.
• Complete absence of correlation is represented by 0.
• The relationship between the two concepts can be expressed using the formula below:
rxy = Σ(xi − x̅)(yi − ȳ) / √[ Σ(xi − x̅)² · Σ(yi − ȳ)² ]
• Where:
• rxy – the correlation coefficient of the linear relationship between the variables x and y
• xi – the values of the x-variable in a sample
• x̅ – the mean of the values of the x-variable
• yi – the values of the y-variable in a sample
• ȳ – the mean of the values of the y-variable
In order to calculate the correlation coefficient using the formula above, you must undertake the following steps (a worked sketch follows the list):
1. Obtain a data sample with the values of the x-variable and y-variable.
2. Calculate the means (averages) x̅ for the x-variable and ȳ for the y-variable.
3. For the x-variable, subtract the mean from each value of the x-variable (let's call this new variable "a"). Do the same for the y-variable (let's call this variable "b").
4. Multiply each a-value by the corresponding b-value and find the sum of these multiplications (the final value is the numerator in the formula).
5. Square each a-value and calculate the sum of the results; do the same for the b-values; then multiply the two sums.
6. Find the square root of the value obtained in step 5 (this is the denominator in the formula).
7. Divide the value obtained in step 4 by the value obtained in step 6.
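A minimal Python sketch of these seven steps (made-up sample values; added here for illustration):

import math

# Step 1: sample data (illustrative values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 6.0]

# Step 2: means.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Step 3: deviations from the means ("a" and "b").
a = [xi - x_bar for xi in x]
b = [yi - y_bar for yi in y]

# Step 4: numerator = sum of a*b.
numerator = sum(ai * bi for ai, bi in zip(a, b))

# Steps 5-6: denominator = sqrt(sum(a^2) * sum(b^2)).
denominator = math.sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))

# Step 7: correlation coefficient.
r_xy = numerator / denominator
print(f"r_xy = {r_xy:.4f}")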
Coefficient of determination (R²)
What is the coefficient of determination?
• The coefficient of determination (R²) measures how well a statistical model predicts an outcome.
• The outcome is represented by the model's dependent variable.
• The lowest possible value of R² is 0 and the highest possible value is 1.
• Put simply, the better a model is at making predictions, the closer its R² will be to 1.
• Example: imagine that you perform a simple linear regression that predicts students' exam scores (dependent variable) from their time spent studying (independent variable).
• If the R² is 0, the linear regression model doesn't allow you to predict exam scores any better than simply estimating that everyone has an average exam score.
• If the R² is between 0 and 1, the model allows you to partially predict exam scores. The model's estimates are not perfect, but they're better than simply using the average exam score.
• If the R² is 1, the model allows you to perfectly predict anyone's exam score.
• More technically, R² is a measure of goodness of fit.
• It is the proportion of variance in the dependent variable that is explained by the model.
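• In symbols (standard definition, added for clarity): R² = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)², where ŷi are the model's fitted values.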
• You can also say that the R² is the proportion of variance "explained" or "accounted for" by the model. The proportion that remains (1 − R²) is the variance that is not predicted by the model.
• If you prefer, you can write the R² as a percentage instead of a proportion. Simply multiply the proportion by 100.
• Example: interpreting R². A simple linear regression that predicts students' exam scores (dependent variable) from their study time (independent variable) has an R² of .71. From this R² value, we know that:
• 71% of the variance in students' exam scores is predicted by their study time
• 29% of the variance in students' exam scores is unexplained by the model
• The students' study time has a large effect on their exam scores
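A minimal sketch of this computation (observed and fitted values are made up for illustration):

# R^2 = 1 - SS_residual / SS_total, computed from observed y and fitted y_hat.
y = [55.0, 60.0, 68.0, 74.0, 83.0]       # observed exam scores (illustrative)
y_hat = [57.0, 61.0, 66.0, 75.0, 81.0]   # fitted values from some regression (illustrative)

y_bar = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")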
• Studying longer may or may not cause an improvement in the students' scores.
• Although this causal relationship is very plausible, the R² alone can't tell us why there's a relationship between students' study time and exam scores.
• For example, students might find studying less frustrating when they understand the course material well, so they study longer.
Hypothesis testing
• Computing p-values for t tests. So far, we have talked about how to test hypotheses using a classical approach: after stating the alternative hypothesis, we choose a significance level, which then determines a critical value.
• Once the critical value has been identified, the value of the t statistic is compared with the critical value, and the null is either rejected or not rejected at the given significance level.
• Even after deciding on the appropriate alternative, there is a component of arbitrariness to the classical approach, which results from having to choose a significance level ahead of time.
• Different researchers prefer different significance levels, depending on the particular application.
• There is no "correct" significance level.
• Committing to a significance level ahead of time can hide useful information about the outcome of a hypothesis test.
• For example, suppose that we wish to test the null hypothesis that a parameter is zero against a two-sided alternative, and with 40 degrees of freedom we obtain a t statistic equal to 1.85.
• The null hypothesis is not rejected at the 5% level, since the t statistic is less than the two-tailed critical value.
• A researcher whose agenda is not to reject the null could simply report this outcome along with the estimate: the null hypothesis is not rejected at the 5% level.
• Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected? This level is known as the p-value for the test.
• In the example above, we know the p-value is greater than .05, since the null is not rejected at the 5% level, and we know that the p-value is less than .10, since the null is rejected at the 10% level.
• We obtain the actual p-value by computing the probability that a t random variable, with 40 df, is larger than 1.85 in absolute value.
• That is, the p-value is the significance level of the test when we use the value of the test statistic, 1.85 in the above example, as the critical value for the test.
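A quick sketch of this p-value computation in Python (assuming the scipy library is available):

from scipy import stats

# Two-sided p-value for t = 1.85 with 40 degrees of freedom:
# P(|T| > 1.85) = 2 * P(T > 1.85).
t_stat = 1.85
df = 40
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"p-value = {p_value:.4f}")  # roughly 0.07, between .05 and .10 as stated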
• Significance tests of the individual regression coefficients.
• If the multiple regression model conforms with the underlying economic theory, one would expect the exogenous variables xj to influence the endogenous variable y in particular directions.
• In the econometric model, the estimated regression coefficients should therefore display the theoretically expected signs.
• In addition, it needs to be examined whether the influencing factors do in fact matter for the explanation of the endogenous variable, because if a regression coefficient has the expected sign but only randomly deviates from 0, the explanatory variable has no systematic influence on the endogenous variable.
• Whether an independent variable exhibits a systematic influence on the dependent variable can be checked with a significance test:
• If the null hypothesis H0 : βj = 0 is contrasted with the alternative hypothesis H1 : βj ≠ 0, one speaks of a two-sided significance test.
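In practice this test uses the t ratio t = bj / se(bj); a minimal sketch (coefficient, standard error, and degrees of freedom are made-up values):

from scipy import stats

# t ratio for H0: beta_j = 0 versus H1: beta_j != 0.
beta_j_hat = 0.42   # estimated coefficient (illustrative value)
se_beta_j = 0.15    # its estimated standard error (illustrative value)
df = 40             # residual degrees of freedom (assumed)

t_stat = beta_j_hat / se_beta_j
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
# Reject H0 at the 5% level if p_value < 0.05.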