Econometrics For MGT ppt-2

Chapter Two discusses regression analysis, focusing on the differences between sample and population regression functions, the nature of error terms, and the assumptions underlying linear regression models. It explains the estimation of parameters using methods like Ordinary Least Squares and introduces concepts such as covariance, correlation coefficients, and the coefficient of determination (R²). Additionally, it covers hypothesis testing related to regression coefficients and the significance of explanatory variables.

Chapter Two: Regression analysis

What is the difference between the sample regression function and the population regression function?
Regression analysis is used to model the relationship between a
response variable and one or more predictor variables.
Estimation is a key step in regression analysis.
The population regression function is a description of the model
that is thought to be generating the actual data and it represents
the true relationship between the variables.
The population regression function is also known as the data
generating process.
The sample regression function is the relationship that has been estimated using the sample observations.
Con’t…………………………

• The population regression function is a hypothetical conjecture about the form of the relationship between the response variable and the set of explanatory variables.
• The parameters in the model are the regression coefficients, which represent the weights given to each predictor in the linear combination.
• The sample regression equation contains estimated numerical
values of those coefficients, the set of which are chosen to best fit
data from a particular sample (best fit = minimizes the sum of
squared residuals).
• The concise version: the population model specifies the form of the model.
• The sample regression equation comes from fitting that form to
observed data.
Con’t…………………………….

• Before you begin with regression analysis, you need to identify the population regression function (PRF).
• The PRF defines reality (or your perception of it) as it relates
to your topic of interest.
• To identify it, you need to determine your dependent and
independent variables (and how they’ll be measured) as well
as the mathematical function describing how the variables are
related.
Simple Linear Regression
Concept of Regression Analysis

Regression analysis is the process of estimating the relationship between two or more variables.
In any regression there are dependent variables and explanatory (independent) variables; hence, regression is used to study the dependence of one variable (the dependent variable) on one or more explanatory (independent) variables.
It shows how the average value of the dependent variable (the regressand) varies with the values of the explanatory variables (the regressors).
In regression analysis the primary objective is to estimate the parameters of the population from empirical data, and to predict the average value of the dependent variable on the basis of the values of the explanatory variables.
The nature of the error term

• An error term represents the margin of error within a statistical model; it refers to the deviations of the observations from the regression line, which account for the difference between the theoretical values of the model and the actual observed results.
• An error term is a residual variable produced by a statistical or
mathematical model, which is created when the model does
not fully represent the actual relationship between the
independent variables and the dependent variables.
• As a result of this incomplete relationship, the error term is the amount by which the equation’s predictions may differ from the observed values during empirical analysis.
• The error term is also known as the residual, disturbance, or
remainder term, and is variously represented in models by the
letters e, ε, or u.
Assumptions of the Classical Simple Linear Regression Model
1. The model is linear in parameters.
• The classical theory assumes that the model should be linear in the parameters, regardless of whether the explanatory and the dependent variables are linear or not.
• This is because if the model is non-linear in the parameters, the parameters are difficult to estimate: their values are unknown, and all you are given is the data on the dependent and independent variables.

2. Ui is a random real variable
• This means that the value which u may assume in any one period
depends on chance; it may be positive, negative or zero. Every
value has a certain probability of being assumed by u in any
particular instance.
3. The mean value of the random variable (U) in any particular period is zero
• This means that for each value of X, the random variable (U) may assume various values, some greater than zero and some smaller than zero, but if we consider all the possible positive and negative values of u, for any given value of X, they would have an average value equal to zero.
• In other words, the positive and negative values of u cancel each other.
Mathematically, E(Ui) = 0 ……………………………..….(2.3)
Con’t……………………………..

4. The variance of the random variable (U) is constant in each period (the assumption of homoscedasticity)
• For all values of X, the u’s will show the same dispersion around their mean.
• In Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie within the same limits, irrespective of the value of X.
• For X = X1, u can assume any value within the range AB; for X = X2, u can assume any value within the range CD, which is equal to AB, and so on.
Con’t……………………………
Mathematically, Var(Ui) = E(Ui²) = σu², a constant for every value of X.
[Assumptions 5 and 6 appeared here on slides whose content was not recovered.]

7. The X values are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model.
• This means that, in taking a large number of samples on Y and X, the Xi values are the same in all samples, but the Ui values do differ from sample to sample, and so of course do the values of Yi.
8. The random variable (U) is independent of the explanatory variables.
• This means there is no correlation between the random variable and the explanatory variable.
• If two variables are unrelated, their covariance is zero. Mathematically, Cov(Xi, Ui) = E(XiUi) = 0 (given that E(Ui) = 0).
Con’t………………………….

9. The explanatory variables are measured without error
• U absorbs the influence of omitted variables and possibly errors of measurement in the y’s; i.e., we will assume that the regressors are error-free, while the y values may or may not include errors of measurement.
The Multiple Linear Regression Model

• In the multiple linear regression model, a dependent variable Y can depend on a whole series of explanatory variables or regressors.
• For instance, in demand studies we study the relationship
between quantity demanded of a good and price of the good,
price of substitute goods and the consumer’s income.
• The model we assume is:
Yi = β0 + β1X1i + β2X2i + … + βkXki + Ui
where β0 is the intercept, β1, …, βk are the partial regression coefficients, and Ui is the error term.
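As a rough illustration of estimating such a model, the sketch below fits a demand-style equation by ordinary least squares with numpy; all data values and variable names (price, sub_price, income) are invented for the example, not taken from the slides:

```python
import numpy as np

# Invented demand data: quantity, own price, substitute price, income
qty       = np.array([100.0, 95.0, 88.0, 92.0, 80.0, 75.0, 83.0, 70.0])
price     = np.array([1.0, 1.2, 1.5, 1.3, 1.8, 2.0, 1.6, 2.2])
sub_price = np.array([1.1, 1.0, 1.3, 1.2, 1.4, 1.5, 1.3, 1.6])
income    = np.array([20.0, 21.0, 22.0, 22.0, 23.0, 24.0, 23.0, 25.0])

# Design matrix: a column of ones (for the intercept) plus the regressors
X = np.column_stack([np.ones_like(price), price, sub_price, income])

# OLS: choose b to minimize the sum of squared residuals ||Xb - qty||^2
b, _, _, _ = np.linalg.lstsq(X, qty, rcond=None)
print(b)  # [beta0, beta1 (price), beta2 (substitute price), beta3 (income)]
```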
Assumptions of Multiple Regression Model
Parameter Estimation: Least Squares Methods of estimation

• Specifying the model and stating its underlying assumptions are the first stage of any econometric application.
• The next step is the estimation of the numerical values of the
parameters of economic relationships.
• The parameters of the simple linear regression model can be
estimated by various methods. Three of the most commonly used
methods are:
– Ordinary least squares method (OLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)
Ordinary Least Squares (OLS)
Assumptions of Classical Simple Regression Model (CLRM)
Deriving OLS Estimators
Con’t………………………………
For the simple model Yi = β0 + β1Xi + Ui, OLS chooses the estimates β̂0 and β̂1 that minimize the sum of squared residuals:
Σei² = Σ(Yi − β̂0 − β̂1Xi)²
Setting the partial derivatives with respect to β̂0 and β̂1 equal to zero gives the normal equations, whose solution is:
β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
β̂0 = Ȳ − β̂1X̄
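These closed-form estimators can be verified numerically. Below is a minimal Python sketch on invented data (the numbers are illustrative only), cross-checked against numpy’s built-in least-squares fit:

```python
import numpy as np

# Invented sample: Y depends roughly linearly on X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

x_dev = X - X.mean()   # deviations of X from its mean
y_dev = Y - Y.mean()   # deviations of Y from its mean

# The closed-form OLS estimators derived above
beta1_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()   # slope
beta0_hat = Y.mean() - beta1_hat * X.mean()              # intercept
print(beta0_hat, beta1_hat)

# Cross-check: np.polyfit returns [slope, intercept] for degree 1
print(np.polyfit(X, Y, 1))
```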
Covariance

 What is Covariance? In mathematics and statistics, covariance is a measure of the relationship between two random variables.
 The metric evaluates how much – to what extent – the
variables change together.
 In other words, it is essentially a measure of the joint variability of the two variables.
 Covariance can take any positive or negative value. The values are interpreted as follows:
• Positive covariance: Indicates that two variables tend to move
in the same direction.
• Negative covariance: Reveals that two variables tend to move
in inverse directions.
Con’t…………………………

• Covariance measures the total variation of two random variables from their expected values. Mathematically, Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
• Using covariance, we can only gauge the direction of the
relationship (whether the variables tend to move in tandem or
show an inverse relationship).
• However, it does not indicate the strength of the relationship,
nor the dependency between the variables.
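A small sketch of this limitation, on invented data: covariance shows the direction of co-movement, but its magnitude depends on the units of measurement, so it says little about strength:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.1, 4.2, 6.3, 7.4])  # tends to rise with x

# Sample covariance (numpy uses an n - 1 divisor by default)
print(np.cov(x, y)[0, 1])                # positive: they move together

# Rescaling x (say, metres to centimetres) inflates the covariance...
print(np.cov(100 * x, y)[0, 1])

# ...but the correlation coefficient is unchanged (it is dimensionless)
print(np.corrcoef(x, y)[0, 1], np.corrcoef(100 * x, y)[0, 1])
```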
Correlation coefficient

• The degree of association is measured by a correlation coefficient, denoted by r.
• It is sometimes called Pearson’s correlation coefficient after its
originator and is a measure of linear association.
• If a curved line is needed to express the relationship, other and
more complicated measures of the correlation must be used.
• Correlation measures the strength of the relationship between
variables.
• Correlation is the scaled measure of covariance.
• It is dimensionless. In other words, the correlation coefficient
is always a pure value and not measured in any units.
Con’t………………………….
• The correlation coefficient is measured on a scale that varies
from + 1 through 0 to – 1.
• Complete correlation between two variables is expressed by
either + 1 or -1.
• When one variable increases as the other increases the
correlation is positive; when one decreases as the other
increases it is negative.
• Complete absence of correlation is represented by 0.
• The relationship between the two concepts can be expressed
using the formula below:
rxy = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ]
• Where:
• rxy – the correlation coefficient of the linear relationship
between the variables x and y
• xi – the values of the x-variable in a sample
• x̅ – the mean of the values of the x-variable
• yi – the values of the y-variable in a sample
• ȳ – the mean of the values of the y-variable
 In order to calculate the correlation coefficient using the
formula above, you must undertake the following steps:
1. Obtain a data sample with the values of x-variable and y-
variable.
2. Calculate the means (averages) x̅ for the x-variable and ȳ for
the y-variable.
Con’t………………………………
3. For the x-variable, subtract the mean from each value of the x-
variable (let’s call this new variable “a”). Do the same for the
y-variable (let’s call this variable “b”).
4. Multiply each a-value by the corresponding b-value and find
the sum of these multiplications (the final value is the
numerator in the formula).
5. Square each a-value and calculate the sum of the results. Do the same for the b-values.
6. Multiply the two sums from step 5 and find the square root of the product (this is the denominator in the formula).
7. Divide the value obtained in step 4 by the value obtained in step 6; the steps are traced in the script below.
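The steps above, traced in a short Python script on invented data and cross-checked against numpy’s built-in function:

```python
import numpy as np

# Step 1: a data sample
x = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y = np.array([2.0, 4.5, 5.5, 8.0, 9.5])

# Step 2: the means
x_bar, y_bar = x.mean(), y.mean()

# Step 3: deviations from the means ("a" and "b")
a = x - x_bar
b = y - y_bar

# Step 4: sum of the products a * b (the numerator)
numerator = (a * b).sum()

# Steps 5-6: sums of squares, then the square root of their product
denominator = np.sqrt((a ** 2).sum() * (b ** 2).sum())

# Step 7: the correlation coefficient
r = numerator / denominator
print(r)
print(np.corrcoef(x, y)[0, 1])  # numpy's built-in gives the same value
```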
Coefficient of determination (R²)

 What is the coefficient of determination?
• The coefficient of determination (R²) measures how well a statistical model predicts an outcome.
• The outcome is represented by the model’s dependent variable.
• The lowest possible value of R² is 0 and the highest possible
value is 1.
• Put simply, the better a model is at making predictions, the
closer its R² will be to 1.
• Example (coefficient of determination): Imagine that you perform a simple linear regression that predicts students’ exam scores (dependent variable) from their time spent studying (independent variable).
Con’t……………………………….
• If the R² is 0, the linear regression model doesn’t allow you to predict exam scores any better than simply estimating that everyone has an average exam score.
• If the R² is between 0 and 1, the model allows you to partially predict exam scores. The model’s estimates are not perfect, but they’re better than simply using the average exam score.
• If the R² is 1, the model allows you to perfectly predict anyone’s exam score.
• More technically, R² is a measure of goodness of fit.
• It is the proportion of variance in the dependent variable that is explained by the model.
Con’t………………
R² = ESS/TSS = 1 − RSS/TSS
where TSS = Σ(Yi − Ȳ)² is the total sum of squares, ESS = Σ(Ŷi − Ȳ)² is the explained sum of squares, and RSS = Σei² is the residual sum of squares.
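A brief numerical sketch of these sums of squares on invented data; in a simple linear regression, R² also equals the squared correlation coefficient between X and Y:

```python
import numpy as np

# Invented data for a simple linear regression
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Fit by OLS and compute fitted values
slope, intercept = np.polyfit(X, Y, 1)
Y_hat = intercept + slope * X

tss = ((Y - Y.mean()) ** 2).sum()   # total sum of squares
rss = ((Y - Y_hat) ** 2).sum()      # residual sum of squares
r_squared = 1 - rss / tss
print(r_squared)

# In the simple two-variable case, R^2 equals the squared correlation
print(np.corrcoef(X, Y)[0, 1] ** 2)
```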

• You can also say that the R² is the proportion of variance “explained” or “accounted for” by the model. The proportion that remains (1 − R²) is the variance that is not predicted by the model.
• If you prefer, you can write the R² as a percentage instead of a
proportion. Simply multiply the proportion by 100.
• Example (interpreting R²): A simple linear regression that predicts students’ exam scores (dependent variable) from their study time (independent variable) has an R² of .71. From this R² value, we know that:
• 71% of the variance in students’ exam scores is predicted by their
study time
• 29% of the variance in students’ exam scores is unexplained by the model
• The students’ study time has a large effect on their exam scores
Con’t…………………………………

• Studying longer may or may not cause an improvement in the students’ scores.
• Although this causal relationship is very plausible, the R²
alone can’t tell us why there’s a relationship between students’
study time and exam scores.
• For example, students might find studying less frustrating
when they understand the course material well, so they study
longer.
Hypotheses testing

• Computing p-values for t tests. So far, we have talked about how to test hypotheses using a classical approach: after stating the alternative hypothesis, we choose a significance level, which then determines a critical value.
• Once the critical value has been identified, the value of the t
statistic is compared with the critical value, and the null is
either rejected or not rejected at the given significance level.
• Even after deciding on the appropriate alternative, there is a component of arbitrariness to the classical approach, which results from having to choose a significance level ahead of time.
• Different researchers prefer different significance levels,
depending on the particular application.
• There is no “correct” significance level.
Con’t……………………….
• Committing to a significance level ahead of time can hide
useful information about the outcome of a hypothesis test.
• For example, suppose that we wish to test the null hypothesis
that a parameter is zero against a two-sided alternative, and
with 40 degrees of freedom we obtain a t statistic equal to
1.85.
• The null hypothesis is not rejected at the 5% level, since the t
statistic is less than the two-tailed critical value.
• A researcher whose agenda is not to reject the null could
simply report this outcome along with the estimate: the null
hypothesis is not rejected at the 5% level.
Con’t……………………………….

• Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected? This level is known as the p-value for the test.
• Example: we know the p-value is greater than .05, since the null is not rejected at the 5% level, and we know that the p-value is less than .10, since the null is rejected at the 10% level.
• We obtain the actual p-value by computing the probability that
a t random variable, with 40 df, is larger than 1.85 in absolute
value.
• That is, the p-value is the significance level of the test when
we use the value of the test statistic, 1.85 in the above
example, as the critical value for the test.
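The slide’s numbers can be reproduced with scipy’s t distribution; a minimal sketch:

```python
from scipy.stats import t

t_stat = 1.85   # observed t statistic from the example
df = 40         # degrees of freedom

# Two-sided p-value: probability that |T| exceeds 1.85 under the null
p_value = 2 * t.sf(t_stat, df)
print(p_value)  # about 0.07: above .05 but below .10, as the text says
```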
Con’t………………………………

• Significance tests of the individual regression coefficients: If the multiple regression model conforms with the underlying economic theory, one would expect the exogenous variables xj to influence the endogenous variable y in particular directions.
• In the econometric model, the estimated regression
coefficients should therefore display the theoretically expected
signs.
• In addition it needs to be examined whether the influencing
factors do in fact matter for the explanation of the endogenous
variable.
Con’t………………………….
• This is because, if a regression coefficient has the expected sign but deviates only randomly from 0, the explanatory variable has no systematic influence on the endogenous variable.
• Whether an independent variable exhibits a systematic influence on the dependent variable can be checked with a significance test:
• If the null hypothesis H0: βj = 0 is contrasted with the alternative hypothesis H1: βj ≠ 0, one speaks of a two-sided significance test.
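As a minimal sketch of this two-sided test for a simple regression slope (the data are invented), the t statistic is the estimated coefficient divided by its standard error, with n − 2 degrees of freedom:

```python
import numpy as np
from scipy.stats import t

# Invented data for a simple regression of Y on X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.2, 2.9, 4.1, 4.8, 6.3, 6.9, 8.2, 8.8])
n = len(X)

slope, intercept = np.polyfit(X, Y, 1)          # OLS estimates
residuals = Y - (intercept + slope * X)

# Estimated error variance and the standard error of the slope
s2 = (residuals ** 2).sum() / (n - 2)
se_slope = np.sqrt(s2 / ((X - X.mean()) ** 2).sum())

# Two-sided test of H0: beta_j = 0 against H1: beta_j != 0
t_stat = slope / se_slope
p_value = 2 * t.sf(abs(t_stat), n - 2)
print(t_stat, p_value)  # reject H0 at level alpha when p_value < alpha
```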
