Simple Linear Regression
Analysis
What is a regression model?
• In very general terms, regression is concerned with
describing and evaluating the relationship between a
given variable and one or more other variables on which
the given variable depends.
• More specifically, regression is an attempt to explain
movements in a variable by reference to movements in
one or more other variables.
Simple linear regression
• For simplicity, suppose for now that it is believed that Y
depends on only one X variable.
• Some examples of the kind of relationship that may be of
interest include:
– Assessing the relationship between the expected
return of an asset and the market risk premium
– Forecasting sales based on the amount spent on
advertising
• In this case, it appears that:
– there is an approximate positive linear relationship
between X and Y, which means that increases in X
are usually accompanied by increases in Y, and
– the relationship between them can be described
approximately by a straight line.
• It would therefore be of interest to determine to what
extent this relationship can be described by an equation
that can be estimated using a defined procedure.
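One way to write such a straight-line relationship is sketched below. The symbols \alpha (intercept) and \beta (slope) are notation assumed here for illustration; the text above does not fix them.

```latex
Y = \alpha + \beta X
```

In this notation, \beta > 0 corresponds to the positive relationship described above: a one-unit increase in X goes with an increase of \beta units in Y.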
The error term
• The above functional relationship is deterministic or
exact; that is, given the floor area of a property, we can
determine its rental value exactly.
• But this is clearly unrealistic: it would be extremely
unlikely for a model to perfectly predict a variable since it
is impossible to control every possible condition that may
interfere with the response variable.
• In our case, different properties with the same floor area
are not expected to have equal rental values, owing to
differences in the age of the property, construction material,
distance to the central business district, etc.
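To allow for such differences, the exact relationship is usually extended with a random disturbance. A minimal sketch of the resulting model follows; the symbols \alpha, \beta and u_i, and the index i over the n sample observations, are notation assumed here for illustration.

```latex
Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \dots, n
```

Here u_i collects everything that affects Y_i other than X_i, so two properties with the same floor area X_i can still have different rental values Y_i.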
Some of the reasons for the inclusion of the error term are:
• Omitted variables: even in the general case where
there is more than one explanatory variable, some
determinants of Y will always in practice be omitted from
the model.
• This might, for example, arise because the number of
influences on Y is too large to place in a single model,
or because some determinants of Y may be
unobservable or not measurable.
• Measurement error: there may be errors in the way that Y
is measured (inaccuracy in collection and measurement of
sample data).
• There are bound to be random outside influences on Y
that cannot be modeled.
• For example, conflicts, terrorist attacks or a computer
failure could all affect financial asset returns in a way that
cannot be captured in a model.
• Similarly, many researchers argue that human behaviour
has an inherent randomness and unpredictability!
• Thus, a full specification of a regression model should
include a specification of the properties of the
disturbance (error) term, including its probability
distribution.
• This information is given by what we call basic
assumptions or assumptions of the classical linear
regression model (CLRM).
Assumptions of the classical linear
regression model
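The assumptions most commonly listed for the CLRM in textbook treatments are sketched below. This particular list and its notation (u_i for the disturbance, x_i for the explanatory variable, \sigma^2 for the error variance) are an assumption based on standard presentations, not taken from the text above.

```latex
\begin{align*}
\text{(1)}\quad & \mathrm{E}(u_i) = 0
  && \text{errors have zero mean} \\
\text{(2)}\quad & \mathrm{Var}(u_i) = \sigma^2 < \infty
  && \text{constant, finite error variance} \\
\text{(3)}\quad & \mathrm{Cov}(u_i, u_j) = 0 \ \text{for } i \neq j
  && \text{errors are uncorrelated with one another} \\
\text{(4)}\quad & \mathrm{Cov}(u_i, x_i) = 0
  && \text{errors are uncorrelated with the regressor} \\
\text{(5)}\quad & u_i \sim N(0, \sigma^2)
  && \text{errors are normally distributed}
\end{align*}
```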
Ordinary least-squares (OLS) method of
estimation
The magnitude of each individual error is the vertical
distance between the actual observed points and the
estimating line.
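OLS chooses the line that makes the sum of these squared vertical distances as small as possible. A minimal sketch of the closed-form solution for one explanatory variable follows; the function names and the small data set are hypothetical and serve only as an illustration.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the fitted line y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both about the sample means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx                  # slope estimate
    alpha_hat = y_bar - beta_hat * x_bar  # intercept estimate
    return alpha_hat, beta_hat


def residuals(x, y, alpha_hat, beta_hat):
    """Vertical distances between the observed points and the fitted line."""
    return [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(x, y)
u_hat = residuals(x, y, alpha_hat, beta_hat)
rss = sum(u ** 2 for u in u_hat)  # residual sum of squares minimised by OLS

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}, RSS = {rss:.4f}")
```

When the model includes an intercept, the OLS residuals sum to zero, which is a quick sanity check on any implementation.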
Precision and Standard Errors
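Standard errors indicate how precisely the OLS coefficients are estimated. The sketch below uses the usual textbook formulas, which assume errors with constant variance that are uncorrelated with one another; the function name and the data are hypothetical.

```python
import math


def ols_with_standard_errors(x, y):
    """Fit y = alpha + beta * x by OLS; return estimates and their standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual sum of squares and the estimated error variance (n - 2 degrees of freedom).
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * sum(xi ** 2 for xi in x) / (n * sxx))
    return alpha_hat, beta_hat, se_alpha, se_beta


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat, se_alpha, se_beta = ols_with_standard_errors(x, y)
print(f"beta_hat = {beta_hat:.4f} (s.e. {se_beta:.4f})")
print(f"alpha_hat = {alpha_hat:.4f} (s.e. {se_alpha:.4f})")
```

Both standard errors shrink when the residuals are small and when the X values are widely spread, since a wider spread of X pins down the slope more precisely.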
A brief review of statistical inference
• Hypothesis testing is a procedure for checking the
validity of a statistical hypothesis.
• It is the process by which we decide whether the null
hypothesis should be rejected or not.
• This decision is based on a test statistic – a value
computed from the sample.
Test of significance of regression coefficients
Remarks
• A significance level of α = 0.05 means that, if the null hypothesis
is true, a result at least as extreme as the one observed would be
expected only 5% of the time as a consequence of chance alone (that
is, there is a 5% chance of incorrectly rejecting a true null
hypothesis).
• If the test statistic in a hypothesis test falls in the non-rejection
region, we say that the null hypothesis is not rejected. It is
incorrect to say that the null hypothesis is ‘accepted’.
• If the null hypothesis is rejected, then we conclude that the result
of the test is ‘statistically significant’; otherwise, we say that the
result of the test is ‘not significant’ or that it is ‘insignificant’.
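The decision rule in these remarks can be sketched for a single regression coefficient as follows. The coefficient estimates, standard error and degrees of freedom are hypothetical; the critical value 2.086 is the two-sided 5% value of the Student t distribution with 20 degrees of freedom, taken from standard tables.

```python
def t_ratio(estimate, std_error, hypothesised_value=0.0):
    """Test statistic for H0: coefficient = hypothesised_value."""
    return (estimate - hypothesised_value) / std_error


def reject_null(t_stat, critical_value):
    """Two-sided test: reject H0 when the statistic falls in the rejection region."""
    return abs(t_stat) > critical_value


# Two-sided 5% critical value of Student's t with 20 degrees of freedom (from tables).
T_CRIT_5PCT_DF20 = 2.086

# Hypothetical estimates that share a standard error of 0.2.
t_significant = t_ratio(0.5, 0.2)
t_insignificant = t_ratio(0.3, 0.2)

decision_significant = reject_null(t_significant, T_CRIT_5PCT_DF20)
decision_insignificant = reject_null(t_insignificant, T_CRIT_5PCT_DF20)

for t_stat, rejected in [(t_significant, decision_significant),
                         (t_insignificant, decision_insignificant)]:
    verdict = "statistically significant" if rejected else "not significant"
    print(f"t = {t_stat:.2f}: {verdict} at the 5% level")
```

In the second case the null hypothesis is not rejected, which is not the same as saying it is accepted.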
The capital asset pricing model (CAPM)
• Note: Since asset returns (instead of prices) are used
in the analysis, the interpretation is in terms of
percentages.
• In our example, the β coefficient estimate of 1.344 is
interpreted as follows: if the excess return of the market
portfolio (over the risk-free rate) increases by 1%, then the
excess return of this particular asset is expected to increase
by 1.344%.
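The interpretation can be written out as a short calculation. The regression form below is the usual excess-return version of the CAPM; the symbols R_i (asset return), R_m (market return), R_f (risk-free rate) and u (disturbance) are notation assumed here for illustration.

```latex
\begin{align*}
R_i - R_f &= \alpha + \beta\,(R_m - R_f) + u \\
\Delta\,\widehat{(R_i - R_f)} &= \hat{\beta} \times \Delta\,(R_m - R_f)
  = 1.344 \times 1\% = 1.344\%
\end{align*}
```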
Fitting a simple linear regression model
using EViews
• A number of software packages are readily available for
regression analysis.
• Thus, one does not need to go through the details of
the calculations involved.
• In this course, we will use EViews software for all
analyses.
To conduct regression analysis, click on:
Quick/Estimate Equation…
A pop-up window titled Equation Estimation will appear.
• Under Estimation settings, you have a number of
options.
• You have to choose the appropriate model (or method of
estimation) from the list.
• Here you will be working with the first one on the list: LS
– Least Squares (that is, OLS).
• Suppose you want to estimate a linear regression model
in which the dependent variable is y and the
independent variables are x1, x2 and x3.
• In the Equation specification box, type the
dependent variable, followed by c (for the constant
term), and then each of the independent variables (all
separated by spaces):
y c x1 x2 x3
• When you click on OK, you will get the fitted model
alongside model fit statistics.
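The ordering convention of the specification line can be made explicit with a small sketch. The helper `parse_equation_spec` is hypothetical and is not part of EViews; it only splits a specification string into its roles, and it assumes that `c` is matched without regard to case.

```python
def parse_equation_spec(spec):
    """Split a 'y c x1 x2 ...' specification into its parts.

    Returns (dependent, has_constant, regressors). Hypothetical helper for
    illustration; it does not call or emulate EViews itself.
    """
    tokens = spec.split()
    if len(tokens) < 2:
        raise ValueError("need a dependent variable and at least one right-hand-side term")
    dependent = tokens[0]                      # first token: dependent variable
    rhs = tokens[1:]                           # everything else: right-hand side
    has_constant = any(t.lower() == "c" for t in rhs)
    regressors = [t for t in rhs if t.lower() != "c"]
    return dependent, has_constant, regressors


general = parse_equation_spec("y c x1 x2 x3")
capm = parse_equation_spec("asset_return c market_return")

print(general)
print(capm)
```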
In our case, we type:
asset_return c market_return
When you click on OK, you will get the fitted linear
regression model (or CAPM) with several model-fit
statistics.
For now, let us consider the results displayed in the
upper panel of the output.
Illustration
• If p-value = 0.002, then we reject the null hypothesis at every
level of significance greater than 0.002 = 0.2%, which includes the
conventional 1% and 5% levels (5% being the usual maximum
tolerable probability of a Type I error).
• In particular, since 0.002 < 0.01, we reject the null hypothesis
at the 1% level of significance.
• If p-value = 0.037, then we reject the null hypothesis at every
level of significance greater than 0.037 = 3.7%, which includes the
5% level but not the 1% level.
• In particular, since 0.037 < 0.05, we reject the null hypothesis
at the 5% level of significance.
• If p-value = 0.072, then we do not reject the null hypothesis
at the conventional 5% level of significance (since 0.072 =
7.2% > 5%).
• In general, at the 5% level of significance we do not reject the
null hypothesis if the reported p-value (Prob. in the EViews output)
exceeds 0.05.
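The three cases in this illustration follow one rule: reject the null hypothesis when the p-value does not exceed the chosen significance level. A minimal sketch follows; the function name is hypothetical, and the 10% level is included only for comparison.

```python
def reject_null(p_value, alpha=0.05):
    """Reject H0 when the p-value is no larger than the significance level."""
    return p_value <= alpha


LEVELS = (0.01, 0.05, 0.10)

# The p-values used in the illustration above.
decisions = {
    p: {alpha: reject_null(p, alpha) for alpha in LEVELS}
    for p in (0.002, 0.037, 0.072)
}

for p, by_level in decisions.items():
    rejected_at = [f"{alpha:.0%}" for alpha, rejected in by_level.items() if rejected]
    print(f"p-value = {p}: reject H0 at {', '.join(rejected_at) or 'none of the levels'}")
```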