Chapter 4:
Regression Analysis with Qualitative Data:
Binary (or Dummy) Variables
4.1. Describing Qualitative Information
In Econometrics I, both the dependent and independent variables
in our multiple regression models were quantitative in nature
(e.g., hourly wage rate, years of education, GDP, prices, and
costs).
However, some variables are essentially qualitative, or nominal
scale, in nature, such as sex, race, color, religion, industry of a
firm (manufacturing, retail, etc.), and region in Ethiopia.
For example, holding all other factors constant, female workers
are found to earn less than their male counterparts.
Since such variables usually indicate the presence or absence of a
"quality" or attribute, such as male or female, black or
white, or college graduate or not, they are essentially nominal
scale variables.
Cont’d
One way we could "quantify" such attributes is by constructing
artificial variables that take on values of 1 or 0, with 1 indicating
the presence (or possession) of the attribute and 0 indicating its
absence.
For example 1 may indicate that a person is a female and 0 may designate
a male; or 1 may indicate that a person is a college graduate, and 0 that
the person is not, and so on.
Variables that assume such 0 and 1 values are called dummy variables.
Such variables are thus essentially a device to classify data into mutually
exclusive categories such as male or female.
Dummy variables can be incorporated in regression models just as easily
as quantitative variables.
As a matter of fact, a regression model may contain regressors that are all
exclusively dummy, or qualitative, in nature.
Cont’d
Note that although they are easy to incorporate in regression
models, one must use dummy variables carefully.
In particular, consider the following:
1. When we have a dummy variable for each category or group
and also an intercept in our model, we have a case of perfect
collinearity, that is, an exact linear relationship among the
independent variables: the sum of all the dummy variables equals
one, the same as the intercept column.
In this case, if a qualitative variable has m categories, introduce
only (m − 1) dummy variables (see the Python sketch after this list).
Otherwise we fall into what is known as the dummy variable
trap, that is, the situation of perfect collinearity or perfect
multicollinearity.
Cont’d
This rule also applies if we have more than one qualitative
variable in the model.
For each qualitative regressor the number of dummy variables
introduced must be one less than the categories of that variable.
2. The category for which no dummy variable is assigned is known
as the base, benchmark, control, comparison, reference, or
omitted category and all comparisons are made in relation to the
benchmark category.
This is the one that is omitted and against which the other
dummy variables are assessed.
3. The intercept value (β1) represents the mean value of the
benchmark category.
Cont’d
4. The coefficients attached to dummy variables are known as the
differential intercept coefficients because they tell by how
much the value of the intercept that receives the value of 1 differs
from the intercept coefficient of the benchmark category.
5. If a qualitative variable has more than one category, the choice of
the benchmark category is strictly up to the researcher.
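As a quick illustration of points 1 and 2, the following minimal sketch (assuming the pandas library; the region variable and its values are hypothetical) shows how keeping only m − 1 dummies avoids the dummy variable trap:

import pandas as pd

df = pd.DataFrame({"region": ["Addis Ababa", "Oromia", "Amhara",
                              "Oromia", "Addis Ababa"]})

# Full set of m dummies: every row sums to 1, so together with an
# intercept column they are perfectly collinear.
full = pd.get_dummies(df["region"])
print(full.sum(axis=1))

# m - 1 dummies: the dropped category becomes the base (benchmark) group.
safe = pd.get_dummies(df["region"], drop_first=True)
print(safe.head())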
4.2. Dummy as Independent Variables
4.2.1. A Single Dummy Independent Variable
Consider the simple model of hourly wage determination:
wage = β1 + δD + β2 edu + εi
In our model only two observed factors affect the wage rate: gender
and education.
Since D = 1 when the person is female and D = 0 when the person is
male, the parameter δ has the following interpretation: δ is the
difference in hourly wage between females and males, given the same
amount of education.
Thus, the coefficient δ determines whether there is
discrimination against women: if δ < 0, then, for the same level of
other factors, women earn less than men on average.
Cont’d
Note that :-
1. In our model the base group is male (D = 0), and hence the
interpretation of the coefficient on the dummy is made against the
base group.
If the coefficient is less than zero, the females are paid less
compared to their male counterparts for the same level of
education.
But if its coefficient is positive, females are paid more compared
to males.
2. In any application, it does not matter how we choose the base
group; only the interpretation of the coefficients changes accordingly.
Cont’d
Some researchers prefer to drop the overall intercept in the model
and to include dummy variables for each group.
The equation would then be wage = γ1M + γ2F + β2 edu + εi (with M
and F dummies for male and female), where the intercept for men is γ1
and the intercept for women is γ2.
There is no dummy variable trap in this case because we do not
have an overall intercept.
However, this formulation has little advantage, since testing for a
difference in the intercepts is more difficult, and there is no
generally agreed upon way to compute R-squared in regressions
without an intercept.
Therefore, we will always include an overall intercept for the
base group.
Cont’d
Question: is the difference between female and male earnings
statistically significant, or is it due to chance?
We need to test that!
In general, suppose the simple linear regression model takes the form:
Yi = β1 + β2Di + β3Xi + εi
When D = 1 the model becomes: Yi = (β1 + β2) + β3Xi + εi
When D = 0 the model becomes: Yi = β1 + β3Xi + εi
Thus, given the zero mean assumption (i.e., E(εi) = 0), the mean of Y
is: E(Y|D = 1, X) = (β1 + β2) + β3X when D = 1, and
E(Y|D = 0, X) = β1 + β3X when D = 0.
Note that both means have the same slope (β3) but they differ in
their intercepts.
Cont’d
Given the assumption of classical linear regression model, a model
with one or more dummy variables can be estimated using the
OLS estimation method.
Once the model is estimated, we have to test whether the
coefficients of the dummies are statistically significant or not.
Suppose we have the model with one dummy variable:
Yi = β1 + β2Di + β3Xi + εi
Now, test the significance of β2. That is, H0: β2 = 0 against H1: β2 ≠ 0.
We can test this using the usual t-test: t = β̂2/se(β̂2)
Cont’d
Decision Rule:
Reject the null hypothesis if |t| exceeds the critical value t(α/2, n − k).
Rejection means that the presence of the attribute is statistically
significant.
Example: the estimated wage equation is:
ŵage = −1.57 − 1.81 D + 0.572 educ + 0.025 exper + 0.141 tenure
where D = 1 if female.
The negative intercept (the mean wage for men with educ = exper =
tenure = 0, in this case) is meaningless.
The coefficient on D is interesting, because it measures the average
difference in hourly wage between a woman and a man, given the
same levels of educ, exper, and tenure.
Cont’d
If we take woman and man with same levels of education, experience,
and tenure, woman earns, on average, $1.81 less per hour than the man.
It is important to remember that, because we have performed multiple
regression and controlled for educ, exper, and tenure, the $1.81 wage
differential cannot be explained by different average levels of education,
experience, or tenure between men and women.
We can conclude that the differential of $1.81 is due to gender or to
factors associated with gender that we have not controlled for in the
regression.
Is this wage differential statistically significant?
The usual t-test is given by: t = δ̂/se(δ̂).
Using the rule of thumb, since |t| > 2 we reject the null hypothesis
and hence the wage differential is statistically significant.
Cont’d
Now, suppose all non-dummy explanatory variables are dropped
from our model.
Then the result becomes: ŵage = 7.10 − 2.51 D
where D = 1 implies female and D = 0 means male.
Interpretations of OLS estimates:
The intercept is the average wage for men in the sample (when D
= 0). Thus, on average, males earn $7.10 per hour.
The coefficient on D is the difference in the average wage between
females and males. Thus, the average wage for females in the
sample is 7.10 - 2.51 = 4.59, or $4.59 per hour.
Cont’d
Comparing the mean wage of males and females, the mean wage rate
of males is higher by $2.51 per hour.
Generally, simple regression on a constant and a dummy variable is a
straightforward way to compare the means of two groups.
Since t = -8.37, the difference is statistically significant.
For the usual t test to be valid, we must assume that the
homoskedasticity assumption holds, which means that the population
variance in wages for men is the same as that for women.
The estimated wage differential between men and women is larger in
simple regression model than in multiple regression model because
simple regression model does not control for differences in
education, experience, and the like.
The multiple regression model gives a more reliable estimate of the
ceteris paribus gender wage gap; it still indicates a very large differential.
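The following minimal sketch (simulated data with hypothetical variable names; the values 7.10 and -2.51 merely mimic the example above) verifies that OLS on a constant and a dummy reproduces the two group means, with the t-statistic on the dummy testing their difference:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
female = rng.integers(0, 2, n)                      # 1 = female, 0 = male
wage = 7.10 - 2.51 * female + rng.normal(0, 2, n)   # illustrative numbers

res = sm.OLS(wage, sm.add_constant(female)).fit()
print(res.params)    # const ~ mean wage for men; slope ~ difference in means
print(res.tvalues)   # t on the dummy tests whether the differential is real

# Check: OLS reproduces the raw group means.
print(wage[female == 0].mean(), wage[female == 1].mean())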
4.2.2. Multiple Dummy Variables Regression Models
Suppose we have several dummy explanatory variables.
For simplicity, let Y be the monthly salaries of public school teachers
in Addis Ababa, Amhara, and Oromia (three regions in Ethiopia).
Let D2 = 1 if the region is Oromia and 0 otherwise, and D3 = 1 if the
region is Amhara and 0 otherwise. Then Addis Ababa is the base group.
Then the multiple linear regression model (with all independent
variables being dummy variables) is given by:
Yi = β1 + β2D2i + β3D3i + ui .......... (1)
where Yi = monthly salary and ui = the error term.
Cont’d
Assuming that the error term satisfies the usual OLS assumptions,
taking expectations of (1) on both sides, we obtain:
Mean salary of public school teachers in Oromia region:
E(Y |D2 = 1, D3 = 0) = β1 + β2
Mean salary of public school teachers in the Amhara region is:
E(Y |D2 = 0, D3 = 1) = β1 + β3
mean salary of teachers in the Addis Ababa region is given by:
E(Y |D2 = 0, D3 = 0) = β1
In other words, mean salary of public school teachers in Addis
Ababa region is given by intercept β1.
In multiple regression “slope” coefficients β2 and β3 tell by how
much mean salaries of teachers in Oromia region and in Amhara
region differ from mean salary of teachers in Addis Ababa region.
Cont’d
But are these differences statistically significant?
Let the results based on our multiple regression model be as follows:
Ŷi = 26,158.62 − 1,734.473 D2i − 3,264.615 D3i
se = (1128.523) (1435.953) (1499.615)
t = (23.1759) (−1.2078) (−2.1776)
p = (0.0000)* (0.2330)* (0.0349)*
where * indicates the p-values.
As these regression results show, the mean salary of teachers in Addis
Ababa is about Birr 26,158; that of teachers in the Oromia region is
lower by about Birr 1,734; and that of teachers in the Amhara region is
lower by about Birr 3,265.
The actual mean salaries in two regions can be easily obtained by
adding these differential salaries to mean salary of teachers in
Addis Ababa region.
Cont’d
Thus, mean salary in Oromia region is Birr 24,424 (=26,158 –
1,734) and mean salary in Amhara region is Birr 22,893 (26,158 –
3,265).
However, how do we know that these mean salaries are statistically
different from the mean salary of teachers in the Addis Ababa region,
the comparison category?
All we have to do is to find out if each of “slope” coefficients is
statistically significant.
As can be seen from this regression, estimated slope coefficient for
Oromia region is not statistically significant, as its p value is
about 23%, whereas that of Amhara region is statistically
significant, as p value is only about 3.5%.
Cont’d
Therefore, the overall conclusion is that statistically mean
salaries of public school teachers in Addis Ababa region and
Oromia region are about the same but mean salary of teachers in
Amhara region is statistically significantly lower by about Birr
3,265.
Note that dummy variables will simply point out the
differences, if they exist.
However, they do not suggest reasons for differences.
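A minimal sketch of the region example (simulated salaries and hypothetical names, using statsmodels' formula interface): C(region, Treatment(...)) creates the m − 1 = 2 dummies with Addis Ababa as the base category, so the intercept estimates the base-group mean and the two slopes estimate the differentials:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
regions = rng.choice(["Addis Ababa", "Oromia", "Amhara"], size=300)
means = {"Addis Ababa": 26158, "Oromia": 24424, "Amhara": 22893}
salary = np.array([means[r] for r in regions]) + rng.normal(0, 4000, 300)
df = pd.DataFrame({"salary": salary, "region": regions})

# Treatment coding with Addis Ababa as the reference (base) level.
res = smf.ols("salary ~ C(region, Treatment('Addis Ababa'))", df).fit()
print(res.summary())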
4.2.3. Interactions among Dummy Variables
Consider the following wage model with two dummy independent
variables (D2 = 1 if sex is female and 0 otherwise, D3 = 1 if race is
nonwhite and 0 otherwise):
Yi = α1 + α2D2i + α3D3i + βedu_i + ui
where: Y = hourly wage in dollars
edu = education (years of schooling)
D2 = 1 if female, 0 otherwise
D3 = 1 if nonwhite, 0 otherwise
In this model gender and race are qualitative regressors and
education is a quantitative regressor.
Implicitly, this model assumes that the differential effect of the gender
dummy (D2) is constant across the two categories of race and that the
differential effect of the race dummy (D3) is constant across the two genders.
Cont’d
That is to say, if the mean salary is higher for males than for
females, this is so whether they are nonwhite or not.
Likewise, if, say, nonwhites have lower mean wages, this is so
whether they are female or male.
In many applications such an assumption may be unsound.
A nonwhite female may earn lower wages than a nonwhite male.
In other words, there may be interaction between two qualitative
variables D2 and D3.
Therefore, their effect on mean Y may not be simply additive as in the
equation above; they may have an interactive (multiplicative) effect,
as in the following model:
Yi = α1 + α2D2i + α3D3i + α4(D2iD3i) + βedu_i + ui
Cont’d
Assuming that the error term has zero mean (i.e., E(ui) = 0), then:
E(Y|D2 = 1, D3 = 1, edu) = (α1 + α2 + α3 + α4) + βedu
This is the mean hourly wage function for female nonwhite workers.
Note that:
α2 = differential effect of being female
α3 = differential effect of being nonwhite
α4 = differential effect of being female and nonwhite (the interaction)
The mean hourly wage of female nonwhite workers differs from that of
white males by α2 + α3 + α4, of which α4 is the part due to the interaction.
cont’d
If, for instance, all three differential dummy coefficients are
negative, this would imply that female nonwhite workers earn
much lower mean hourly wages than the base category, which in the
present example is white males.
Numerical example on average hourly earnings in relation to
education, gender, and race:
The estimated differential intercept coefficients are −2.3605 (female)
and −1.7327 (nonwhite), with education entering positively.
Now test the statistical significance of the differential intercept
coefficients.
The t-values indicate that both differential intercept coefficients are
statistically significantly different from zero.
Cont’d
Our estimation results show that, ceteris paribus, the average hourly
earnings of females are lower by about Birr 2.36 than those of their
male counterparts, and the average hourly earnings of nonwhite
workers are lower by about Birr 1.73 than those of their white
counterparts.
Now consider the case of interaction of the dummy variables: the
estimated coefficients are −2.3605 (female), −1.7327 (nonwhite), and
2.1289 (female × nonwhite).
The two additive dummies are still statistically significant, but the
interactive dummy is not at the conventional 5% level; the actual
p-value of the interaction dummy is about 8%, so it is statistically
significant only at the 10% level of significance.
Cont’d
Interpretation: holding the level of education constant, adding the
three dummy coefficients gives −1.9643 (= −2.3605 − 1.7327 + 2.1289).
That is, the mean hourly wage of nonwhite female workers is lower
than that of white males by about Birr 1.96, which lies between
−2.3605 (the gender difference alone) and −1.7327 (the race
difference alone).
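A minimal sketch of the interactive model (simulated data; the names and coefficients are illustrative only): in a statsmodels formula, female*nonwhite expands to female + nonwhite + female:nonwhite, so the coefficient on female:nonwhite plays the role of α4:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "nonwhite": rng.integers(0, 2, n),
    "edu": rng.integers(8, 18, n),
})
df["wage"] = (10 - 2.36 * df["female"] - 1.73 * df["nonwhite"]
              + 2.13 * df["female"] * df["nonwhite"]
              + 0.80 * df["edu"] + rng.normal(0, 2, n))

res = smf.ols("wage ~ female*nonwhite + edu", df).fit()
print(res.params)   # 'female:nonwhite' is the interaction (alpha4)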
4.3 Dummy as Dependent Variable
So far, we considered dummy variables as right hand side or
independent variables.
In all our models up until now, dependent variable y has had
quantitative meaning (for example, y is Birr amount).
What happens if we want to use the multiple regression model to
explain a qualitative event, such as:
a) participating in the labor force or not
b) being willing to pay for improved environmental quality or not
c) using contraceptives or not
d) voting in a given election or not, etc.?
In this case, our dependent variable takes on only two values:
zero and one (i.e., it is dummy variable).
In other words, regressand is binary, or dichotomous, variable.
Cont’d
For instance, if our dependent variable is decision to participate
in labor force, the response variable is 1= participate in labor
force and 0=not participate in labor force.
Such binary variables can be analyzed with general probability
models, which may be binomial or multinomial.
We begin our study of qualitative response models with case of
binary choice model (where dependent variable is binary which
assumes a value 1 or 0).
There are three approaches to develop model for binary
(qualitative) response regression:
1. Linear Probability Model (LPM)
2. Logit Model
3. Probit model
Linear Probability Model (LPM)
In such models we have the important equation:
P(y = 1|x) = β0 + β1x1 + β2x2 + ... + βkxk
which says that the probability of success, that is, P(y = 1|x), is a
linear function of the explanatory variables.
P(y = 1|x) is also called the response probability.
This model is an example of a binary response model, and since
probabilities must sum to one, the probability of failure,
P(y = 0|x) = 1 − P(y = 1|x), is also a linear function of the
explanatory variables.
cont’d
The above model with a binary dependent variable is called the
linear probability model (LPM).
It is linear because the response probability is linear in the
parameters βj.
In the LPM, the coefficient βj measures the change in the probability
of success, P(y = 1|x), when xj changes, holding other factors fixed:
ΔP(y = 1|x) = βjΔxj
Given this, the mechanics of OLS can be used to estimate the model
the same as before, because the model is linear.
cont’d
Given a random sample with k parameters and N observations,
consider the following linear regression model:
y = β1 + β2x2 + ... + βkxk + µ
As it is a probability model, in order to interpret the results in terms
of probability we take expectations on both sides of the equation.
If we estimate the equation, the predicted equation is:
ŷ = β̂1 + β̂2x2 + ... + β̂kxk
The slope coefficient on x2 measures the predicted change in the
probability of success when x2 increases by one unit.
However, in order to correctly interpret linear probability model,
we must know what constitutes a “success.”
Thus, it is good idea to give dependent variable a name that
describes event y = 1.
cont’d
Since E(µ) = 0, E(y|x) = β1 + β2x2 + … + βkxk.
But from probability theory,
E(y|x) = 0·P(y = 0|x) + 1·P(y = 1|x) = P(y = 1|x),
so we can write our model as
P(y = 1|x) = E(y|x) = β1 + β2x2 + … + βkxk.
So the interpretation of βj is the change in the probability of success
when xj changes by one unit.
The predicted y is the predicted probability of success.
cont’d
Now, if Pi = probability that Yi = 1 (that is, the event occurs) and
(1 − Pi) = probability that Yi = 0 (that is, the event does not occur),
the variable Yi has the following (probability) distribution:
Yi = 1 with probability Pi, and Yi = 0 with probability (1 − Pi).
cont’d
That is, Yi follows the Bernoulli probability distribution. Now the
mathematical expectation of Y is given by:
E(Y) = 0·(1 − p) + 1·p = p
which is equal to the probability of success, or the conditional
expectation of Y given X (i.e., E(y|x) = β1 + β2x2 + β3x3 + ... + βkxk),
and the variance is given by:
var(Y) = p(1 − p)
In general, the expectation of a Bernoulli random variable is the
probability that the random variable equals 1.
LPM, Numerical Example
Suppose inlf (“in the labor force”) is a binary variable indicating
labor force participation by a married woman during a given year:
inlf =1 if the woman reports working for a wage outside the
home at some point during the year, and zero otherwise.
We assume that labor force participation depends on other sources
of income, including husband’s earnings (nwifeinc), years of
education (educ), past years of labor market experience (exper),
age, number of children less than six years old (kidslt6), and
number of kids between 6 and 18 years of age (kidsge6).
The estimated linear probability model, using a sample of 753 married
women of whom 428 were in the labor force, is:
inlf̂ = 0.586 − 0.0034 nwifeinc + 0.038 educ + 0.039 exper
− 0.00060 exper² − 0.016 age − 0.262 kidslt6 + 0.013 kidsge6
Cont’d
Using the usual t statistics, all variables of this estimated model
except kidsge6 are statistically significant, and all of the significant
variables have the effects we would expect based on economic
theory.
In order to interpret the estimates, we must remember that a change
in the independent variable changes the probability that inlf =1.
Cont’d
For example, the coefficient on edu means that keeping other
factors constant, another year of education increases probability of
labor force participation by .038.
If we take this equation literally, 10 more years of education
increases probability of being in labor force by .038(10) = 0.38,
which is a large increase in a probability.
The coefficient on nwifeinc implies that, if nwifeinc increases by 10,
the probability that the woman is in the labor force falls by 0.034.
Experience has been entered as a quadratic to allow the effect of past
experience on the participation probability to diminish as experience
rises.
Holding other factors fixed, the estimated change in the probability
for one more year of experience is approximated (using the power
rule of differentiation) as:
0.039 − 2(0.0006)exper = 0.039 − 0.0012 exper
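A minimal sketch of fitting an LPM (using the Spector grade dataset shipped with statsmodels rather than the labor force data above, purely for convenience): the LPM is just OLS on a 0/1 regressand, robust standard errors guard against the heteroskedasticity discussed below, and the fitted values show that predicted "probabilities" may leave [0, 1]:

import statsmodels.api as sm

data = sm.datasets.spector.load_pandas()
y, X = data.endog, sm.add_constant(data.exog)   # GRADE on GPA, TUCE, PSI

lpm = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
print(lpm.params)                        # each slope: change in P(y = 1)
print(lpm.fittedvalues.min(), lpm.fittedvalues.max())  # may exit [0, 1]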
Problems with LPM
1. Non-normality of the error term
Although OLS point estimation does not require normality of the
disturbance term, statistical inference (interval estimation and
hypothesis testing) requires that the disturbance term be normally
distributed.
However, the assumption of normality for the error term is not
tenable in LPMs because, like Yi, the disturbances take only two
values; that is, they also follow the Bernoulli distribution.
Given our LPM (in matrix form), the error term is εi = yi − xi′β.
Thus, εi = 1 − xi′β when the event occurs and εi = −xi′β when the
event does not occur. The probability distribution of εi is therefore:
εi = 1 − xi′β with probability Pi, and εi = −xi′β with probability (1 − Pi).
Cont’d
This implies that the disturbance terms cannot be assumed to be
normally distributed; they follow the Bernoulli distribution.
The violation of the normality assumption has serious effects on
statistical inferences.
2. The error term is heteroskedastic
In the LPM, the disturbance terms are not homoscedastic.
As statistical theory shows, for Bernoulli distribution theoretical
mean and variance are, respectively, p and p(1 − p), where p is
the probability of success (i.e., something happening), showing that
the variance is a function of the mean and hence the error variance
is heteroscedastic.
The variance of the error term is given by:
var(εi) = Pi(1 − Pi) = xi′β(1 − xi′β)
That is, the variance of the error term in the LPM is heteroscedastic.
Cont’d
Since Pi = E(Yi | Xi) = β1 + β2Xi, the variance of εi ultimately depends
on the values of X and hence is not homoscedastic.
In the presence of heteroscedasticity, the OLS estimators, although
unbiased, are not efficient; that is, they do not have minimum
variance.
Since the error term is heteroskedastic, we use generalized least
squares (GLS) for estimation.
Since the variance of εi depends on E(Yi | Xi), one way to resolve the
heteroscedasticity problem is to transform the model
yi = β1 + β2x2i + µi by dividing it through by √wi, where wi = Pi(1 − Pi):
yi/√wi = β1/√wi + β2x2i/√wi + µi/√wi
The error term of this transformed model is homoscedastic.
Cont’d
In practice wi is unknown, so to estimate it we can use the following
two-step procedure:
Step 1. Run the OLS regression yi = β1 + β2x2i + µi despite the
heteroscedasticity problem and obtain Ŷi, the estimate of the true
E(Yi | Xi). Then obtain ŵi = Ŷi(1 − Ŷi), the estimate of wi.
Step 2. Use the estimated ŵi to transform the data and estimate the
transformed equation by OLS (i.e., weighted least squares).
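A minimal sketch of this two-step procedure (Spector data again; clipping the fitted probabilities into (0, 1) is an ad hoc guard, not part of the textbook recipe):

import statsmodels.api as sm

data = sm.datasets.spector.load_pandas()
y, X = data.endog, sm.add_constant(data.exog)

# Step 1: OLS despite heteroskedasticity; form w_hat = Y_hat(1 - Y_hat).
p_hat = sm.OLS(y, X).fit().fittedvalues.clip(0.01, 0.99)
w_hat = p_hat * (1 - p_hat)

# Step 2: weighted least squares with weights 1/w_hat.
wls = sm.WLS(y, X, weights=1.0 / w_hat).fit()
print(wls.params)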
Cont’d
3. The possibility of obtaining probability values < 0 or > 1:
non-fulfillment of 0 ≤ E(y|x) ≤ 1 (i.e., 0 ≤ P ≤ 1).
The problem is that the LPM implicitly assumes that an increase in x
has a constant effect on the probability of success.
That is, as x increases, P(y = 1) continues to increase at a constant
rate.
However, since 0 ≤ P ≤ 1, a constant rate of increase is impossible
over the whole range of x.
To overcome this problem we consider nonlinear models, probit and
logit, which fall under the heading of limited dependent variable models.
Limited dependent variable (LDV) models
These models refer to a dependent variable whose range of values
is substantively restricted.
In the LPM, the problem is non-fulfillment of the restriction
0 ≤ P(y = 1|x) ≤ 1.
In order to overcome this problem, we need a model that will
produce predictions consistent with the underlying probability
theory for a given vector of regressors.
The probability model has two basic features/requirements:
1. As Xi increases, Pi = E(Y = 1|X) increases but never steps outside
the interval [0, 1].
2. The relationship between Pi and Xi is nonlinear, that is, "Pi
approaches 0 at slower and slower rates as Xi gets small (X
approaches −∞) and approaches 1 at slower and slower rates as Xi
gets very large (X approaches +∞)."
Cont'd
Symbolically,
lim(x→+∞) P(y = 1|x) = 1 and lim(x→−∞) P(y = 1|x) = 0
Graphically, the relationship traces out an S-shaped (sigmoid) curve.
Cont’d
In both cases (symbolically and graphically), the probability lies
between 0 and 1 as Xi varies over (−∞, +∞), and the graph is sigmoid
or S-shaped, which is the shape of the cumulative distribution
function (CDF) of a probability density function (PDF).
However, since the CDF of any PDF is S-shaped, the question is:
which CDF should we use?
The commonly used CDFs are the cumulative logistic distribution
and the cumulative normal distribution.
The cumulative logistic distribution gives rise to the logit model,
and the cumulative normal distribution gives rise to the probit
model.
The Logit Model
To explain the basic ideas behind the logit model, let us take the
simple example of house ownership, defined as y = 1 if the individual
owns a house and zero otherwise.
Supposing that the probability of owning a house is a function of
income, we can state the LPM as:
Pi = E(y = 1|Xi) = β1 + β2Xi
Now consider instead the following representation of house ownership,
given by the logistic function:
G(z) = exp(z)/[1 + exp(z)] = L(z)
This choice of G(z) is the CDF of a standard logistic random variable.
This case is referred to as the logit model, or sometimes as logistic
regression.
Both functions (normal and logistic) have similar shapes: they are
increasing in z, most quickly around 0.
The Logit Model
For ease of exposition we can rewrite the above function as:
Pi = 1/(1 + e^(−Zi)), where Zi = β1 + β2Xi
This equation is known as the logistic distribution function.
The Logit Model
Under this specification the probability Pi ranges between 0 and 1 as
Zi ranges from −∞ to +∞.
One problem of the LPM is thus resolved, but we have created another:
Pi is non-linearly related to Zi (i.e., to the explanatory variables)
and also to the parameters (the β's).
So the model is non-linear, and thus we cannot use the OLS
procedure to estimate the parameters.
However, the problem of non-linearity may be resolved through a log
transformation, as follows: if Pi is the probability of owning a house,
then (1 − Pi) is the probability of not owning a house. Thus, we have:
1 − Pi = 1/(1 + e^(Zi))
Cont’d
Therefore, we can write:
Pi/(1 − Pi) = e^(Zi)
The ratio Pi/(1 − Pi) is termed the odds ratio in favor of owning
a house. It is simply the ratio of the probability that a family will
own a house to the probability that it will not.
Now, if we take the natural log of this equation, we obtain:
Li = ln[Pi/(1 − Pi)] = Zi = β1 + β2Xi
Cont’d
Li, the log of the odds ratio, is not only linear in X but also linear
in the parameters.
Li is called the logit, hence the name logit model for models like
this.
Estimation of logit:- Method of the Maximum Likelihood
The logistic function was introduced in the 19th century (by
Verhulst, 1804–1849) for the description of population growth
(Cramer, 2003).
Now consider the binary model where pi is the probability that
yi = 1 and (1 − pi) is the probability that yi = 0.
Cont’d
In order to construct the likelihood function, we note that the
contribution of the i-th observation can be written as:
pi^yi (1 − pi)^(1 − yi)
In the case of random sampling, where all observations are sampled
independently (the binomial setting), the likelihood function is
simply the product of the individual contributions:
L(β) = Πi pi^yi (1 − pi)^(1 − yi)
Cont’d
The technique of maximum likelihood requires that we choose those
values of the parameters which maximize the likelihood function
given above.
In practice, we maximize the logarithm of the likelihood function:
log L(β) = Σi [yi log pi + (1 − yi) log(1 − pi)]
Cont’d
But we know that: pi = 1/(1 + e^(−(β1 + β2Xi)))
Now, substituting this into the last equation, we obtain the
log-likelihood as a function of the β's alone.
The resulting expression is non-linear in the parameters and hence
requires an iterative solution.
Thus, in the MLE method our objective is to maximize the logarithm
of the likelihood function, choosing the values of the unknown
parameters in such a manner that the probability of observing the
given Y's is as high as possible.
For this purpose, we differentiate the logarithm of the likelihood
function partially with respect to each unknown parameter, set the
resulting expressions to zero, and solve.
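A minimal sketch of this maximization (Spector data; scipy's general-purpose optimizer stands in for the iterative solution described above, and the answer is checked against statsmodels' own Logit):

import numpy as np
import statsmodels.api as sm
from scipy.optimize import minimize

data = sm.datasets.spector.load_pandas()
y = data.endog.values
X = sm.add_constant(data.exog).values

def neg_loglik(beta):
    z = X @ beta
    # log L = sum_i [y_i z_i - log(1 + e^{z_i})], a numerically stable
    # rewriting of sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
    return -np.sum(y * z - np.logaddexp(0.0, z))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
print(res.x)                              # MLE of the betas
print(sm.Logit(y, X).fit(disp=0).params)  # matches statsmodels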
Cont’d
Important features of the logit model:
1. Although the probabilities lie between 0 and 1, the logits (L) are
not so bounded: they range from −∞ to +∞.
2. Although L is linear in X, the probabilities themselves are
not. This property is in contrast with the LPM, where the
probabilities increase linearly with X.
3. It is possible to add as many explanatory variables as may
be dictated by the underlying theory.
cont’d
4. Interpretation:
The interpretation of the logit model given above is as follows:
β2, the slope coefficient, measures the change in L for a unit
change in X; that is, it tells how the log-odds in favor of owning a
house change as income changes by one unit.
The intercept β1 is the value of the log-odds in favor of owning a
house if income is zero.
5. Given a certain level of income, say X*, if we want to estimate
not the odds in favor of owning a house but the probability of
owning a house itself, this can be done directly from the logistic
distribution function once the estimates of β1 and β2 are available,
as the sketch after this list shows.
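A minimal sketch of point 5 (the estimates and the income level X* below are purely hypothetical):

import math

b1_hat, b2_hat = -4.0, 0.1   # hypothetical logit estimates
x_star = 50.0                # hypothetical income level

z = b1_hat + b2_hat * x_star
p = 1.0 / (1.0 + math.exp(-z))   # logistic distribution function
print(p)                         # estimated probability of owning a house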
The Logit Model
However, the important question is: how do we estimate β1 and β2
in the first place? The maximum likelihood procedure outlined above
provides the answer.
Note that, whereas the LPM assumes that Pi is linearly related to Xi,
the logit model assumes that the log of the odds ratio is linearly
related to Xi.
Probit Model
The probit model is very similar to the logit model, and in most
applications the two give quite similar results.
The only difference lies in the distribution they assume: the logit
model uses the logistic cumulative distribution function, whereas the
probit model assumes the normal cumulative distribution function
(CDF).
Probit Model
The cumulative standard normal curve resembles the logistic curve,
but the probit has z-scores instead of logged odds along the
horizontal axis.
The curve approaches, but does not reach, 0 as z decreases toward
negative infinity, and it approaches, but does not reach, 1 as z
increases toward positive infinity.
Despite this difference, they give essentially equivalent results,
making the choice between them one of individual preferences
and computer program availability
The Probit Model
Based on the cumulative standard normal distribution, the cumulative
probability associated with any z-score equals:
P(Z ≤ z) = ∫(−∞ to z) (2π)^(−1/2) exp(−u²/2) du
where u is a standard normal variable with mean 0 and standard
deviation 1.
The formula merely says that the probability of the event equals the
area under the standard normal curve between negative infinity and z.
The larger the value of Z, the larger the cumulative probability.
Because of complexity of formula, however, computers do the
calculations.
The Probit Model
With the probit, the estimated coefficients show the change in z-score
units of the inverse cumulative standard normal distribution rather
than the change in probabilities.
Like logistic regression, probit analysis allows calculation of
changes in probabilities for specified values of independent
variables.
Again, however, the effects of dummy and continuous variables
on predicted probabilities depend on choice of starting point.
Changes in probabilities will be larger for points near the middle of
the curve than near the floor or ceiling.
The Probit Model
Recall the linear probability model, written as P(y = 1|x) = xβ.
An alternative is to model the probability as a function G(xβ),
where 0 < G(z) < 1.
One choice for G(z) is the standard normal cumulative distribution
function (CDF):
G(z) = Φ(z) = P(Z ≤ z) ≡ ∫(−∞ to z) φ(v)dv
where φ is the standard normal density, φ(v) = (2π)^(−1/2) exp(−v²/2).
Thus, the model expresses the probability that y = 1 as
P(Z ≤ xβ) = Φ(xβ).
This case is referred to as the probit model.
Since it is nonlinear model, it cannot be estimated by our usual
methods, so we use maximum likelihood estimation
The Probit Model
[Figures: the standard normal cumulative distribution function and
the standard normal probability density function]
Probits and Logits
Both probit and logit are nonlinear and require maximum
likelihood estimation
No real reason to prefer one over the other
Traditionally one saw more of the logit, mainly because the logistic
function leads to a more easily computed model.
Today, the probit is also easy to compute with standard packages,
and it has become more popular.
If we write out the functional forms of the CDFs:
1. Logistic distribution: G(z) = e^z/(1 + e^z)
Cont'd
2. Standard normal distribution: G(z) = Φ(z) = ∫(−∞ to z) (2π)^(−1/2) exp(−v²/2) dv
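A short sketch tabulating the two CDFs side by side (assuming numpy and scipy) makes the similarity of their shapes concrete:

import numpy as np
from scipy.stats import norm

z = np.linspace(-3, 3, 7)
logit_cdf = np.exp(z) / (1 + np.exp(z))   # logistic G(z)
probit_cdf = norm.cdf(z)                  # standard normal G(z)
for zi, l, p in zip(z, logit_cdf, probit_cdf):
    print(f"z = {zi:5.1f}   logit = {l:.3f}   probit = {p:.3f}")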
Interpretation
In general we care about the effect of x on P(y = 1|x), that is, about
∂p/∂x.
For the LPM, this is easily computed as the coefficient on x.
For the nonlinear probit and logit models, it is more complicated.
Using the chain rule:
∂p/∂xj = (dG/dz)(∂z/∂xj) = G′(xβ)βj
For probit: φ(xβ)βj
For logit: {exp(xβ)/[1 + exp(xβ)]²}βj
It is incorrect to simply compare the coefficients across the three
models (the coefficients differ among the models because of the
functional form of the CDF).
Interpretation (continued)
Interpretation of marginal effects:
An increase in x increases (decreases) the probability that y = 1 by
the marginal effect, expressed in percentage points.
For dummy independent variables, marginal effect is expressed
in comparison to the base category (x=0).
For continuous independent variables, marginal effect is
expressed for a one-unit change in x.
We can compare the sign and significance (based on the standard
t/z-test) of coefficients across models; to compare the magnitude of
effects, however, we need to calculate the derivatives, say at the
means.
Stata will do this for you, e.g., in the probit case.
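A minimal sketch of these derivative-based marginal effects in Python rather than Stata (Spector data; get_margeff is statsmodels' counterpart of Stata's margins, and at='mean' evaluates the derivatives at the regressor means, as suggested above):

import statsmodels.api as sm

data = sm.datasets.spector.load_pandas()
y, X = data.endog, sm.add_constant(data.exog)

probit = sm.Probit(y, X).fit(disp=0)
logit = sm.Logit(y, X).fit(disp=0)
print(probit.get_margeff(at="mean").summary())
print(logit.get_margeff(at="mean").summary())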
Example: probit_insurance.dta
Variable              LPM       Probit    Logit
Retired               0.04*     0.11*     0.19*
Age                  -0.002    -0.008    -0.01
Good health status    0.06*     0.19*     0.31*
HH income             0.0004*   0.001*    0.002*
Education years       0.02*     0.07*     0.11*
Married               0.12*     0.36*     0.57*
Hispanic             -0.12*    -0.46*    -0.81*
Constant              0.12     -1.06*    -1.71*
R2                    0.08      0.07      0.07

Interpretation of coefficients: retired individuals (in comparison to
non-retired individuals), individuals with good health status,
individuals with higher household income, individuals with more
education, and married individuals are more likely to have health
insurance, while Hispanics are less likely to have health insurance.
The LPM, probit, and logit coefficients differ by a scale factor, and
therefore we cannot compare the magnitudes of the coefficients
across models.
Example: probit_insurance.dta
Variable              LPM       Probit    Logit
Retired               0.04*     0.04*     0.04*
Age                  -0.002    -0.003    -0.003
Good health status    0.06*     0.07*     0.07*
HH income             0.0004*   0.0005*   0.0004*
Education years       0.02*     0.02*     0.02*
Married               0.12*     0.12*     0.13*
Hispanic             -0.12*    -0.16*    -0.16*

Interpretation of marginal effects: retired individuals are 4% more
likely to have insurance (in comparison with those who are not
retired); for each additional year of education, individuals are 2%
more likely to have insurance; Hispanics are 16% less likely to have
insurance than non-Hispanics.
Note that, unlike the coefficients, which differ across models, the
marginal effects are almost identical in the three models.
Testing Hypotheses and Measures of Goodness-of-fit
Testing Statistical Significance of Each Slope Coefficient
The procedure for testing the significance of each coefficient in an
LDV model is the same as in the usual OLS case.
However, the z-statistics in the Stata output play the role of the
t-statistics in OLS.
Note that this z has nothing to do with the z-score variable of the
probit index.
Testing Overall Statistical Significance of the Model:- Likelihood
Ratio (LR) Approach
The LR test is based on the same concept as the F test in a linear
model.
The LR test is based on the difference in the log-likelihood
functions for the unrestricted and restricted models.
Cont’d
Because MLE maximizes the log-likelihood function, dropping
variables generally leads to a smaller, or at least no larger,
log-likelihood. (This is similar to the fact that the R-squared never
increases when variables are dropped from a regression.)
The question is whether the fall in the log-likelihood is large
enough to conclude that the dropped variables are important.
We can make this decision once we have a test statistic and a set
of critical values.
The likelihood ratio statistic is twice the difference in the
log-likelihoods:
LR = 2(Lur − Lr)
Under the null hypothesis, LR is asymptotically distributed as
chi-square with degrees of freedom equal to the number of restrictions.
Cont’d
where Lur is log-likelihood value for the unrestricted model, and
Lr is the log likelihood value for the restricted model.
Because Lur is greater than or equal to Lr, LR is nonnegative and
usually strictly positive.
In computing LR statistic, it is important to know that Lur and Lr
can each be negative.
This does not change the way that LR is computed; we must
preserve the negative signs.
Contrary to linear regression model, there is no single measure
for the goodness-of-fit in binary response (choice) models.
Often, goodness-of-fit measures are implicitly or explicitly based
on comparison with a model that contains only a constant as
explanatory variable.
Cont’d
Let log L1 denote the maximum log-likelihood value of the model of
interest and let log L0 denote the maximum value of the log-likelihood
function when all parameters, except the intercept, are set to
zero. Clearly, log L1 ≥ log L0.
The larger the difference between the two log likelihoods values,
the more the extended model adds to the very restrictive model.
Indeed, formal likelihood ratio(LR) test can be based on the
difference between the two values.
A first goodness-of-fit measure is defined as:
pseudo-R² = 1 − 1/[1 + 2(log L1 − log L0)/N]
where N denotes the number of observations.
McFadden (1974) suggested an alternative measure:
pseudo-R² = 1 − log L1/log L0
sometimes referred to as the likelihood ratio index.
Because the log-likelihood is the sum of log probabilities, it
follows that log L0 ≤ log L1 < 0, from which it is straightforward to
show that both measures take values in the interval [0, 1] only.
If all estimated slope coefficients are equal to 0, we have
log L1 = log L0, such that the R-squared is equal to zero.
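A minimal sketch showing where these quantities appear in statsmodels output (Spector data): llf is log L1, llnull is log L0, llr is the LR statistic 2(log L1 − log L0), and prsquared is McFadden's measure:

import statsmodels.api as sm

data = sm.datasets.spector.load_pandas()
y, X = data.endog, sm.add_constant(data.exog)
res = sm.Logit(y, X).fit(disp=0)

print(res.llf, res.llnull)       # unrestricted and intercept-only log-likelihoods
print(res.llr, res.llr_pvalue)   # LR statistic and its chi-square p-value
print(res.prsquared)             # McFadden: 1 - llf/llnull
print(1 - res.llf / res.llnull)  # the same measure computed by hand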
Example: logit regression output (Stata)
. logistic car income hhs
Logistic regression Number of obs = 40
LR chi2(2) = 30.14
Prob > chi2 = 0.0000
Log likelihood = -12.605647 Pseudo R2 = 0.5445
car Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
income 2.192885 .7278499 2.37 0.018 1.144168 4.20283
hhs .7904801 .3257098 -0.57 0.568 .3525019 1.77264
_cons .0000231 .0000847 -2.91 0.004 1.75e-08 .0305203
Note: _cons estimates baseline odds.
Probit
. probit car income hhs
Iteration 0: log likelihood = -27.675866
Iteration 1: log likelihood = -12.781611
Iteration 2: log likelihood = -12.383587
Iteration 3: log likelihood = -12.375829
Iteration 4: log likelihood = -12.375827
Iteration 5: log likelihood = -12.375827
Probit regression Number of obs = 40
LR chi2(2) = 30.60
Prob > chi2 = 0.0000
Log likelihood = -12.375827 Pseudo R2 = 0.5528
car Coef. Std. Err. z P>|z| [95% Conf. Interval]
income .4607914 .1890465 2.44 0.015 .0902671 .8313158
hhs -.1360354 .2501562 -0.54 0.587 -.6263325 .3542617
_cons -6.252617 1.99502 -3.13 0.002 -10.16279 -2.342449
Thank You!