Regression With A Binary Dependent Variable
Regression With A Binary Dependent Variable
Dependent Variable
Linear Probability Model
Probit and Logit Regression
Probit Model
Logit Regression
Estimation and Inference
Nonlinear Least Squares
Maximum Likelihood
Marginal Effect
Application
Misspecification
So far the dependent variable (Y) has been
continuous:
Corn yield in Ghana
Exams score
What if Y is binary?
Y = get into college; X= father’s years of
education
Y = person smokes, or not; X = income
Y = mortgage application is accepted, or not;
But,
What does mean when Y is binary?
Is ?
What does the line mean when Y is
binary?
What does the predicted value mean when Y is
binary? For example, what does = 0.26 mean?
Recall assumption #1: E(ui |Xi ) = 0, so
When Y is binary,
so
When Y is binary, the linear regression model,
Instead, we want:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X.
Pr(Y = 1|X) to be increasing in X (for > 0).
This requires a nonlinear functional form for
the probability. How about an “S-curve”.
The probit model satisfies these conditions:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X.
Pr(Y = 1|X) to be increasing in X (for > 0).
Probit regression models the probability that Y=1
using the cumulative standard normal distribution
function, evaluated at
where is unobserved.
The observed Y is 1 if , and is 0 if
< 0.
Note that implies
homoscedasticity.
In other words,
Similarly,
Furthermore, since we only can estimate
and , not 0 , and separately. It
is assumed that = 1.
Therefore,
STATA Example: HMDA data, ctd.
so
Joint density of (Y1, Y2):
Because Y1 and Y2 are independent,