
Regression with a Binary Dependent Variable
 Linear Probability Model
 Probit and Logit Regression
 Probit Model
 Logit Regression
 Estimation and Inference
 Nonlinear Least Squares
 Maximum Likelihood
 Marginal Effect
 Application
 Misspecification
 So far the dependent variable (Y) has been continuous:
 Corn yield in Ghana
 Exam scores
What if Y is binary?
 Y = get into college; X = father's years of education
 Y = person smokes, or not; X = income
 Y = mortgage application is accepted, or not; X = income, house characteristics, marital status, race
Example: Mortgage denial and race
Variables
 Dependent variable:
 Is the mortgage denied or accepted?
 Independent variables:
 income, wealth, employment status
 other loan, property characteristics
 race of applicant
The Linear Probability Model
 A natural starting point is the linear regression model with a single regressor:
   Yi = β0 + β1Xi + ui
But,
 What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
 What does the line β0 + β1X mean when Y is binary?
 What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.26 mean?
Recall assumption #1: E(ui|Xi) = 0, so
   E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi
When Y is binary,
   E(Yi|Xi) = 1 × Pr(Yi = 1|Xi) + 0 × Pr(Yi = 0|Xi) = Pr(Yi = 1|Xi)
so
   Pr(Yi = 1|Xi) = β0 + β1Xi
When Y is binary, the linear regression model
   Pr(Yi = 1|Xi) = β0 + β1Xi
is called the linear probability model.
 The predicted value is a probability:
 E(Y|X = x) = Pr(Y = 1|X = x) = probability that Y = 1 given X = x
 Ŷ = the predicted probability that Yi = 1, given X
 β1 = change in probability that Y = 1 for a given change Δx in x:
   β1 = [Pr(Y = 1|X = x + Δx) − Pr(Y = 1|X = x)] / Δx
LPM Model
 The following model of bond ratings (b) was
estimated, with interest payments (r ) and
profit (p) as the explanatory variables:

   b̂i = 2.79 + 0.76pi − 0.12ri
        (2.10)  (0.06)   (0.04)
   R² = 0.15,  DW = 1.78
   b = 1 if AA rating, b = 0 if BB rating
LPM Model
 The coefficients are interpreted as in the usual OLS models, i.e. a 1% rise in profits gives a 0.76% increase in the probability of a bond getting the AA rating.
 The R-squared statistic is low, but this is
probably due to the LPM approach, so we
would usually ignore it.
 The t-statistics are interpreted in the
usual way.
Example: linear probability model, HMDA data
Mortgage denial v. ratio of debt payments to income (P/I ratio) in the HMDA data set
Linear probability model: HMDA data
 What is the predicted value for P/I ratio = .3?
 Calculating "effects": increase the P/I ratio from .3 to .4.
 The effect on the probability of denial of an increase in the P/I ratio from .3 to .4 is to increase the probability by .061, that is, by 6.1 percentage points (see the sketch below).
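A minimal Python sketch of the calculation above. The coefficient values are assumed for illustration (roughly the published HMDA linear probability estimates); they are not taken from these slides.

```python
# LPM sketch with assumed (illustrative) intercept and slope on the P/I ratio.
b0, b1 = -0.080, 0.604

def lpm_prob(pi_ratio):
    """Predicted Pr(deny = 1) under the linear probability model."""
    return b0 + b1 * pi_ratio

p30 = lpm_prob(0.3)            # predicted probability at P/I ratio = .3
p40 = lpm_prob(0.4)            # predicted probability at P/I ratio = .4
print(round(p30, 3), round(p40, 3), round(p40 - p30, 3))   # effect ≈ .06
```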
Next include black as a regressor:
 Predicted probability of denial:
 for a black applicant with P/I ratio = .3
 for a white applicant with P/I ratio = .3
 difference = .177 = 17.7 percentage points (in the LPM, this difference equals the coefficient on black, holding the P/I ratio fixed).
 Coefficient on black is significant at the 5% level.
The linear probability model:
Summary
 Models probability as a linear function of X.
 Advantages:
 simple to estimate and to interpret
 inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)
 Disadvantages:
 Does it make sense that the probability should be linear
in X?
 Predicted probabilities can be < 0 or > 1!
 These disadvantages can be solved by using a
nonlinear probability model: probit and logit
regression.
Probit and Logit Regression
The problem with the linear probability model is that it models the probability of Y = 1 as being linear:
   Pr(Y = 1|X) = β0 + β1X
Instead, we want:
 0 ≤ Pr(Y = 1|X) ≤ 1 for all X.
 Pr(Y = 1|X) to be increasing in X (for β1 > 0).
This requires a nonlinear functional form for the probability. How about an "S-curve"?
The probit model satisfies these conditions:
 0 ≤ Pr(Y = 1|X) ≤ 1 for all X.
 Pr(Y = 1|X) to be increasing in X (for β1 > 0).
Probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, evaluated at z = β0 + β1X:
   Pr(Y = 1|X) = Φ(β0 + β1X)
 Φ is the cumulative normal distribution function.
 z = β0 + β1X is the "z-value" or "z-index" of the probit model.
Example: Suppose β0 and β1 are such that β0 + β1 × .4 = −0.8. Then
 Pr(Y = 1|X = .4) = area under the standard normal density to the left of z = −0.8, which is
   Pr(Z ≤ −0.8) = .2119
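The normal CDF value used in the example can be checked directly; this is just the table lookup Φ(−0.8), nothing model-specific.

```python
# Checking the probit probability in the example: Pr(Y = 1|X = .4) = Φ(-0.8).
from scipy.stats import norm

print(round(norm.cdf(-0.8), 4))   # 0.2119 = area to the left of z = -0.8
```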
Why use the cumulative normal probability distribution?
 The "S-shape" gives us what we want:
 0 ≤ Pr(Y = 1|X) ≤ 1 for all X.
 Pr(Y = 1|X) to be increasing in X (for β1 > 0).
 Easy to use - the probabilities are tabulated in the cumulative normal tables.
 Relatively straightforward interpretation:
 z-value = β0 + β1X
 β̂0 + β̂1X is the predicted z-value, given X
 β1 is the change in the z-value for a unit change in X
 Another way to see the probit model is through the interpretation of a latent variable.
 Suppose there exists a latent variable Yi*,
   Yi* = β0 + β1Xi + ui,
 where Yi* is unobserved.
 The observed Y is 1 if Yi* ≥ 0, and is 0 if Yi* < 0.
Note that Var(ui|Xi) = σ² implies homoscedasticity.
In other words,
   Pr(Yi = 1|Xi) = Pr(Yi* ≥ 0|Xi) = Pr(ui ≥ −(β0 + β1Xi)|Xi) = Φ((β0 + β1Xi)/σ)
Similarly,
   Pr(Yi = 0|Xi) = 1 − Φ((β0 + β1Xi)/σ)
Furthermore, since we can only estimate β0/σ and β1/σ, not β0, β1, and σ separately, it is assumed that σ = 1.
Therefore,
   Pr(Yi = 1|Xi) = Φ(β0 + β1Xi)
STATA Example: HMDA data, ctd.
 Positive coefficient on the P/I ratio: does this make sense?
 Standard errors have the usual interpretation.
 Predicted probabilities:
   Pr(deny = 1 | P/I ratio = .3) = .097
 Effect of change in P/I ratio from .3 to .4:
   Pr(deny = 1 | P/I ratio = .4) = .159
The predicted probability of denial rises from .097 to .159 (see the sketch below).
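A sketch of the calculation above in Python. The coefficient values are assumed for illustration (approximately the published HMDA probit estimates, β0 ≈ −2.19 and β1 ≈ 2.97, not shown on these slides); they roughly reproduce the .097 and .159 quoted above.

```python
from scipy.stats import norm

# Assumed (illustrative) probit coefficients for deny on the P/I ratio.
b0, b1 = -2.19, 2.97
p30 = norm.cdf(b0 + b1 * 0.3)   # ≈ 0.097, predicted Pr(deny = 1) at P/I = .3
p40 = norm.cdf(b0 + b1 * 0.4)   # ≈ 0.158, predicted Pr(deny = 1) at P/I = .4
print(round(p30, 3), round(p40, 3), round(p40 - p30, 3))
```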
Probit regression with multiple regressors
   Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)
 Φ is the cumulative normal distribution function.
 z = β0 + β1X1 + β2X2 is the "z-value" or "z-index" of the probit model.
 β1 is the effect on the z-score of a unit change in X1, holding constant X2.
STATA Example: HMDA data, ctd.
 Is the coefficient on black statistically significant?
 Estimated effect of race for P/I ratio = .3:
 Difference in rejection probabilities = .158 (15.8 percentage points)
Logit regression
Logit regression models the probability of Y = 1 as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:
   Pr(Y = 1|X) = F(β0 + β1X)
 F is the cumulative logistic distribution function:
   F(β0 + β1X) = 1 / (1 + e^(−z)),  where z = β0 + β1X
Example: see the numerical sketch below, which compares logit and probit predicted probabilities.
Why bother with logit if we have probit?
 Historically, logit was more convenient to compute.
 In practice, logit is very similar to probit: predicted probabilities from estimated probit and logit models usually are very close.
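A small simulation sketch of that claim: fit probit and logit to the same binary data and compare their predicted probabilities. The data-generating values (β0 = −2, β1 = 3) are assumed for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 1.0, n)
y = (rng.normal(size=n) < -2.0 + 3.0 * x).astype(int)   # probit-type DGP

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit(disp=0)
logit_fit = sm.Logit(y, X).fit(disp=0)

grid = sm.add_constant(np.linspace(0.0, 1.0, 5))         # a few X values
print(np.round(probit_fit.predict(grid), 3))             # probit probabilities
print(np.round(logit_fit.predict(grid), 3))              # logit: very close
```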
Estimation and Inference
 Probit model:
   Pr(Y = 1|X) = Φ(β0 + β1X)
 Estimation and inference:
 How to estimate β0 and β1?
 What is the sampling distribution of the estimators?
 Why can we use the usual methods of inference?
 First discuss nonlinear least squares (easier to explain).
 Then discuss maximum likelihood estimation (what is actually done in practice).
Probit estimation by nonlinear least squares
Recall OLS: choose b0, b1 to minimize
   Σi (Yi − b0 − b1Xi)²
 The result is the OLS estimators β̂0 and β̂1.
In probit, we have a different regression function - the nonlinear probit model. So, we could estimate β0 and β1 by nonlinear least squares: choose b0, b1 to minimize
   Σi (Yi − Φ(b0 + b1Xi))²
 Solving this yields the nonlinear least squares estimator of the probit coefficients.
How to solve this minimization problem?
 Calculus doesn't give an explicit solution.
 Must be solved numerically using the computer, e.g. by the "trial and error" method of trying one set of values for (b0, b1), then trying another, and another, ...
 Better idea: use specialized minimization algorithms (see the sketch below).
In practice, nonlinear least squares isn't used because it isn't efficient - an estimator with a smaller variance is...
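A sketch of probit estimation by nonlinear least squares on simulated data, using a general-purpose numerical minimizer; the true values β0 = −2, β1 = 3 are assumed for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 1.0, n)
y = (rng.normal(size=n) < -2.0 + 3.0 * x).astype(float)   # probit DGP

def ssr(b):
    """Sum of squared residuals for the nonlinear probit regression function."""
    return np.sum((y - norm.cdf(b[0] + b[1] * x)) ** 2)

result = minimize(ssr, x0=[0.0, 0.0])    # specialized minimization algorithm
print(np.round(result.x, 2))             # NLS estimates, near (-2, 3)
```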
Probit estimation by maximum likelihood
The likelihood function is the conditional density of Y1, … , Yn given X1, … , Xn, treated as a function of the unknown parameters β0 and β1.
 The maximum likelihood estimator (MLE) is the value of (β0, β1) that maximizes the likelihood function.
 The MLE is the value of (β0, β1) that best describes the full distribution of the data.
 In large samples, the MLE is:
 consistent.
 normally distributed.
 efficient (has the smallest variance of all estimators).
Special case: the probit MLE with no X
Y = 1 with probability p, Y = 0 with probability (1 − p) (Bernoulli distribution)
Data: Y1, … , Yn, i.i.d.
Derivation of the likelihood starts with the density of Y1:
   Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p
so
   Pr(Y1 = y1) = p^y1 (1 − p)^(1 − y1)
Joint density of (Y1, Y2): because Y1 and Y2 are independent,
   Pr(Y1 = y1, Y2 = y2) = [p^y1 (1 − p)^(1 − y1)] × [p^y2 (1 − p)^(1 − y2)]
Joint density of (Y1, … , Yn):
   Pr(Y1 = y1, … , Yn = yn) = p^(Σ yi) (1 − p)^(n − Σ yi)
The likelihood is the joint density, treated as a function of the unknown parameter, which is p:
   f(p; Y1, … , Yn) = p^(Σ Yi) (1 − p)^(n − Σ Yi)
 The MLE maximizes the likelihood. It is standard to work with the log likelihood, ln f(p; Y1, … , Yn):
   ln f(p; Y1, … , Yn) = (Σ Yi) ln(p) + (n − Σ Yi) ln(1 − p)
Solving for p yields the MLE. That is, p̂ satisfies
   (Σ Yi)/p̂ − (n − Σ Yi)/(1 − p̂) = 0,  so  p̂ = (1/n) Σ Yi = Ȳ
The MLE in the "no-X" case (Bernoulli distribution):
   p̂ = Ȳ = fraction of 1's
 For Yi i.i.d. Bernoulli, the MLE is the "natural" estimator of p, the fraction of 1's, which is Ȳ (see the sketch below).
 We already know the essentials of inference:
 In large n, the sampling distribution of p̂ = Ȳ is normally distributed.
 Thus inference is "as usual": hypothesis testing via the t-statistic, confidence interval as p̂ ± 1.96 SE(p̂).
 STATA note: to emphasize the requirement of large n, the printout calls the t-statistic the z-statistic.
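A sketch of the no-X MLE on a small hypothetical sample: maximizing the Bernoulli log likelihood numerically recovers the sample fraction of 1's.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 1])   # hypothetical binary sample

def neg_loglik(p):
    # -(ΣYi) ln(p) - (n - ΣYi) ln(1 - p)
    return -(y.sum() * np.log(p) + (len(y) - y.sum()) * np.log(1.0 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(round(res.x, 3), y.mean())   # MLE ≈ Ȳ = 0.6
```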
The probit likelihood with one X
The derivation starts with the density of Y1, given X1:
   Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
   Pr(Y1 = 0|X1) = 1 − Φ(β0 + β1X1)
 The probit likelihood function is the joint density of Y1, … , Yn given X1, … , Xn, treated as a function of β0 and β1.
The probit likelihood function:
   f(β0, β1; Y1, … , Yn | X1, … , Xn) = Πi Φ(β0 + β1Xi)^Yi [1 − Φ(β0 + β1Xi)]^(1 − Yi)
 Can't solve for the maximum explicitly.
 Must maximize using numerical methods (see the sketch below).
 As in the case of no X, in large samples:
 β̂0 and β̂1 are consistent.
 β̂0 and β̂1 are normally distributed.
 Their standard errors can be computed.
 Testing and confidence intervals proceed as usual.
 For multiple X's, see SW App. 11.2.
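A sketch of the probit MLE with one X: minimize the negative log likelihood numerically on simulated data (true values β0 = −2, β1 = 3 assumed for illustration). In practice one would use a packaged routine (e.g. probit in STATA) rather than hand-coding the likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0.0, 1.0, n)
y = (rng.normal(size=n) < -2.0 + 3.0 * x).astype(float)   # probit DGP

def neg_loglik(b):
    # -Σ [Yi ln Φ(zi) + (1 - Yi) ln(1 - Φ(zi))],  zi = b0 + b1*Xi
    p = np.clip(norm.cdf(b[0] + b[1] * x), 1e-10, 1 - 1e-10)   # avoid log(0)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

res = minimize(neg_loglik, x0=[0.0, 0.0])
print(np.round(res.x, 2))   # ML estimates, near (-2, 3)
```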
The logit likelihood with one X
 The only difference between probit and logit is the functional form used for the probability: Φ is replaced by the cumulative logistic function.
 Otherwise, the likelihood is similar; for details see SW App. 11.2.
 As with probit:
 β̂0 and β̂1 are consistent.
 β̂0 and β̂1 are normally distributed.
 Their standard errors can be computed.
 Testing and confidence intervals proceed as usual.
Measures of fit
The R² and adjusted R² don't make sense here (why?). So, two other specialized measures are used:
 The fraction correctly predicted = the fraction of Yi's for which the predicted probability is > 50% (if Yi = 1) or is < 50% (if Yi = 0).
 The pseudo-R² measures the fit using the likelihood function: it measures the improvement in the value of the log likelihood relative to having no X's (see SW App. 9.2). It simplifies to the R² in the linear model with normally distributed errors. (A sketch of both measures follows below.)
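A sketch of both fit measures in Python; `y`, `p_hat`, `ll_model`, and `ll_null` are hypothetical names for the outcomes, fitted probabilities, and maximized log likelihoods with and without the X's. The pseudo-R² shown is the common McFadden version.

```python
import numpy as np

def fraction_correctly_predicted(y, p_hat):
    predicted = (p_hat > 0.5).astype(int)   # predict Y = 1 when p-hat > 50%
    return np.mean(predicted == y)

def pseudo_r2(ll_model, ll_null):
    # Improvement in the log likelihood relative to the no-X (null) model.
    return 1.0 - ll_model / ll_null
```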
Marginal Effect
However, what we really care about is not β1 itself. We want to know how a change in X will affect the probability that Y = 1. For the probit model,
   ∂Pr(Y = 1|X)/∂X = φ(β0 + β1X) β1
 where φ is the pdf of the standard normal distribution.
The effect of a change in X on Pr(Y = 1|X) depends on the value of X. In practice, we usually evaluate the marginal effect at the sample average X̄, i.e. the marginal effect is
   φ(β0 + β1X̄) β1
When X is binary, it is not clear what the sample average means.
The marginal effect then measures the probability difference between X = 1 and X = 0:
   Φ(β0 + β1) − Φ(β0)
 In STATA, the command dprobit reports the marginal effect, instead of the probit coefficient β̂ (see the sketch below).
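A sketch of the two marginal-effect formulas with assumed coefficient values (β0 = −2, β1 = 3) and a hypothetical regressor, for illustration only.

```python
import numpy as np
from scipy.stats import norm

b0, b1 = -2.0, 3.0
x = np.array([0.20, 0.30, 0.35, 0.40, 0.50])    # hypothetical continuous X

# Continuous X: marginal effect evaluated at the sample average X-bar.
me_at_mean = norm.pdf(b0 + b1 * x.mean()) * b1
print(round(me_at_mean, 3))

# Binary X: probability difference between X = 1 and X = 0.
me_binary = norm.cdf(b0 + b1 * 1.0) - norm.cdf(b0 + b1 * 0.0)
print(round(me_binary, 3))
```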
Application to the Boston HMDA Data
 Mortgages (home loans) are an essential
part of buying a home.
 Is there differential access to home loans
by race?
 If two otherwise identical individuals, one
white and one black, applied for a home
loan, is there a difference in the
probability of denial?
The HMDA Data Set
 Data on individual characteristics, property
characteristics, and loan denial/acceptance.
 The mortgage application process in 1990-
1991:
 Go to a bank or mortgage company.
 Fill out an application (personal+financial info).
 Meet with the loan officer.
 Then the loan officer decides - by law, in a
race-blind way. Presumably, the bank wants to
make profitable loans, and the loan officer
doesn’t want to originate defaults.
The loan officer’s decision
 Loan officer uses key financial variables:
 P/I ratio
 housing expense-to-income ratio
 loan-to-value ratio
 personal credit history
 The decision rule is nonlinear:
 loan-to-value ratio > 80%
 loan-to-value ratio > 95% (what happens in
default?)
 credit score
Regression specifications
Pr(deny = 1 | black, other X's) = …
 linear probability model: β0 + β1black + …
 probit: Φ(β0 + β1black + …)
Main problem with the regressions so far: potential omitted variable bias. All of these enter the loan officer's decision function and are, or could be, correlated with race:
 wealth, type of employment
 credit history
 family status
Variables in the HMDA data set....
Summary of Empirical Results
 Coefficients on the financial variables make sense.
 Black is statistically significant in all
specifications.
 Race-financial variable interactions aren’t
significant.
 Including the covariates sharply reduces the effect
of race on denial probability.
 LPM, probit, logit: similar estimates of effect of
race on the probability of denial.
 Estimated effects are large in a “real world”
sense.
Remaining threats to internal,
external validity
 Internal validity.
 omitted variable bias
 what else is learned in the in-person interviews?
 functional form misspecification (no...)
 measurement error (originally, yes; now, no...)
 selection
 random sample of loan applications
 define population to be loan applicants

 simultaneous causality (no)


 External validity
 This is for Boston in 1990-91. What about today?
Misspecification
 Misspecification is a big problem in maximum likelihood estimation. We only consider the problem of heteroscedasticity.
 By assuming Var(ui|Xi) = 1 in the probit model, we only estimate β0 and β1 in the likelihood function. If ui is heteroscedastic such that Var(ui|Xi) = σi², then we need to estimate β0, β1, and σi².
 But the problem can be more than an increasing number of parameters to be estimated. Suppose the heteroscedasticity is of the form Var(ui|Xi) = σi²; then
   Pr(Yi = 1|Xi) = Φ((β0 + β1Xi)/σi)
 The presence of heteroscedasticity causes inconsistency because the assumption of a constant Var(ui|Xi) is what allows us to identify β0 and β1.
 To take a very particular but informative case, suppose the latent error variance is a constant σ² that need not equal 1; then
   Pr(Yi = 1|Xi) = Φ((β0 + β1Xi)/σ) = Φ(β0/σ + (β1/σ)Xi)
 It is clear that our estimates will be inconsistent for β0 and β1, but consistent for β0/σ and β1/σ (see the simulation sketch below).
 The problem of misspecification such as heteroscedasticity calls for the use of the linear probability model, which, although not efficient even with White's heteroscedasticity-consistent covariance matrix, is at least consistent.
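A simulation sketch of the inconsistency discussed above: the latent error is heteroscedastic (an assumed form, sd(u|X) = exp(1.5X), chosen only for illustration), but a standard probit that assumes unit variance is estimated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20000
x = rng.uniform(0.0, 1.0, n)
sd = np.exp(1.5 * x)                              # assumed heteroscedasticity
y = ((-1.0 + 2.0 * x + sd * rng.normal(size=n)) >= 0).astype(int)

fit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)   # assumes Var(u|X) = 1
print(np.round(fit.params, 2))   # not close to the true (-1, 2): inconsistent
```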
Summary
 If Yi is binary, then E(Y|X) = Pr(Y=1|X).
 Three models:
 linear probability model (linear multiple regression)
 probit (cumulative standard normal distribution)
 logit (cumulative standard logistic distribution)
 LPM, probit, logit all produce predicted probabilities.
 Effect of ΔX is change in conditional probability that Y
= 1. For logit and probit, this depends on the initial X.
 Probit and logit are estimated via maximum likelihood.
 Coefficients are normally distributed for large n.
 Large-n hypothesis testing, confidence intervals are as usual.
