Models for Binary Response Variables
Components of a Generalized Linear Model
Suppose the N observations on Y are independent, and denote their values by y_1, y_2, ..., y_N. We assume that each Y_i has probability density function or mass function of the form

    f(y_i; θ_i) = a(θ_i) b(y_i) exp[ y_i Q(θ_i) ],   i = 1, 2, ..., N   ... (1)

The term Q(θ) is called the natural parameter of the distribution. A more general representation of (1) is given by

    f(y_i; θ_i, φ) = exp{ [ y_i θ_i - b(θ_i) ] / a(φ) + c(y_i, φ) }

The function a often has the form a(φ) = φ / w_i for a known weight w_i, and φ is called the dispersion parameter.
Let x_i1, ..., x_it denote values of t explanatory variables for the ith observation. The systematic component of the generalized linear model (GLM) relates parameters η_i to the explanatory variables through a linear predictor

    η_i = Σ_j β_j x_ij ,   i = 1, 2, ..., N.

In matrix form, η = Xβ, where η = (η_1, ..., η_N)′ and β = (β_1, ..., β_t)′ are model parameters and X is the N × t model matrix.
The link function, the third component of a GLM, connects the expectation μ_i of Y_i to the linear predictor by η_i = g(μ_i), where g is a monotone, differentiable function. Thus, a GLM links the expected value of the response to the explanatory variables through the equation

    g(μ_i) = Σ_j β_j x_ij .

The function g for which g(μ_i) = Q(θ_i) is called the canonical link, so that the relationship between the natural parameter and the linear predictor is

    Q(θ_i) = Σ_j β_j x_ij .
Logit Models
Many categorical response variables have only two categories. The observation for each subject
might be classified as a “Success” or a “Failure”. Represent these outcomes by 1 and 0. The
Bernoulli distribution for binary random variables specifies probabilities
P(Y = 1) = π and P(Y = 0) = 1 - π for the two outcomes, for which E(Y) = π. When Y_i has a Bernoulli distribution with parameter π_i, the probability mass function is

    f(y_i; π_i) = π_i^{y_i} (1 - π_i)^{1 - y_i}
                = (1 - π_i) [ π_i / (1 - π_i) ]^{y_i}
                = (1 - π_i) exp[ y_i log( π_i / (1 - π_i) ) ]   for y_i = 0 and 1.
Models for Binary Response Variables ~ 1 of 8
This distribution is in the natural exponential family. The natural parameter Q(π) = log[ π / (1 - π) ], the log odds of response 1, is called the logit of π. GLMs that use the logit link are called logit models.
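As a quick numerical illustration (a minimal sketch, with function names of my own choosing), the logit link maps probabilities in (0, 1) onto the whole real line, and its inverse maps a linear predictor back to a probability:

```python
import math

def logit(p):
    """Canonical link for the Bernoulli distribution: the log odds."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

p = 0.8
print(logit(p))             # log(0.8 / 0.2) = log 4 ≈ 1.386
print(inv_logit(logit(p)))  # recovers 0.8 (up to rounding)
```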
Log-linear Models
Let n_i denote the count in the ith cell and let m_i = E(n_i) denote its expected value, i = 1, 2, ..., N. The Poisson probability mass function of n_i is

    f(n_i; m_i) = e^{-m_i} m_i^{n_i} / n_i!
                = e^{-m_i} (1 / n_i!) exp( n_i log m_i )

for non-negative integer values of n_i. This has natural exponential form

    f(y_i; θ_i) = a(θ_i) b(y_i) exp[ y_i Q(θ_i) ]

where y_i = n_i, θ_i = m_i, a(θ_i) = e^{-m_i}, b(n_i) = 1 / n_i!, and Q(m_i) = log m_i.
For the Poisson distribution, a GLM links a monotone function of m_i to explanatory variables through a linear model. Since the natural parameter is log m_i, the canonical link function is the log link, η_i = log m_i. The model using this link is

    log m_i = Σ_j β_j x_ij ,   i = 1, 2, ..., N   ... (1)

Model (1) is called a loglinear model for a contingency table.
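The factorisation above can be checked numerically. The sketch below (illustrative only) compares the Poisson pmf with the product a(m) b(n) exp[n Q(m)]:

```python
import math

def poisson_pmf(n, m):
    """Direct Poisson probability mass function."""
    return math.exp(-m) * m**n / math.factorial(n)

def exponential_form(n, m):
    """Same pmf via the natural exponential factorisation:
    a(m) = e^{-m}, b(n) = 1/n!, Q(m) = log(m)."""
    a = math.exp(-m)
    b = 1 / math.factorial(n)
    return a * b * math.exp(n * math.log(m))

m = 3.5
for n in range(10):
    assert abs(poisson_pmf(n, m) - exponential_form(n, m)) < 1e-12
print("factorisation verified")
```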
Linear Probability Model
For a binary response, the regression model

    E(Y) = π(x) = α + βx

is called a linear probability model. The linear probability model has a major structural defect: probabilities must fall between 0 and 1, whereas linear functions take values over the entire real line.
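A small sketch with hypothetical coefficients makes the defect concrete: nothing prevents the fitted line from leaving [0, 1].

```python
# Illustrative coefficients only, not estimates from real data.
alpha, beta = 0.1, 0.08

def linear_prob(x):
    """Linear probability model E(Y) = alpha + beta * x."""
    return alpha + beta * x

print(linear_prob(5))    # 0.5  -- a valid probability
print(linear_prob(15))   # 1.3  -- exceeds 1
print(linear_prob(-5))   # -0.3 -- below 0
```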
Logistic Regression Model
Because of the structural problems with the linear probability model, it is more fruitful to study models implying a curvilinear relationship between x and π(x). When we expect a monotonic relationship, S-shaped curves are natural shapes for regression curves. A function having this shape,

    π(x) = exp(α + βx) / [ 1 + exp(α + βx) ]   ... (1)

is called the logistic regression function. When the model holds with β = 0, the binary response is independent of X.
The logistic regression curve (1) has slope

    ∂π(x)/∂x = β π(x) [ 1 - π(x) ].
For model (1), the odds of making response 1 are

    π(x) / [ 1 - π(x) ] = exp(α + βx) = e^α (e^β)^x.

This formula provides a basic interpretation for β: the odds increase multiplicatively by e^β for every unit increase in x. The log odds has the linear relationship

    log{ π(x) / [ 1 - π(x) ] } = α + βx.
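The multiplicative interpretation can be verified numerically; the coefficients below are hypothetical, chosen only to illustrate the identity odds(x + 1) / odds(x) = e^β:

```python
import math

alpha, beta = -1.0, 0.4   # illustrative coefficients

def pi(x):
    """Logistic regression function pi(x)."""
    return math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))

def odds(x):
    """Odds of response 1 at predictor value x."""
    return pi(x) / (1 - pi(x))

# The ratio is e^beta at every x, not just at one point.
for x in (0.0, 1.0, 2.5):
    print(round(odds(x + 1) / odds(x), 6), round(math.exp(beta), 6))
```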
Inference for Logistic Regression
From Wald’s (1943) general asymptotic results for ML estimators, it follows that parameter estimators in logistic models have large-sample normal distributions.
Let β = (β_1, β_2, ..., β_q)′ denote a subset of model parameters. Suppose we want to test H_0: β = 0. Let M_1 denote the fitted model and M_2 denote the simpler model with β = 0. Large-sample tests can use Wilks’s (1938) likelihood-ratio approach, with test statistic based on minus twice the log of the ratio of maximized likelihoods for M_2 and M_1. Let L_1 denote the maximized log likelihood for M_1 and L_2 the maximized log likelihood for M_2 under H_0; the statistic

    -2( L_2 - L_1 )

has a large-sample chi-squared distribution with q degrees of freedom.
Alternatively, by the large-sample normality of parameter estimators, the statistic

    β̂′ [ Ĉov(β̂) ]^{-1} β̂

has the same limiting null distribution (Wald 1943). This is called a Wald statistic.
Odds Ratio and Coefficient of the Linear Logistic Regression
When the independent variables are dichotomous or polychotomous, the logistic regression coefficients can be linked with odds ratios. In the linear logistic model, the dependence of the probability of success on the independent variables is assumed to be

    π_i = exp( Σ_{j=0}^{p} β_j x_ij ) / [ 1 + exp( Σ_{j=0}^{p} β_j x_ij ) ]   ... (i)

and

    1 - π_i = 1 / [ 1 + exp( Σ_{j=0}^{p} β_j x_ij ) ]   ... (ii)

Consider the simplest case, where there is one independent variable, x_1, which is either 0 or 1. The linear logistic model in (i) and (ii) becomes

    P(y = 1 | x_1) = e^{β_0 + β_1 x_1} / ( 1 + e^{β_0 + β_1 x_1} )   and   P(y = 0 | x_1) = 1 / ( 1 + e^{β_0 + β_1 x_1} ).
Values of the model when x_1 = 0 and 1 are

    P(y = 1 | x_1 = 0) = e^{β_0} / (1 + e^{β_0}),      P(y = 1 | x_1 = 1) = e^{β_0 + β_1} / (1 + e^{β_0 + β_1})
    P(y = 0 | x_1 = 0) = 1 / (1 + e^{β_0}),            P(y = 0 | x_1 = 1) = 1 / (1 + e^{β_0 + β_1})

Thus we get the odds ratio as

    OR = [ P(y = 1 | x_1 = 1) / P(y = 0 | x_1 = 1) ] / [ P(y = 1 | x_1 = 0) / P(y = 0 | x_1 = 0) ]
       = e^{β_0 + β_1} / e^{β_0}
       = e^{β_1}

and the log odds ratio is log(OR) = β_1.
Thus the estimated logistic regression coefficient also provides an estimate of the odds ratio, i.e. ÔR = e^{β̂_1}. If the confidence interval for β_1 is (a, b), then the confidence interval for the odds ratio e^{β_1} is (e^a, e^b).
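With one binary predictor, the ML estimates have closed form, and the estimated odds ratio e^{β̂_1} equals the sample cross-product ratio. A sketch with a hypothetical 2 × 2 table of counts:

```python
import math

# Hypothetical counts (rows: x1 = 0, 1; columns: y = 0, 1).
n00, n01 = 40, 10   # x1 = 0: failures, successes
n10, n11 = 20, 30   # x1 = 1: failures, successes

# Sample odds ratio (cross-product ratio)
OR = (n11 / n10) / (n01 / n00)

# ML estimates for the saturated one-predictor logit model
b0 = math.log(n01 / n00)        # log odds of success at x1 = 0
b1 = math.log(n11 / n10) - b0   # difference in log odds

print(OR, math.exp(b1))  # both ≈ 6.0
```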
Logit Model for Categorical Data
Logit Model for an I × 2 Table
Suppose there is a single explanatory factor having I categories. In row i of the I × 2 table, the two response probabilities are π_{1|i} and π_{2|i}, with π_{1|i} + π_{2|i} = 1. In the logit model,

    log( π_{1|i} / π_{2|i} ) = α + τ_i   ... (i)

where τ_i describes the effect of the factor on the response.
Let n_ij denote the number of times response j occurs when the factor is at level i. It is usual to treat as fixed the total counts n_i = n_i1 + n_i2 at the I factor levels. When the binary responses are independent Bernoulli random variables, the n_i1 are independent binomial random variables with parameters π_{1|i}.
For any set {π_{1|i} > 0}, there exist {α, τ_i} such that model (i) holds. That model has as many parameters as binomial observations, and it is said to be saturated. When the factor has no effect on the response variable, the simpler model

    log( π_{1|i} / π_{2|i} ) = α   ... (ii)

holds. This is the special case of (i) in which τ_1 = τ_2 = ... = τ_I. Since it is equivalent to π_{1|1} = π_{1|2} = ... = π_{1|I}, (ii) is the model of statistical independence of the response and factor.
Goodness of fit as a Likelihood Ratio Test
For a given logit model, we can use model parameter estimates to calculate predicted logits and hence predicted probabilities and estimated expected frequencies

    m̂_ij = n_i π̂_{j|i}.

When the expected frequencies are relatively large, we can test goodness of fit with a Pearson or likelihood-ratio chi-squared statistic. For a model symbolized by M, we denote these statistics by χ²(M) and G²(M). For instance,

    G²(M) = 2 Σ_i Σ_j n_ij log( n_ij / m̂_ij ).

The degrees of freedom equal the number of logits minus the number of linearly independent parameters in the model.
We used the likelihood-ratio principle to construct a statistic

    -2( L_2 - L_1 )

that tests whether certain model parameters are zero, by comparing the fitted model M_1 with a simpler model M_2. When the explanatory variables are categorical, we denote this statistic for testing M_2, given that M_1 holds, by G²(M_2 | M_1). Let L_s denote the maximized log-likelihood for the saturated model. The likelihood-ratio statistic for comparing models M_1 and M_2 is

    G²(M_2 | M_1) = -2( L_2 - L_1 )
                  = -2( L_2 - L_s ) - [ -2( L_1 - L_s ) ]
                  = G²(M_2) - G²(M_1).

That is, the test statistic for comparing two models is identical to the difference in G² goodness-of-fit statistics for the two models.
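As an illustration (hypothetical counts), the sketch below computes G² for the independence model (ii) in an I × 2 table, where the fitted values are m̂_ij = n_i × (column j total) / N:

```python
import math

# Hypothetical I x 2 table of counts n_ij (I = 3 factor levels).
table = [[25, 15],
         [20, 20],
         [10, 30]]

row = [sum(r) for r in table]                              # n_i totals
col = [sum(table[i][j] for i in range(3)) for j in range(2)]
N = sum(row)

G2 = 0.0
for i in range(3):
    for j in range(2):
        m_hat = row[i] * col[j] / N     # fitted value under independence
        G2 += 2 * table[i][j] * math.log(table[i][j] / m_hat)

# df = number of logits (I) minus parameters in model (ii) (just alpha) = I - 1
print(round(G2, 3))
```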
Model Diagnostics
Residuals
Let y_i denote the number of successes in n_i trials at the ith of I settings of the explanatory variables. For a binary response model, residuals for the fits provided by the I binomial distributions are

    e_i = ( y_i - n_i π̂_{1|i} ) / sqrt( n_i π̂_{1|i} (1 - π̂_{1|i}) ),   i = 1, 2, ..., I   ... (i)

If π̂_{1|i} were replaced by the true value π_{1|i} in (i), e_i would be the difference between a binomial random variable and its expectation, divided by its estimated standard deviation; if n_i were large, e_i would then have an approximate standard normal distribution.
The π_{1|i} are unknown, however, so (i) replaces them by their estimates for the model. Because the estimates depend on the y_i, the deviations y_i - n_i π̂_{1|i} tend to be smaller than y_i - n_i π_{1|i}. Thus, the e_i tend to show less variation than standard normal random variables. In fact, the Pearson statistic for testing the fit of the model is related to the e_i by χ² = Σ_i e_i².
If χ² has d.f. v, it follows that the sum of squared residuals is asymptotically comparable to the sum of squares of v (rather than I) standard normal random variables. Despite this, residuals are often treated like standard normal deviates, with absolute values larger than 2 indicating possible lack of fit.
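A sketch of (i) with hypothetical fitted probabilities, also confirming the identity χ² = Σ e_i²:

```python
import math

# Hypothetical data: successes y_i in n_i trials at I = 3 settings,
# with fitted probabilities pi_hat from some (unspecified) model.
y      = [12, 30, 45]
n      = [50, 50, 50]
pi_hat = [0.20, 0.58, 0.92]

# Residuals e_i from formula (i)
e = [(y[i] - n[i] * pi_hat[i]) / math.sqrt(n[i] * pi_hat[i] * (1 - pi_hat[i]))
     for i in range(3)]

# Pearson statistic as the sum of squared residuals
X2 = sum(r * r for r in e)
print([round(r, 3) for r in e], round(X2, 3))
```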
Estimation of Logistic Regression Parameters
Let x_i = (x_i0, x_i1, ..., x_ik) denote the ith setting of values of k explanatory variables, i = 1, 2, ..., n, where x_i0 = 1. We can express the logistic regression model as

    π_i = exp( x_i′β ) / [ 1 + exp( x_i′β ) ]

    1 - π_i = 1 - exp( x_i′β ) / [ 1 + exp( x_i′β ) ] = 1 / [ 1 + exp( x_i′β ) ]

where

    x_i′β = Σ_{j=0}^{k} β_j x_ij ,   i = 1, 2, ..., n   ... (*)
When more than one observation on Y occurs at a fixed x_i value, it is sufficient to record the number of observations n_i and the number of "1" outcomes; thus we let Y_i refer to this "success" count rather than to individual binary responses. Hence Y_i ~ b(n_i, π_i), i = 1, 2, ..., n, are independent binomial random variables, so we can write the probability mass function as

    f_i(y_i) = C(n_i, y_i) π_i^{y_i} (1 - π_i)^{n_i - y_i}   ... (ii)

The joint probability mass function of Y_1, ..., Y_n is proportional to the product of n binomial functions, so the likelihood function can be written as

    L = Π_{i=1}^{n} f_i(y_i) = Π_{i=1}^{n} C(n_i, y_i) π_i^{y_i} (1 - π_i)^{n_i - y_i}.
Taking logs on both sides of this likelihood function, we obtain

    ln L = Σ_{i=1}^{n} ln C(n_i, y_i) + Σ_{i=1}^{n} y_i ln π_i + Σ_{i=1}^{n} (n_i - y_i) ln(1 - π_i).

Ignoring the constant term and substituting for π_i,

    L(β) = Σ_{i=1}^{n} y_i ln{ exp(x_i′β) / [1 + exp(x_i′β)] } + Σ_{i=1}^{n} (n_i - y_i) ln{ 1 / [1 + exp(x_i′β)] }
         = Σ_{i=1}^{n} y_i x_i′β - Σ_{i=1}^{n} n_i ln[ 1 + exp(x_i′β) ]
         = Σ_{i=1}^{n} y_i Σ_{j=0}^{k} x_ij β_j - Σ_{i=1}^{n} n_i ln[ 1 + exp( Σ_{j=0}^{k} x_ij β_j ) ]   [using equation (*)]   ... (iii)
Differentiating the log-likelihood (iii) with respect to the elements of β and setting the results equal to zero gives

    ∂L/∂β_a = Σ_{i=1}^{n} y_i x_ia - Σ_{i=1}^{n} n_i x_ia · exp( Σ_{j=0}^{k} x_ij β_j ) / [ 1 + exp( Σ_{j=0}^{k} x_ij β_j ) ] = 0 ,   a = 0, 1, 2, ..., k   ... (iv)

that is,

    Σ_{i=1}^{n} y_i x_ia - Σ_{i=1}^{n} n_i x_ia π_i = 0   ... (v)

Written out,

    y_1 x_1a + y_2 x_2a + ... + y_n x_na - ( n_1 x_1a π_1 + n_2 x_2a π_2 + ... + n_n x_na π_n ) = 0

or, in matrix form,

    X_a′ Y - X_a′ μ = 0 ,   i.e.   X_a′ ( Y - μ ) = 0

where X_a = (x_1a, ..., x_na)′, Y = (y_1, ..., y_n)′, and μ = (n_1 π_1, ..., n_n π_n)′.
Again differentiating equation (iv), now with respect to β_b, and writing u_i = Σ_{j=0}^{k} x_ij β_j, we obtain by the quotient rule

    ∂²L/∂β_a∂β_b = -Σ_{i=1}^{n} n_i x_ia x_ib · { (1 + e^{u_i}) e^{u_i} - e^{u_i} e^{u_i} } / ( 1 + e^{u_i} )²
                 = -Σ_{i=1}^{n} n_i x_ia x_ib · [ e^{u_i} / (1 + e^{u_i}) ] · [ 1 / (1 + e^{u_i}) ]
    ∂²L/∂β_a∂β_b = -Σ_{i=1}^{n} n_i x_ia x_ib ( π_i - π_i² ) = -Σ_{i=1}^{n} n_i x_ia x_ib π_i ( 1 - π_i ) ,   a = 0, 1, ..., k ;  b = 0, 1, ..., k   ... (vi)

Written out,

    ∂²L/∂β_a∂β_b = -[ n_1 x_1a x_1b π_1 (1 - π_1) + ... + n_n x_na x_nb π_n (1 - π_n) ]

or, in matrix form,

    ∂²L/∂β_a∂β_b = -X_a′ Diag[ n_i π_i (1 - π_i) ] X_b   ... (vii)

where X_a = (x_1a, ..., x_na)′, X_b = (x_1b, ..., x_nb)′, and

    Diag[ n_i π_i (1 - π_i) ] = [ n_1 π_1 (1 - π_1)        0           ...        0
                                        0           n_2 π_2 (1 - π_2)  ...        0
                                       ...                ...          ...       ...
                                        0                  0           ...  n_n π_n (1 - π_n) ]
We estimate the variance-covariance matrix by substituting β̂ into the matrix with elements equal to the negative of (vii) and inverting. The estimated variance-covariance matrix has the form

    V̂ar-Ĉov(β̂) = { X′ Diag[ n_i π̂_i (1 - π̂_i) ] X }^{-1}

where Diag[ n_i π̂_i (1 - π̂_i) ] denotes the n × n diagonal matrix having the elements n_i π̂_i (1 - π̂_i) on the main diagonal. The square roots of its diagonal elements are estimated standard errors of the model parameter estimators.
From (v) and (vi), let

    q_j^(t) = ∂L/∂β_j |_{β^(t)} = Σ_{i=1}^{n} y_i x_ij - Σ_{i=1}^{n} n_i x_ij π_i^(t) = Σ_{i=1}^{n} ( y_i - n_i π_i^(t) ) x_ij = X_j′ ( Y - μ^(t) )

and

    h_ab^(t) = ∂²L/∂β_a∂β_b |_{β^(t)} = -Σ_{i=1}^{n} n_i x_ia x_ib π_i^(t) ( 1 - π_i^(t) ) = -X_a′ Diag[ n_i π_i^(t) (1 - π_i^(t)) ] X_b.

Here π_i^(t), the tth approximation for π̂_i, is obtained from β^(t) through

    π_i^(t) = exp( Σ_{j=0}^{k} β_j^(t) x_ij ) / [ 1 + exp( Σ_{j=0}^{k} β_j^(t) x_ij ) ].

We use q^(t) and H^(t) = ( h_ab^(t) ) in the formula

    β^(t+1) = β^(t) - ( H^(t) )^{-1} q^(t)

to obtain the next value β^(t+1); this is

    β^(t+1) = β^(t) + { X′ Diag[ n_i π_i^(t) (1 - π_i^(t)) ] X }^{-1} X′ ( y - μ^(t) )   ... (viii)

where μ^(t) has elements μ_i^(t) = n_i π_i^(t). This is used to obtain π^(t+1), and so forth.
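Iteration (viii) is easy to sketch in pure Python for k = 1 (one predictor plus an intercept); the grouped dose-response data below are hypothetical, and the 2 × 2 system is solved with an explicit inverse:

```python
import math

# Hypothetical grouped binomial data: y successes out of n trials at each x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [50, 50, 50, 50, 50]
y = [5, 13, 25, 36, 44]

beta = [0.0, 0.0]   # starting value beta^(0)
for _ in range(25):
    pi = [1 / (1 + math.exp(-(beta[0] + beta[1] * xi))) for xi in x]
    w  = [n[i] * pi[i] * (1 - pi[i]) for i in range(5)]    # diagonal weights
    # score vector q = X'(y - mu), with mu_i = n_i * pi_i
    q0 = sum(y[i] - n[i] * pi[i] for i in range(5))
    q1 = sum((y[i] - n[i] * pi[i]) * x[i] for i in range(5))
    # information matrix X' Diag(w) X for the 2-parameter model
    a = sum(w)
    b = sum(w[i] * x[i] for i in range(5))
    c = sum(w[i] * x[i] ** 2 for i in range(5))
    det = a * c - b * b
    # update (viii): beta^(t+1) = beta^(t) + (X' Diag(w) X)^{-1} X'(y - mu)
    beta = [beta[0] + ( c * q0 - b * q1) / det,
            beta[1] + (-b * q0 + a * q1) / det]

print([round(v, 4) for v in beta])
```

With well-behaved grouped data like this, the iteration typically converges in a handful of steps; far fewer than the 25 used here are actually needed.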