BINARY LOGISTIC REGRESSION

Linear regression is defined by the statement:

$$Y_i \sim N(\beta_1 + \beta_2 X_{i2} + \cdots + \beta_k X_{ik},\ \sigma^2)$$

or

$$Y_i \sim N\!\left(\sum_{j=1}^{k}\beta_j X_{ij},\ \sigma^2\right),\quad i=1,2,\dots,n,\quad j=1,2,\dots,k,\quad X_{i1}=1\ \forall\, i.$$

In BINARY LOGISTIC REGRESSION, $Y$ assumes the values 0 and 1 and so is a Bernoulli random variable; the explanatory variables can be discrete or continuous but are treated as fixed.

The basic form of logistic regression can be derived using Bayes' rule. Assume that $k=2$, so that there is one non-trivial explanatory variable $X$ and a constant term. Then
P  Y = 0 P  X | Y=0
P Y = 0 | X =
P  Y = 0 P  X | Y=0 + P  Y = 1 P  X | Y=1
1
=
P  Y = 1 P  X | Y=1
1+
P  Y = 0 P  X | Y=0
1
=
   P  Y = 1 P  X | Y=1  
1 + exp log   
   P  Y = 0 P  X | Y=0  
1
=
   P  Y = 1   P  X | Y=1  
1 + exp log   + log   
   P  Y = 0   P  X | Y=0   
1
= , (1)
1 + exp 1 + 2 X
where
 P  Y = 1 
1 = log  
 P  Y = 0 
and
 P  X | Y=1 
2 = log  ,
 P  X | Y=0 

exp 1 + 2 X
if X is discrete. Also (1)  P  Y = 1| X  =
1 + exp 1 + 2 X

If $X$ is continuous, (1) holds with the density $f(\cdot)$ in place of $P$. In other words,

$$P(Y_i = 1 \mid X_{i1}) = F(\beta_1 + \beta_2 X_{i1}),$$

where

$$F(x) = \frac{\exp(x)}{1+\exp(x)}.$$

The conditional probability function is:

$$f(y \mid X_{i1}) = P(Y_i = y \mid X_{i1}) = \big(F(\beta_1+\beta_2 X_{i1})\big)^{y}\,\big(1-F(\beta_1+\beta_2 X_{i1})\big)^{1-y} = \begin{cases} F(\beta_1+\beta_2 X_{i1}), & y=1,\\ 1-F(\beta_1+\beta_2 X_{i1}), & y=0. \end{cases}$$
Thus, the logistic regression model is:

$$Y_i \mid X_{i1} \sim \operatorname{Bernoulli}\!\left(\frac{\exp(\beta_1+\beta_2 X_{i1})}{1+\exp(\beta_1+\beta_2 X_{i1})}\right)$$

or

$$\pi_i = P(Y_i = 1 \mid X_{i1}) = \frac{\exp(\beta_1+\beta_2 X_{i1})}{1+\exp(\beta_1+\beta_2 X_{i1})}$$

or

$$\log\!\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_1 + \beta_2 X_{i1}$$

or

$$\operatorname{logit}(\pi_i) = \beta_1 + \beta_2 X_{i1}.$$

The term Logistic Regression derives from the fact that the function $F(x)=\dfrac{\exp(x)}{1+\exp(x)}$ is known as the Logistic Function.
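As a numerical check of the equivalent model forms above, a minimal Python sketch (the values $\beta_1=-1.0$, $\beta_2=0.5$ and the covariate grid are hypothetical, chosen only for illustration) verifies that applying the logit to $\pi_i = F(\beta_1+\beta_2 X_{i1})$ recovers the linear predictor:

```python
import numpy as np

def logistic(x):
    """Logistic function F(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters and covariate values, for illustration only
beta1, beta2 = -1.0, 0.5
x = np.array([-2.0, 0.0, 2.0, 4.0])

pi = logistic(beta1 + beta2 * x)    # pi_i = P(Y_i = 1 | X_i1)
logit = np.log(pi / (1.0 - pi))     # logit(pi_i) equals beta1 + beta2 * x
```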

ASSUMPTIONS
▪ The data $Y_1, Y_2, \dots, Y_n$ are independently distributed, i.e., cases are independent.
▪ The binary logistic regression model assumes a Bernoulli distribution for the response.
▪ It does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the logit of the response and the explanatory variables: $\operatorname{logit}(\pi_i) = \beta_1 + \beta_2 X_{i1}$.

▪ Independent (explanatory) variables can even be power terms or other nonlinear transformations of the original independent variables.
▪ The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in
many cases given the model structure.
▪ Errors need to be independent but NOT normally distributed.
▪ It uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to
estimate the parameters, and thus relies on large-sample approximations.

For modelling, logistic regression is often used to estimate probabilities as a function of explanatory variables $X$ and parameters $\beta$. Often these probabilities are used to find odds, odds ratios and relative risks.

ODDS AND ODDS RATIOS


The odds is the ratio of the probability that something is true to the probability that it is not true. Thus,

$$\operatorname{Odd}(X) = \frac{P(Y_i=1 \mid X_{i1})}{P(Y_i=0 \mid X_{i1})} = \exp(\beta_1+\beta_2 X_{i1}).$$

The odds ratio is the ratio of two odds for different values of $X_{i1}$, say $X_{i1}=x$ and $X_{i1}=x+\Delta x$:

$$\frac{\operatorname{Odd}(x+\Delta x)}{\operatorname{Odd}(x)} = \frac{\exp(\beta_1+\beta_2(x+\Delta x))}{\exp(\beta_1+\beta_2 x)} = \exp(\beta_2\,\Delta x),$$

where $\Delta x$ is a small change in $x$.


Then,

$$\lim_{\Delta x\to 0}\frac{1}{\Delta x}\left(\frac{\operatorname{Odd}(x+\Delta x)-\operatorname{Odd}(x)}{\operatorname{Odd}(x)}\right) = \lim_{\Delta x\to 0}\left(\frac{\exp(\beta_2\,\Delta x)-1}{\Delta x}\right)$$

$$= \beta_2 \lim_{\Delta x\to 0}\left(\frac{\exp(\beta_2\,\Delta x)-1}{\beta_2\,\Delta x}\right) = \beta_2 \left.\frac{d\exp(u)}{du}\right|_{u=0} = \beta_2\exp(0) = \beta_2.$$

Thus, $\beta_2$ may be interpreted as the relative change in the odds due to a small change $\Delta x$ in $X_{i1}$:

$$\frac{\operatorname{Odd}(x+\Delta x)-\operatorname{Odd}(x)}{\operatorname{Odd}(x)} = \frac{\operatorname{Odd}(x+\Delta x)}{\operatorname{Odd}(x)} - 1 \approx \beta_2\,\Delta x.$$
If $X_{i1}$ is itself a binary variable, $X_{i1}=0$ or $X_{i1}=1$, then the only reasonable choices for $x+\Delta x$ and $x$ are 1 and 0, respectively, so that

$$\frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)} - 1 = \frac{\operatorname{Odd}(1)-\operatorname{Odd}(0)}{\operatorname{Odd}(0)} = \exp(\beta_2) - 1.$$

Only if $\beta_2$ is small may we use the approximation $\exp(\beta_2)-1 \approx \beta_2$. If not, one has to interpret $\beta_2$ in terms of the log of the odds ratio involved:

$$\log\!\left(\frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)}\right) = \beta_2.$$
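The exact-versus-approximate interpretation of $\beta_2$ is easy to check numerically. A small sketch (the coefficients $\beta_1=-0.5$, $\beta_2=0.08$ are hypothetical, not from the text) comparing the exact relative change $\exp(\beta_2)-1$ with the small-$\beta_2$ approximation $\beta_2$:

```python
import math

def odds(x, beta1, beta2):
    """Odd(x) = P(Y=1 | x) / P(Y=0 | x) = exp(beta1 + beta2 * x)."""
    return math.exp(beta1 + beta2 * x)

beta1, beta2 = -0.5, 0.08   # hypothetical coefficients
# Exact relative change in the odds for a binary predictor: exp(beta2) - 1
rel_change = odds(1, beta1, beta2) / odds(0, beta1, beta2) - 1.0
# Because beta2 is small here, exp(beta2) - 1 is close to beta2 itself
```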

GENERALIZATION
If $k > 2$ and the $X_{ij}$ are independent,

$$\log\!\left(\frac{P(X \mid Y=1)}{P(X \mid Y=0)}\right) = \sum_{j=2}^{k}\log\!\left(\frac{P(X_{ij} \mid Y_i=1)}{P(X_{ij} \mid Y_i=0)}\right).$$

Setting

$$\beta_j X_{ij} = \log\!\left(\frac{P(X_{ij} \mid Y_i=1)}{P(X_{ij} \mid Y_i=0)}\right),$$

one can extend the model and obtain the general logistic regression model

$$Y_i \mid X_{i2},\dots,X_{ik} \sim \operatorname{Bernoulli}\!\left(\frac{\exp\!\big(\beta_1+\sum_{j=2}^{k}\beta_j X_{ij}\big)}{1+\exp\!\big(\beta_1+\sum_{j=2}^{k}\beta_j X_{ij}\big)}\right).$$

Regardless of whether the $X$'s are dichotomous, polychotomous or continuous, logistic regression is a way to identify the distribution of $Y$ as a function of $X$ and of parameters $\beta$, just as linear regression is a way to identify the distribution of $Y$ as a (different) function of $X$ and of parameters $\beta$.

The interpretation of the coefficients $\beta_j$, $j=2,3,\dots,k$, in the logistic model is given as:

$$\frac{\operatorname{Odd}(X_{i2},\dots,X_{ij}+\Delta X_{ij},\dots,X_{ik})}{\operatorname{Odd}(X_{i2},\dots,X_{ij},\dots,X_{ik})} - 1 \approx \beta_j\,\Delta X_{ij},$$

if $\Delta X_{ij}$ is small. That is, $\beta_j$ may be interpreted as the relative change in the odds due to a small change $\Delta X_{ij}$ in $X_{ij}$, holding the other explanatory variables fixed.

ESTIMATION OF PARAMETERS
Let $k=2$. The parameters $\beta_1$ and $\beta_2$ are estimated using the method of maximum likelihood. The log of the likelihood function $L(\beta_1,\beta_2)$ is given as:

$$\log L(\beta_1,\beta_2) = \sum_{i=1}^{n}\log f(y_i \mid X_{i1},\beta_1,\beta_2)$$

$$= \sum_{i=1}^{n} y_i\log F(\beta_1+\beta_2 X_{i1}) + \sum_{i=1}^{n}(1-y_i)\log\big(1-F(\beta_1+\beta_2 X_{i1})\big)$$

$$= \sum_{i=1}^{n} y_i\log\frac{F(\beta_1+\beta_2 X_{i1})}{1-F(\beta_1+\beta_2 X_{i1})} + \sum_{i=1}^{n}\log\big(1-F(\beta_1+\beta_2 X_{i1})\big)$$

$$= \sum_{i=1}^{n} y_i(\beta_1+\beta_2 X_{i1}) - \sum_{i=1}^{n}\log\big(1+\exp(\beta_1+\beta_2 X_{i1})\big).$$

Hence

$$\frac{\partial\log L(\beta_1,\beta_2)}{\partial\beta_1} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n}\frac{\exp(\beta_1+\beta_2 X_{i1})}{1+\exp(\beta_1+\beta_2 X_{i1})} = \sum_{i=1}^{n}(y_i-\pi_i)$$

and

$$\frac{\partial\log L(\beta_1,\beta_2)}{\partial\beta_2} = \sum_{i=1}^{n} y_i X_{i1} - \sum_{i=1}^{n}\frac{X_{i1}\exp(\beta_1+\beta_2 X_{i1})}{1+\exp(\beta_1+\beta_2 X_{i1})} = \sum_{i=1}^{n}(y_i-\pi_i)X_{i1}.$$

Setting these derivatives to zero yields transcendental equations, so it is not possible to obtain a closed-form solution for $\hat\beta_1$ and $\hat\beta_2$. Newton-Raphson can be used to obtain them:

 ˆ 1   ˆ 1(0) 
ˆ ˆ
• Guess initial value of  =   , say,  = 
(0)
 , say, ̂02 .
~  ˆ  ~  ˆ (0) 
 2  2 
• Use

Page 6 of 8
  log L ( 1 , 2 ) 
 
−1  1 ,
ˆ (t +1) = ˆ (t +1) +  −H 
~ ~   log L (  ,  ) 
 1 2

 2 
where H is Hessian Matrix given as:
  2 log L ( 1 , 2 )  2 log L ( 1 , 2 ) 
 
  21  212 
H=  2
  log L ( 1 , 2 )  log L ( 1 , 2 ) 
2

  212  21 
 

iteratively till two consecutive values of ̂ are approximately equal.


The estimated variance-covariance matrix of $\hat\beta$ is $(-H)^{-1}$, evaluated at $\hat\beta$. The square roots of the diagonal elements of this matrix give the estimated standard errors of $\hat\beta_1$ and $\hat\beta_2$.

For $k>2$, the result can be generalized.
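The estimation steps above can be sketched in code. A minimal Newton-Raphson implementation (the synthetic data, sample size and true parameter values are invented for illustration; a production fit would use an established package):

```python
import numpy as np

def fit_logistic(X, y, tol=1e-8, max_iter=50):
    """Newton-Raphson MLE for logistic regression.
    X is the n x k design matrix whose first column is all ones."""
    beta = np.zeros(X.shape[1])
    H = None
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))   # pi_i = F(x_i' beta)
        grad = X.T @ (y - pi)                  # score vector, sum (y_i - pi_i) x_i
        W = pi * (1.0 - pi)                    # Bernoulli variances
        H = -(X.T * W) @ X                     # Hessian of log L
        step = np.linalg.solve(-H, grad)       # (-H)^{-1} times the gradient
        beta = beta + step
        if np.max(np.abs(step)) < tol:         # consecutive values ~ equal
            break
    se = np.sqrt(np.diag(np.linalg.inv(-H)))   # SEs from diagonal of (-H)^{-1}
    return beta, se

# Synthetic data (invented for illustration)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
p_true = 1.0 / (1.0 + np.exp(-(-0.3 + 1.2 * x1)))
y = rng.binomial(1, p_true)
X = np.column_stack([np.ones_like(x1), x1])
beta_hat, se_hat = fit_logistic(X, y)
```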


TESTING OF HYPOTHESES
I. Testing the significance of a single regression coefficient
If the sample size is large, then under $H_0: \beta_j = \beta_{j0}$,

$$\frac{\sqrt{n}\,(\hat\beta_j - \beta_{j0})}{s_{\hat\beta_j}} \sim N(0,1), \qquad j=1,2,\dots,k.$$

These results can be used to test whether the coefficient $\beta_j$ is zero or not, $j=2,3,\dots,k$. The null hypothesis $H_0: \beta_j = 0$, $j=2,\dots,k$, is of interest since this hypothesis implies that the conditional probability $P(Y_i = 1 \mid X_{ij})$ does not depend on $X_{ij}$, $j=2,3,\dots,k$. Under $H_0: \beta_j = 0$,

$$\frac{\sqrt{n}\,\hat\beta_j}{s_{\hat\beta_j}} \sim N(0,1), \qquad j=2,\dots,k.$$

This statistic is called a pseudo t-value, as it is used in the same way as the t-value in linear regression, and $s_{\hat\beta_j}$ is called the standard error of $\hat\beta_j$. The test statistic is also called Wald's statistic and the corresponding test Wald's test.
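Wald's test can be sketched as follows. Here $s_{\hat\beta_j}$ is taken directly as the reported standard error of $\hat\beta_j$ (so no separate $\sqrt{n}$ factor appears), and the numbers $\hat\beta_j = 0.8$, $s_{\hat\beta_j} = 0.25$ are hypothetical:

```python
import math

def wald_test(beta_hat, se, beta0=0.0):
    """Wald z-statistic and two-sided p-value for H0: beta_j = beta0."""
    z = (beta_hat - beta0) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt 2)) / 2
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return z, 2.0 * (1.0 - phi)

# Hypothetical estimate and standard error
z, p = wald_test(0.8, 0.25)   # z = 3.2, small two-sided p-value
```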

II. Testing the joint significance of all predictors
We are interested in testing $H_0: \beta_2 = \beta_3 = \cdots = \beta_m = 0$ ($m \le k$) against the alternative hypothesis that at least one of $\beta_2, \beta_3, \dots, \beta_m$ is not equal to zero. For this we proceed as follows.
Re-estimate the logit model using

$$\log L\big(0,0,\dots,0,\hat\beta_{m+1},\hat\beta_{m+2},\dots,\hat\beta_k\big) = \max_{\beta_{m+1},\beta_{m+2},\dots,\beta_k}\ \log L\big(0,0,\dots,0,\beta_{m+1},\beta_{m+2},\dots,\beta_k\big).$$

Then, under $H_0$,

$$LR_m = -2\log\!\left(\frac{L\big(0,0,\dots,0,\hat\beta_{m+1},\hat\beta_{m+2},\dots,\hat\beta_k\big)}{L\big(\hat\beta_2,\hat\beta_3,\dots,\hat\beta_k\big)}\right) \sim \chi^2_{m-1}.$$

This is the LIKELIHOOD RATIO test, which is right-sided.
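The likelihood ratio statistic is simple to compute once both models are fitted. A sketch with hypothetical maximized log-likelihoods for the restricted and full models:

```python
def lr_statistic(loglik_restricted, loglik_full):
    """LR = -2 log(L_restricted / L_full) = -2 (logL_restricted - logL_full)."""
    return -2.0 * (loglik_restricted - loglik_full)

# Hypothetical maximized log-likelihoods of the restricted and full models
lr = lr_statistic(-120.4, -112.9)
# Right-sided test: reject H0 when lr exceeds the chi-square critical value
# for m - 1 degrees of freedom (e.g. 3.84 for 1 df at the 5% level)
```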


PREDICTION WITH LOGISTIC REGRESSION
From prediction point of view , logistic regression can be used for classification and the
zero and one are taken as class labels.

Suppose data of the form ( Yi , X i1 ) , i= 1,2,…,n is available and estimates of parameters

have been obtained. These estimators are consistent and asymptotically normally
distributed. The objective is to estimate conditional probability of the event such as
Yn +1 given X n +1 , 1. This is given as :

exp ˆ 1 + ˆ 2 X n+1, 1 


Est.P  Yn +1 = 1 | X n+1,1  = .
1 + exp ˆ 1 + ˆ 2 X n+1, 1 

If the above probability is greater than half , one is led to predict that Yn +1 = 1 , otherwise

Yn +1 = 1 for given X n+1, 1 .
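The classification rule can be sketched as follows (the fitted values $\hat\beta_1=-1.0$, $\hat\beta_2=0.5$ and the new observations are hypothetical):

```python
import numpy as np

def predict_class(x_new, beta1_hat, beta2_hat, threshold=0.5):
    """Predict Y = 1 when the estimated probability exceeds the threshold."""
    p_hat = 1.0 / (1.0 + np.exp(-(beta1_hat + beta2_hat * x_new)))
    return (p_hat > threshold).astype(int), p_hat

# Hypothetical fitted coefficients and new covariate values
labels, probs = predict_class(np.array([-2.0, 4.0]), -1.0, 0.5)
```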


