
Logistic Regression:
An important instrument for decision-making

PGP DSE BANGALORE
July 2018
1
What is Logistic Regression?
Regression relates the response to a set of predictors

When the response is continuous and there is more than one predictor, the technique is Multiple Regression

In Multiple Regression the predictors may be discrete or continuous

What happens when the response is binary?
 Among a group of loan applicants, is a person a good credit or a bad one?
 Given an income level, will a person buy an iPhone or not?
2
More Examples?
• Software project completion in time: Yes/No
• Marketing: Given a price point, will an item be sold?
• Finance, banking: Will a stock gain? Should a loan be given to the applicant?
• CRM:
• Retail:
• Healthcare:

• Elsewhere?
3
German Credit Data
Description
 Creditability: Whether a loan is Good (1) or Bad (0) (Y: Response)
 Credit Amount: Amount asked for in the loan (in DM) (X: Predictor)

Plot(Creditability versus Credit Amount)
4
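A minimal R sketch of the plot requested above, assuming the data frame and column names used on the later slides (German_Credit, Creditability, `Credit Amount`):

## Scatterplot of the binary response against the loan amount
with(German_Credit, plot(`Credit Amount`, Creditability,
     xlab = "Credit Amount (DM)", ylab = "Creditability (1 = Good, 0 = Bad)"))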
Scatterplot
[Scatterplot of Creditability (0/1) versus Credit Amount]

What to model?
How to model?
5
Logistic Regression
• Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is a dichotomous variable (one with only two possible outcomes).

• The outcome/response only contains data coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.).

• The goal of logistic regression is to find the best fitting model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables.
6
Response: Probability
Binary response: Y = Success or Failure (1 or 0)

Model:
π = Pr(Y = Success | X): a function of the predictor X, with 0 ≤ π ≤ 1

One possibility: model π as a linear function of X
π = α + 𝛽X
Prob(Good Credit) = α + 𝛽 (Credit Amount)

Will it work?
What is an obvious drawback?
7
Response: Logit(Probability)
logit(π) = log(π / (1 − π)), 0 < π < 1
logit(π) is a continuous function on the real line

π: Success Probability

π / (1 − π) = Odds of success

logit(π) = log odds of success

Logistic regression:
logit(π) = α + 𝛽X
Logit(Probability of Y = 1) is modeled as a linear function of the predictors
8
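A small illustration (not from the slides) of the logit and its inverse as R functions, showing how probabilities map to the real line and back:

logit    <- function(p) log(p / (1 - p))      ## log odds, defined for 0 < p < 1
invlogit <- function(x) exp(x) / (1 + exp(x)) ## maps the real line back to (0, 1)
logit(0.75)        ## 1.0986 (odds = 3)
invlogit(1.0986)   ## 0.75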
Response: Logit (Probability)
 Non-normal error variance
 Non-constant error variance
 No explicit error term associated with the regression equation
9
Rationale
• Credit-worthiness depends on the suggested predictors, e.g., the amount of credit

• At different level combinations of the predictors, a randomly chosen credit applicant has a different probability of being a defaulter (i.e., of being non-credit-worthy)

• Hence π is a function of X
10
Shape of the Logistic Curve
For a single predictor
X-axis: Predictor (x)
Y-axis: π(x)

π(x), being a probability, is bounded between 0 and 1

The regression coefficients (α, 𝛽) determine the location and slope of the curve
11
Logit Functions

12
Credit-worthiness on Credit Amount

Does Credit-worthiness depend on Credit Amount?

To check empirically whether creditworthiness depends on credit amount
13
Useful R Commands for Tabulation
## Category boundaries for the loan amount
cutpoint <- c(0, 500, 1000, 1500, 2000, 2500, 5000, 7500, 10000, 15000, 20000)

## Categorize Credit Amount into the intervals above (right-closed)
Credit_cat <- with(German_Credit, cut(`Credit Amount`, cutpoint, right = T))
table(Credit_cat)

## Cross-tabulate amount category against Creditability
Table1 <- with(German_Credit, table(Credit_cat, Creditability))

## Row-wise proportions, with the category counts appended
Table2 <- prop.table(Table1, 1)
Table3 <- cbind(Table2, table(Credit_cat))
round(Table3, 2)
14
Compare Creditworthiness

                Creditworthy
Credit_cat          0     1
(0, 500]            3    15
(500, 1000]        34    64
(1000, 1500]       51   139
(1500, 2000]       33    93
(2000, 2500]       26    79
(2500, 5000]       75   200
(5000, 7500]       34    68
(7500, 10000]      20    26
(10000, 15000]     21    14
(15000, 20000]      3     2

Margins of a table
margin.table(): Total
margin.table(, 1): Row margin
margin.table(, 2): Column margin

Proportions of a table
prop.table(, 1)
prop.table(, 2)
15
Credit-worthiness across Amount (Cat)

                Creditworthy
Credit_cat          0      1   Total
(0, 500]         0.17   0.83      18
(500, 1000]      0.35   0.65      98
(1000, 1500]     0.27   0.73     190
(1500, 2000]     0.26   0.74     126
(2000, 2500]     0.25   0.75     105
(2500, 5000]     0.27   0.73     275
(5000, 7500]     0.33   0.67     102
(7500, 10000]    0.43   0.57      46
(10000, 15000]   0.60   0.40      35
(15000, 20000]   0.60   0.40       5
16
Credit-worthiness across Amount (Cat)

[Bar chart: Prop(Y = 1) across categorized loan amount. The proportion of good credits falls from about 0.83 in the lowest amount category to about 0.40 in the highest.]
17
Credit-worthiness across Amount (Cat)
Take a look
 What are the odds of being given credit at various levels of the asking amount?
 How do the odds vary?
 How do the log odds vary?
18
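A short sketch of how the odds and log odds per amount category could be computed from Table2 defined on the earlier slide (its columns "0" and "1" hold the row-wise proportions of bad and good credits):

## Odds and log odds of being a good credit in each amount category
odds     <- Table2[, "1"] / Table2[, "0"]
log_odds <- log(odds)
round(cbind(odds, log_odds), 2)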
Regressing Creditworthiness on Amount
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.229e+00 1.083e-01 11.348 < 2e-16 ***
CreditAmt -1.119e-04 2.355e-05 -4.751 2.02e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1221.7 on 999 degrees of freedom


Residual deviance: 1199.1 on 998 degrees of freedom
AIC: 1203.1

Number of Fisher Scoring iterations: 4

20
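The output above could be produced by a call of the following form (a sketch; it assumes the loan amount column has been renamed CreditAmt, as in the printed output):

fit_amt <- glm(Creditability ~ CreditAmt, family = binomial(link = logit),
               data = German_Credit)
summary(fit_amt)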
Regressing Creditworthiness on Amount

logit(π̂) = estimated logit(Pr(Y = 1)) = 1.229 − 0.00012 · CreditAmt

π̂ = exp(1.229 − 0.00012 · Amt) / [1 + exp(1.229 − 0.00012 · Amt)]

21
Regressing Creditworthiness on Amount
For every unit increase in the predictor (Amount), the log odds of success (being credit-worthy) decrease by 0.00012

log(odds of success) is linear in the predictor

Convert back to the original scale

Compute
Pr(credit-worthiness | Amount = 500, 600, 700)
Pr(credit-worthiness | Amount = 4000, 6000, 8000)
Pr(credit-worthiness | Amount = 15000, 20000, 25000)

What do you observe?

22
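One way to compute the requested probabilities, using the fitted coefficients from the previous slide (fit_amt is the model sketched earlier):

amts <- c(500, 600, 700, 4000, 6000, 8000, 15000, 20000, 25000)
eta  <- 1.229 - 0.00012 * amts             ## linear predictor on the logit scale
round(exp(eta) / (1 + exp(eta)), 3)        ## back-transform to probabilities
## Equivalently: predict(fit_amt, newdata = data.frame(CreditAmt = amts), type = "response")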
Odds Ratio
What are the odds of getting a loan if, instead of applying for an amount of 8000, you decide to apply for an amount of 6000?

Odds of getting loan = π̂/(1 − π̂) at amount = 8000: 1.30
Odds of getting loan = π̂/(1 − π̂) at amount = 6000: 1.66

1.27 times improvement

23
Odds Ratio
What are the odds of getting a loan if, instead of applying for an amount of 6000, you decide to apply for an amount of 4000?

Odds of getting loan = π̂/(1 − π̂) at amount = 6000: 1.66
Odds of getting loan = π̂/(1 − π̂) at amount = 4000: 2.11

1.27 times improvement

24
Logistic Regression Parameter (Slope)
From previous computations:

For a decrease of 2000 units in amount, the odds increase by a factor of 1.27
For an increase of 2000 units in amount, the odds are multiplied by 0.7866

exp(β̂) = exp(−0.00012) = 0.99988
exp(2000 · β̂) = exp(−0.24) = 0.7866

log(odds) decreases by 0.00012 units for every unit increase in loan amount
⇨ the odds of being credit-worthy are multiplied by exp(β̂) = exp(−0.00012) = 0.99988 for every unit increase, i.e., they decrease by about 0.012%
25
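The same numbers, obtained directly from the fitted slope (a sketch with the rounded coefficient; in practice one would use coef(fit_amt)):

beta <- -0.00012
exp(beta)           ## ≈ 0.99988: multiplicative change in odds per unit increase
exp(2000 * beta)    ## ≈ 0.7866 : change in odds for a 2000-unit increase
exp(-2000 * beta)   ## ≈ 1.27   : change in odds for a 2000-unit decrease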
Creditworthiness on Duration
Fit a logistic regression on Duration

What insights do you get from the model?

What are the odds of the loan being sanctioned when Duration = 12 months?

What are the odds of the loan being sanctioned when Duration = 30 months?

What are the corresponding probabilities?

26
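A sketch for this exercise, using the column name `Duration of Credit (month)` as it appears on a later slide:

fit_dur <- glm(Creditability ~ `Duration of Credit (month)`,
               family = binomial(link = logit), data = German_Credit)
p <- predict(fit_dur,
             newdata = data.frame(`Duration of Credit (month)` = c(12, 30),
                                  check.names = FALSE),
             type = "response")
p            ## probabilities at 12 and 30 months
p / (1 - p)  ## corresponding odds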
Multiple Logistic Regression
Call:
glm(formula = Creditability ~ ., family = binomial(link = logit),
data = German_Credit)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.8249 -1.2734 0.7164 0.8533 1.5020

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.670e+00 1.466e-01 11.390 < 2e-16 ***
CreditAmt -2.300e-05 3.059e-05 -0.752 0.452
DurCredit -3.412e-02 7.282e-03 -4.685 2.8e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)


27
Accuracy Measures
• True Positive TP: Correctly classified as Positive
• True Negative TN: Correctly classified as Negative

• Misclassification Probability = (FP + FN)/(P + N)

• Sensitivity (TP Rate) = TP/P = TP/(TP + FN)

• Specificity (TN Rate) = TN/N = TN/(TN + FP)

28
Accuracy Measures

Confusion matrix (Predicted):
                  FALSE   TRUE
Creditability  0     40    260
               1     30    670

TP = 670, TN = 40

Misclassification Probability = (30 + 260)/1000 = 0.29
Accuracy = 1 − 0.29 = 0.71
Sensitivity = 670/700 ≈ 0.96
Specificity = 40/300 ≈ 0.13

Of the 300 actual bad credits (0), only 40 are correctly predicted
Practical Implication?

29
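A sketch of how such a confusion matrix and the measures above could be obtained, classifying with a 0.5 cutoff on the fitted probabilities (fit stands for whichever fitted glm is being assessed):

pred <- predict(fit, type = "response") > 0.5
tab  <- table(Creditability = German_Credit$Creditability, Predicted = pred)
tab
sensitivity <- tab["1", "TRUE"]  / sum(tab["1", ])   ## TP / (TP + FN)
specificity <- tab["0", "FALSE"] / sum(tab["0", ])   ## TN / (TN + FP)
accuracy    <- sum(diag(tab)) / sum(tab)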
ROC Curve
All possible combinations of sensitivity and specificity that can be achieved by changing the cutoff value are summarized by the ROC curve.

The ROC curve plots Sensitivity (TPR) against 1 − Specificity (FPR).

It summarizes the predictive power of the model over all possible cutoff values.

The area under the curve (AUC), also referred to as the index of accuracy or concordance index, is a performance metric for the ROC curve.

The higher the area under the curve, the better the predictive power of the model.

32
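One way to draw the ROC curve and compute the AUC in R (the pROC package is one option; it is not named on the slides, and fit is assumed to be a fitted glm):

library(pROC)
r <- roc(German_Credit$Creditability, predict(fit, type = "response"))
plot(r)    ## ROC curve
auc(r)     ## area under the curve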
ROC Curve
The higher the AUC, the more accurate the test
An AUC of 1.0 means the test is 100% accurate (i.e. the curve is
square)
An AUC of 0.5 (50%) means the ROC curve is a straight diagonal
line, which represents the "ideal bad test", one which is only ever
accurate by pure chance.
When comparing two tests, the more accurate test is the one with an
ROC curve further to the top left corner of the graph, with a higher
AUC.
The best cutoff point for a test (which separates positive from negative
values) is the point on the ROC curve which is closest to the top left
corner of the graph.
The cutoff values can be selected according to whether one wants more
sensitivity or more specificity.

33
ROC & AUC: Credit Data

[Two ROC curves: AUC = 62% and AUC = 64%]

34
ROC & AUC: Credit Data

[ROC curve: AUC = 80%]

35
Deviance
• A measure of goodness of fit of a generalized linear model.
• The higher the deviance, the poorer the model fit.
• When the model includes only the intercept term, its deviance is the null deviance. That is the maximum deviance for the given set of data.

36
Null and Residual Deviance
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.666351 0.146615 11.365 < 2e-16 ***
DurCredit -0.037538 0.005703 -6.582 4.63e-11 ***

Null deviance: 1221.7 on 999 degrees of freedom

Residual deviance: 1177.1 on 998 degrees of freedom

With the loss of 1 degree of freedom, the residual deviance after fitting the one-predictor model is reduced by 44.6 (from 1221.7 to 1177.1)

The reduction in deviance is approximately χ²-distributed

The reduction in deviance is highly significant


37
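The significance of this reduction (44.6 on 1 df) can be checked directly; a sketch, assuming fit_dur is the Duration-only model and null_model the intercept-only model:

pchisq(1221.7 - 1177.1, df = 1, lower.tail = FALSE)   ## ~ 2e-11, highly significant
## Equivalently: anova(null_model, fit_dur, test = "Chisq")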
Deviance for Model Comparison
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.670e+00 1.466e-01 11.390 < 2e-16 ***
CreditAmt -2.300e-05 3.059e-05 -0.752 0.452
DurCredit -3.412e-02 7.282e-03 -4.685 2.8e-06 ***

Null deviance: 1221.7 on 999 degrees of freedom

Residual deviance: 1176.6 on 997 degrees of freedom

With DurCredit only: Residual deviance: 1177.1 on 998 degrees of freedom

Inclusion of CreditAmt reduces the residual deviance by only 0.5, with DF reduced by 1

Pr(χ²(1) > 0.5) = 1 − 0.52 = 0.48: highly non-significant
38
Deviance for Model Comparison
• Deviance comparison helps identify a parsimonious model through hierarchical (nested) comparisons
• As more predictors are added, the residual deviance decreases
• If the reduction is significant, the predictor may remain in the model
• If the reduction is not significant, the predictor need not be included in the model

39
Deviance for Model Comparison
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.27788 0.32183 3.971 7.17e-05 ***
`Duration of Credit (month)` -0.04037 0.00586 -6.890 5.58e-12 ***
`Sex & Marital Status`2 0.13981 0.31948 0.438 0.6617
`Sex & Marital Status`3 0.68835 0.31199 2.206 0.0274 *
`Sex & Marital Status`4 0.45121 0.38021 1.187 0.2353

Null deviance: 1221.7 on 999 degrees of freedom

Residual deviance: 1162.9 on 995 degrees of freedom

Is there any need for including Sex & Marital Status?

40
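The question can be answered with the same deviance test; a sketch using the printed deviances (Duration-only model: 1177.1 on 998 df; this model: 1162.9 on 995 df):

## Reduction = 1177.1 - 1162.9 = 14.2 on 3 df
pchisq(1177.1 - 1162.9, df = 3, lower.tail = FALSE)   ## ≈ 0.003, significant at the 1% level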
Pseudo R2
• McFadden's pseudo R² compares the log likelihood of the null model (intercept-only model) with that of the current model
• It does not have an interpretation in terms of partitioning total variability
• Technically its value can lie between 0 and 1, but in practice it does not reach 1
• Typically, a McFadden's R² between 0.2 and 0.4 indicates a good (acceptable) model

41
Pseudo R2
library(DescTools)                    ## A package for descriptive statistics
PseudoR2(model, which = "McFadden")   ## model: a fitted glm object

For the models considered:

Predictors                     McFadden R²
Credit Amount                        1.80%
Duration                             3.60%
Credit Amount, Duration              3.70%
Duration, Sex/Marital Status         4.80%

42
Hosmer-Lemeshow Test
Goodness-of-fit approach to model fitting: how well a model fits depends on the difference between the model and the observed data

Hosmer-Lemeshow Test (library(ResourceSelection)):

Predictors                     Statistic   P-value
Credit Amount                        8.9      0.35
Duration                            10.6      0.22
Credit Amount, Duration            11.14      0.19
Duration, Sex/Marital Status        7.17      0.52

43
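A sketch of the test for one of the models above (hoslem.test() from ResourceSelection; fit_dur is the Duration-only model sketched earlier):

library(ResourceSelection)
hoslem.test(German_Credit$Creditability, fitted(fit_dur), g = 10)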
Model Selection
Backward Elimination:

glm(formula = Creditability ~ AcctBalance + DurCredit + `Paymnt Status` +
      CreditAmt + Value + LengthEmpl + Instalment + SexMS + Guarantors +
      MVAA + ConcurrentCredits + Apt + NoCredit + Telephone + ForeignWorker,
    family = binomial(link = logit), data = German_Credit)

Do you think there is any scope for improvement?

44
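A sketch of how such a selection could be run with step() (full_model is assumed to be a glm fitted on all predictors):

full_model <- glm(Creditability ~ ., family = binomial(link = logit),
                  data = German_Credit)
backward <- step(full_model, direction = "backward")
## direction = "forward" (starting from an intercept-only model, with a scope argument)
## or direction = "both" gives the selections shown on the next two slides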
Model Selection
Forward Selection:

glm(formula = Creditability ~ AcctBalance + DurCredit + `Paymnt Status` +
      Value + Guarantors + Instalment + SexMS + LengthEmpl +
      ConcurrentCredits + CreditAmt + ForeignWorker + Telephone + Apt +
      MVAA + NoCredit,
    family = binomial(link = logit), data = German_Credit)

Do you think there is any scope for improvement?

45
Model Selection
Bothways (stepwise in both directions):

glm(formula = Creditability ~ AcctBalance + DurCredit + `Paymnt Status` +
      Value + Guarantors + Instalment + SexMS + LengthEmpl +
      ConcurrentCredits + CreditAmt + ForeignWorker + Telephone + Apt +
      MVAA + NoCredit,
    family = binomial(link = logit), data = German_Credit)

Do you think there is any scope for improvement?

46
Model Selection

What final model will you recommend?

Compare based on all the criteria considered

47
Indian Liver Patients Data

Recommend a model to understand which variables contribute significantly towards identification of a liver patient

48
• Train and test sample
• Cross-validation
• SMOTE

49
Data Split
• Data is split into 3 parts randomly
• Training Data: several models are developed
• Validation Data: prediction error is estimated for model selection
• Test Data: the generalization error of the final model is assessed

50
Data Split

51
Data Split Proportion

• No gold standard exists
• The train proportion should depend on the complexity of the model
• Often the data is not so large that a 3-part split is possible
• Train : Test may be 70:30 / 80:20

52
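A simple 70:30 train/test split in base R (a sketch; the proportion is a choice, not a rule):

set.seed(42)                                   ## for reproducibility
n         <- nrow(German_Credit)
train_idx <- sample(n, size = round(0.7 * n))
train     <- German_Credit[train_idx, ]
test      <- German_Credit[-train_idx, ]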
Cross-Validation

53
Cross-Validation
• Leave-one-out cross-validation (LOOCV)
 Train the model on all observations except one
 Find the test error on that left-out observation
 The final error rate is the average of all n errors

• k-Fold cross-validation
 Split the data into k folds
 Train the model on all folds except one, and compute the error on the held-out fold
 Repeat over all k folds and average the error rates

54
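A sketch of 10-fold cross-validation with boot::cv.glm (one option; the package is not named on the slides, and DurCredit is the column name used in the earlier glm output):

library(boot)
cost <- function(y, p) mean(abs(y - p) > 0.5)            ## 0-1 misclassification cost
fit  <- glm(Creditability ~ DurCredit, family = binomial, data = German_Credit)
cv.glm(German_Credit, fit, cost = cost, K = 10)$delta    ## estimated CV error rate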
Cross-Validation
• K-fold CV depends on the split
• In k-fold CV, we train the model on less data than what is
available. This introduces bias into the estimates of test error.
• In LOOCV, the training samples highly resemble each other.
This increases the variance of the test error estimate.
• Training error rate sometimes increases as logistic regression
does not directly minimize 0-1 error rate but maximizes
likelihood.

Rule of thumb: Choose the simplest model whose CV error is no more than one standard error above that of the model with the lowest CV error.

55
Cross-Validation
Every aspect of the learning method that involves using
the data — variable selection, for example — must be
cross-validated.

• Divide the data into k folds.


• For i = 1, . . . , k:
• Using every fold except i, perform the variable selection and fit the
model with the selected variables.
• Compute the error on fold i.
• Average the k test errors obtained.

56
