
This Study Resource Was: Answer

This study examines a loan application dataset using the naive rule and the k-nearest neighbors (k-NN) algorithm. Using k = 1, a customer with the specified characteristics is classified as belonging to the "loan not accepted" group. The validation error log marks k = 1, with a validation error of 10%, as the best k, and the validation confusion matrix for that k shows the resulting classification errors. When the data are repartitioned into training, validation, and test sets (50%:30%:20%), the training set is classified without error, while the validation and test sets have overall error rates of 8.33% and 10%.

Uploaded by

Saurabh Sharma

Question 7.1:

a. Using the naive rule on the training set, classify a customer with the following characteristics: Age=40, Experience=10, Income=84, Family=2, CCAvg=2, Education_2=1, Education_3=0, Mortgage=0, Securities Account=0, CD Account=0, Online=1, and Credit card=1.

Compute the confusion matrix for the validation set based on the naive rule.

Perform a k-nearest neighbor classification with all predictors except zipcode using k = 1. Remember to transform categorical predictors with more than 2 categories into dummy variables first. Specify the “success” class as 1 (loan acceptance), and use the default cutoff value of 0.5. How would the above customer be classified?

Answer:
The “Education” variable is converted to dummy variables. The success class is set to 1, and the default cutoff value of 0.5 is used.
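The dummy coding mentioned above can be sketched as follows. This is a minimal illustration, assuming Education takes the levels 1, 2, and 3 with level 1 as the reference category; the function name `education_dummies` is hypothetical and is not part of the original analysis.

```python
def education_dummies(level, levels=(1, 2, 3), drop_first=True):
    """Return 0/1 dummy variables for one categorical Education level.

    Assumption: the levels are 1, 2, 3. With drop_first=True the first
    level is the reference category and gets no dummy of its own.
    """
    if level not in levels:
        raise ValueError(f"unknown Education level: {level}")
    kept = levels[1:] if drop_first else levels
    return {f"Education_{lv}": int(level == lv) for lv in kept}


# The customer in the question has Education_2=1 and Education_3=0,
# which corresponds to Education level 2 under this coding.
customer_education = education_dummies(2)
print(customer_education)
```

Passing `drop_first=False` keeps all three dummies (Education_1, Education_2, Education_3).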


The customer characteristics are:

Age=40, Experience=10, Income=84, Family=2, CCAvg=2, Education_2=1, Education_3=0, Mortgage=0, Securities Account=0, CD Account=0, Online=1, and Credit card=1.

Predicted Class   Actual Class   Prob. for 1 (success)   #Nearest Neighbors   Age   Experience   Income   Family   CCAvg
0                                0                       1                    40    10           84       2        2

Education_2   Education_3   Mortgage   Securities Account   CD Account   Online   Credit Card
1             0             0          0                    0            1        1

This study source was downloaded by 100000761058697 from CourseHero.com on 09-16-2021 23:48:58 GMT -05:00

https://www.coursehero.com/file/12444953/Chapter-7-Problems-VSINGI4452/
From the output we conclude that the above customer is classified as belonging to the "loan not accepted" group.
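The classification step can be sketched in a few lines of code. This is only an illustration under stated assumptions: the four training records and their three predictors (Age, Income, CCAvg) are hypothetical, the predictors are standardized before computing Euclidean distances, and a record is assigned to class 1 when the share of class-1 neighbors is at least the cutoff. The original output comes from a data mining tool and uses the full predictor set.

```python
import math


def knn_predict(train_X, train_y, new_x, k=1, cutoff=0.5):
    """Classify new_x by majority vote among its k nearest training records."""
    # Standardize each predictor using the training mean and standard deviation.
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard against zero spread
        for c, m in zip(cols, means)
    ]

    def scale(row):
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    z_new = scale(new_x)
    # Sort training records by Euclidean distance to the new record.
    by_distance = sorted(
        (math.dist(scale(row), z_new), y) for row, y in zip(train_X, train_y)
    )
    neighbors = [y for _, y in by_distance[:k]]
    prob_success = sum(neighbors) / k  # share of neighbors in class 1
    return (1 if prob_success >= cutoff else 0), prob_success


# Hypothetical training records: [Age, Income, CCAvg] with loan acceptance labels.
train_X = [[35, 60, 1.5], [42, 80, 2.1], [50, 150, 6.0], [29, 180, 7.5]]
train_y = [0, 0, 1, 1]
new_customer = [40, 84, 2.0]

predicted_class, prob = knn_predict(train_X, train_y, new_customer, k=1)
print(predicted_class, prob)
```

With k = 1 the probability of success is always 0 or 1, because it is simply the class of the single nearest neighbor.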

b. What is a choice of k that balances between overfitting and ignoring the predictor information?

Answer:

Validation error log for different k:

Value of k   % Error Training   % Error Validation
1            0                  10                   <--- Best k
2            5.83               13.75
3            6.67               11.25
4            7.5                18.75
5            6.67               12.5
6            7.5                16.25
7            10                 12.5
8            9.17               12.5
9            8.33               11.25

The value of k that balances between overfitting and ignoring the predictor information is 1: it has the lowest validation error (10%) in the log above and is the value marked as the best k.
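Selecting k by the lowest validation error, as the "Best k" marker in the log does, can be sketched as follows. The error values are copied from the validation error log; the variable names are illustrative.

```python
# % validation error for each k, taken from the validation error log.
validation_error = {
    1: 10.0, 2: 13.75, 3: 11.25, 4: 18.75, 5: 12.5,
    6: 16.25, 7: 12.5, 8: 12.5, 9: 11.25,
}

# Pick the k with the smallest validation error.
best_k = min(validation_error, key=validation_error.get)
print(best_k, validation_error[best_k])
```

Training error is not used for the choice, since it is lowest at k = 1 by construction.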
c. Show the classification matrix for the validation data that results from using the best k.

Answer:

Validation Data scoring - Summary Report (for k=1)

Cutoff Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1      0
1               3      4
0               4      69

Error Report

Class      # Cases   # Errors   % Error
1          7         4          57.14
0          73        4          5.48
Overall    80        8          10
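The error report follows directly from the confusion matrix: for each actual class, the errors are the cases not on the diagonal. A minimal sketch of that arithmetic, using the validation counts above (the function name `error_report` is hypothetical):

```python
def error_report(matrix):
    """Build (cases, errors, % error) per class from matrix[actual][predicted]."""
    report = {}
    total_cases = 0
    total_errors = 0
    for actual, predicted_counts in matrix.items():
        cases = sum(predicted_counts.values())
        errors = cases - predicted_counts[actual]  # everything off the diagonal
        report[actual] = (cases, errors, round(100 * errors / cases, 2))
        total_cases += cases
        total_errors += errors
    report["Overall"] = (
        total_cases,
        total_errors,
        round(100 * total_errors / total_cases, 2),
    )
    return report


# Validation confusion matrix for k = 1: outer key = actual class, inner key = predicted class.
validation_matrix = {1: {1: 3, 0: 4}, 0: {1: 4, 0: 69}}
report = error_report(validation_matrix)
for label, row in report.items():
    print(label, row)
```

The same function can be applied to the training and test matrices to reproduce their error reports.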

d. Classify the customer using the best k.

Answer:
Predicted Class   Actual Class   Prob. for 1 (success)   #Nearest Neighbors   Age   Experience   Income   Family   CCAvg
0                                0                       1                    40    10           84       2        2

Education_1   Education_2   Education_3   Mortgage   Securities Account   CD Account   Online   CreditCard
0             1             0             0          0                    0            1        1

From the output we conclude that the above customer is classified as belonging to the "loan not accepted" group.

e. Repartition the data, this time into training, validation, and test sets (50%:30%:20%). Apply the k-NN method with the k chosen above. Compare the classification matrix of the test set with that of the training and validation sets. Comment on the differences and their reason.

Answer:

Training Data scoring - Summary Report (for k=1)

Cutoff Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1      0
1               11     0
0               0      89

Error Report

Class      # Cases   # Errors   % Error
1          11        0          0.00
0          89        0          0.00
Overall    100       0          0.00
Validation Data scoring - Summary Report (for k=1)

Cutoff Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1      0
1               1      3
0               2      54

Error Report

Class      # Cases   # Errors   % Error
1          4         3          75.00
0          56        2          3.57
Overall    60        5          8.33
Test Data scoring - Summary Report (for k=1)

Cutoff Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1      0
1               2      2
0               2      34

Error Report

Class      # Cases   # Errors   % Error
1          4         2          50.00
0          36        2          5.56
Overall    40        4          10.00
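The best k is the one that minimizes the misclassification rate on the validation set; here the best k is 1. The training set shows 0% error because, with k = 1, each training record is its own nearest neighbor, so the training matrix overstates performance. The overall classification error is 8.33% on the validation set and 10% on the test set, which are nearly the same.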
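The 50%:30%:20% repartition in part e can be sketched as a random split of record indices. This is a minimal illustration, not the tool's own partitioning routine: the function name `partition` and the seed are hypothetical, and the record count of 200 is inferred from the 100, 60, and 40 cases in the three reports above.

```python
import random


def partition(n_records, fractions=(0.5, 0.3, 0.2), seed=1):
    """Randomly split record indices into training, validation, and test sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(fractions[0] * n_records)
    n_valid = round(fractions[1] * n_records)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test


train_idx, valid_idx, test_idx = partition(200)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because the split is random, a different seed gives different confusion matrices, which is one reason the validation and test error rates need not match exactly.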
