1:
a. Using the naive rule on the training set, classify a customer with the following
characteristics: Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0,
Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0,
Online = 1, and Credit Card = 1.
Compute the confusion matrix for the validation set based on the naive rule.
Perform a k-nearest neighbor classification with all predictors except zipcode using k = 1.
Remember to transform categorical predictors with more than 2 categories into dummy
variables first. Specify the “success” class as 1 (loan acceptance), and use the default
cutoff value of 0.5. How would the above customer be classified?
Answer:
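The naive rule ignores all predictors and assigns every record to the majority class of the training set. A minimal sketch of that rule and the resulting validation confusion matrix follows. The label lists `train_labels` and `valid_labels` are hypothetical stand-ins, since the actual partitions are not reproduced in this answer.

```python
from collections import Counter

def naive_rule(train_labels):
    """Return the majority class of the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

def confusion_matrix(actual, predicted, classes=(1, 0)):
    """Nested dict: matrix[actual_class][predicted_class] -> count."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical labels for illustration only (1 = loan accepted, 0 = not accepted).
train_labels = [0] * 9 + [1]
valid_labels = [0, 0, 1, 0, 0]

majority = naive_rule(train_labels)
valid_pred = [majority] * len(valid_labels)  # every record gets the majority class
cm = confusion_matrix(valid_labels, valid_pred)
print(majority, cm)
```

Because the naive rule always predicts the majority class, one column of the confusion matrix is always empty.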
The “Education” variable is converted to dummy variables.

Predicted          Prob. for 1  #Nearest
Class      Actual  (success)    Neighbors  Age  Experience  Income  Family  CCAvg
0                  0            1          40   10          84      2       2

Education_2  Education_3  Mortgage  Securities Account  CD Account  Online  Credit Card
1            0            0         0                   0           1       1
This study source was downloaded by 100000761058697 from CourseHero.com on 09-16-2021 23:48:58 GMT -05:00
https://www.coursehero.com/file/12444953/Chapter-7-Problems-VSINGI4452/
From the output we conclude that the above customer is classified as not accepting the loan
(class 0).
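The dummy coding and the k = 1 lookup described above can be sketched in plain Python. This is an illustration under stated assumptions, not the XLMiner procedure: the four training rows in `toy` are invented, only a subset of the predictors is used for brevity, and z-score standardization before computing Euclidean distance is an assumption. The output shows the mechanics only and does not reproduce the reported result.

```python
import math

def education_dummies(level):
    """Convert Education (1, 2, or 3) into three 0/1 dummy variables."""
    return [1 if level == k else 0 for k in (1, 2, 3)]

def to_features(age, experience, income, family, ccavg, education):
    return [age, experience, income, family, ccavg] + education_dummies(education)

def fit_scaler(rows):
    """Column means and standard deviations (a zero std is replaced by 1.0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_classify(train_x, train_y, query, k=1, cutoff=0.5):
    """Return (predicted class, share of class 1 among the k nearest neighbors)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    prob_success = sum(train_y[i] for i in nearest) / k
    return (1 if prob_success >= cutoff else 0), prob_success

# Hypothetical training rows: Age, Experience, Income, Family, CCAvg, Education, Loan.
toy = [
    (35, 9, 80, 2, 1.8, 2, 0),
    (52, 27, 160, 1, 6.0, 3, 1),
    (29, 4, 45, 4, 0.5, 1, 0),
    (46, 20, 190, 3, 7.2, 2, 1),
]
train_raw = [to_features(*row[:6]) for row in toy]
train_y = [row[6] for row in toy]
means, stds = fit_scaler(train_raw)
train_x = [scale(row, means, stds) for row in train_raw]

# The customer from the question; Education_2 = 1 means Education = 2.
customer = scale(to_features(40, 10, 84, 2, 2, 2), means, stds)
predicted, prob = knn_classify(train_x, train_y, customer, k=1)
print(predicted, prob)
```

With k = 1 the estimated probability of success is always 0 or 1, so the 0.5 cutoff simply returns the class of the single nearest neighbor.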
b. What is a choice of k that balances between overfitting and ignoring the predictor
information?
Answer:
Value of k   % Error Training   % Error Validation
1            0                  10                   <--- Best k
2            5.83               13.75
3            6.67               11.25
4            7.5                18.75
5            6.67               12.5
6            7.5                16.25
7            10                 12.5
8            9.17               12.5
9            8.33               11.25
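Judged by validation error, the best choice is k = 1: it has the lowest validation error (10%)
and is the value flagged as best in the output. The next-lowest validation error is 11.25%, at
k = 3 and k = 9.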
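Selecting k by lowest validation error, as the "Best k" flag in the output does, can be sketched as follows. The error values are copied from the table above; breaking ties in favor of the smaller k is an assumption made for this sketch.

```python
# (k, % training error, % validation error), copied from the table above.
errors = [
    (1, 0.00, 10.00), (2, 5.83, 13.75), (3, 6.67, 11.25),
    (4, 7.50, 18.75), (5, 6.67, 12.50), (6, 7.50, 16.25),
    (7, 10.00, 12.50), (8, 9.17, 12.50), (9, 8.33, 11.25),
]

# Pick the k with the lowest validation error; ties go to the smaller k (assumption).
best_k, train_err, valid_err = min(errors, key=lambda row: (row[2], row[0]))
print(best_k, train_err, valid_err)
```

The 0% training error at k = 1 reflects each training record being its own nearest neighbor, which is why the validation column, not the training column, is used for the choice.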
c. Show the classification matrix for the validation data that results from using the best k.
Answer:
Classification Confusion Matrix

                Predicted Class
Actual Class    1     0
1               3     4
0               4     69

Error Report

Class     # Cases   # Errors   % Error
1         7         4          57.14
0         73        4          5.48
Overall   80        8          10
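The error report follows directly from the confusion matrix: for each class, the errors are the cases that fall off the diagonal. A short sketch using the validation counts shown above (the function name `error_report` is illustrative):

```python
def error_report(matrix):
    """matrix[actual][predicted] -> count.

    Returns {class: (cases, errors, % error)} plus an "overall" entry.
    Assumes every class has at least one case.
    """
    report = {}
    total_cases = total_errors = 0
    for actual, row in matrix.items():
        cases = sum(row.values())
        errors = cases - row[actual]  # everything not on the diagonal
        report[actual] = (cases, errors, round(100 * errors / cases, 2))
        total_cases += cases
        total_errors += errors
    report["overall"] = (total_cases, total_errors,
                         round(100 * total_errors / total_cases, 2))
    return report

# Validation confusion matrix from the output above.
validation_matrix = {1: {1: 3, 0: 4}, 0: {1: 4, 0: 69}}
report = error_report(validation_matrix)
for name, row in report.items():
    print(name, row)
```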
d. Classify the above customer using the best k.
Answer:

Predicted          Prob. for 1  #Nearest
Class      Actual  (success)    Neighbors  Age  Experience  Income  Family  CCAvg
0                  0            1          40   10          84      2       2

Education_1  Education_2  Education_3  Mortgage  Securities Account  CD Account  Online  CreditCard
0            1            0            0         0                   0           1       1
From the output we conclude that the above customer is classified as not accepting the loan
(class 0).
e. Repartition the data, this time into training, validation, and test sets (50% : 30% : 20%).
Apply the k-NN method with the k chosen above. Compare the classification matrix of the test set
with that of the training and validation sets. Comment on the differences and their reason.
Answer:
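The 50% : 30% : 20% repartitioning step can be sketched as a seeded shuffle followed by slicing. The record IDs and the seed are hypothetical; 200 records are used because the partition sizes in the reports for this part (100, 60, and 40 cases) sum to 200.

```python
import random

def partition(records, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Shuffle a copy of records and split it into training, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])  # the remainder becomes the test set

records = list(range(200))  # hypothetical record IDs
train, valid, test = partition(records)
print(len(train), len(valid), len(test))
```

Giving the remainder to the test set guarantees that every record lands in exactly one partition even when the fractions do not divide the record count evenly.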
Training Data scoring - Summary Report (for k=1)

Cut off Prob.Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1     0
1               11    0
0               0     89

Error Report

Class     # Cases   # Errors   % Error
1         11        0          0.00
0         89        0          0.00
Overall   100       0          0.00
Validation Data scoring - Summary Report (for k=1)

Cut off Prob.Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1     0
1               1     3
0               2     54

Error Report

Class     # Cases   # Errors   % Error
1         4         3          75.00
0         56        2          3.57
Overall   60        5          8.33
Test Data scoring - Summary Report (for k=1)

Cut off Prob.Val. for Success (Updatable): 0.5

Classification Confusion Matrix

                Predicted Class
Actual Class    1     0
1               2     2
0               2     34

Error Report

Class     # Cases   # Errors   % Error
1         4         2          50.00
0         36        2          5.56
Overall   40        4          10.00
We choose the best k as the one that minimizes the misclassification rate in the validation set;
here the best k is 1. The training error is 0% because, with k = 1, each training record is its
own nearest neighbor. The classification error is 8.33% in the validation set and 10% in the test
set, which are nearly the same.
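The comparison across the three partitions can be checked with a few lines of arithmetic on the confusion matrices reported above. This is a sketch; the helper name `overall_error` is illustrative.

```python
# Confusion matrices from the three summary reports: matrix[actual][predicted] -> count.
matrices = {
    "training":   {1: {1: 11, 0: 0}, 0: {1: 0, 0: 89}},
    "validation": {1: {1: 1, 0: 3},  0: {1: 2, 0: 54}},
    "test":       {1: {1: 2, 0: 2},  0: {1: 2, 0: 34}},
}

def overall_error(matrix):
    """Percentage of cases whose predicted class differs from the actual class."""
    total = sum(sum(row.values()) for row in matrix.values())
    wrong = sum(count
                for actual, row in matrix.items()
                for predicted, count in row.items()
                if predicted != actual)
    return round(100 * wrong / total, 2)

summary = {name: overall_error(m) for name, m in matrices.items()}
print(summary)
```

The training figure is not a fair estimate of performance on new data, while the validation and test figures are computed on records the classifier did not memorize.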