Evaluation of Classifiers
ROC Curves
Reject Curves
Precision-Recall Curves
Statistical Tests
– Estimating the error rate of a classifier
– Comparing two classifiers
– Estimating the error rate of a learning algorithm
– Comparing two algorithms
Cost-Sensitive Learning
In most applications, false positive and false
negative errors are not equally important. We
therefore want to adjust the tradeoff between
them. Many learning algorithms provide a way
to do this:
– probabilistic classifiers: combine cost matrix with
decision theory to make classification decisions
– discriminant functions: adjust the threshold for
classifying into the positive class
– ensembles: adjust the number of votes required to
classify as positive
Example: 30 decision trees constructed by bagging.
Classify as positive if at least K of the 30 trees predict positive, and vary K (see the sketch below).
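A minimal Python sketch of this example (not from the lecture): it trains a bagged ensemble of 30 trees with scikit-learn and sweeps the vote threshold K, counting false positives and false negatives at each setting. The dataset, class weights, and all names are illustrative assumptions.

    # Sketch: vary the vote threshold K of a 30-tree bagged ensemble to trade
    # off false positives against false negatives. Dataset is synthetic.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                                 random_state=0).fit(X_tr, y_tr)

    # Count positive votes among the 30 trees for each test example.
    votes = sum(tree.predict(X_te) for tree in ensemble.estimators_)

    for K in range(1, 31):
        y_hat = (votes >= K).astype(int)          # positive iff at least K trees agree
        fp = int(np.sum((y_hat == 1) & (y_te == 0)))
        fn = int(np.sum((y_hat == 0) & (y_te == 1)))
        print(f"K={K:2d}  FP={fp:4d}  FN={fn:4d}")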
Directly Visualizing the Tradeoff
We can plot the false positives versus false negatives directly. If
L(0,1) = R · L(1,0) (i.e., a FN is R times more expensive than a FP),
then the best operating point will be tangent to a line with a slope of
–R
[Figure: false positives plotted against false negatives as the threshold varies. If R = 1, the best operating point is at threshold 10; if R = 10, it is at threshold 29.]
Receiver Operating Characteristic
(ROC) Curve
It is traditional to plot this same information in a
normalized form with 1 – False Negative Rate
plotted against the False Positive Rate.
The optimal
operating point is
tangent to a line with
a slope of R
Generating ROC Curves
Linear Threshold Units, Sigmoid Units, Neural
Networks
– adjust the classification threshold between 0 and 1
K nearest neighbor
– adjust number of votes (between 0 and k) required to
classify positive
Naïve Bayes, Logistic Regression, etc.
– vary the probability threshold for classifying as
positive
Support vector machines
– require different margins for positive and negative
examples
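A minimal sketch of this recipe for one of the model families above, assuming scikit-learn: train a scorer, then sweep the classification threshold over its scores. The logistic regression model and synthetic dataset are placeholders; roc_curve performs the threshold sweep.

    # Sketch: generate an ROC curve by sweeping the probability threshold of a
    # scoring classifier. Any model producing a real-valued score works the same way.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

    fpr, tpr, thresholds = roc_curve(y_te, scores)   # one (FPR, TPR) point per threshold
    print("AUC =", roc_auc_score(y_te, scores))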
SVM: Asymmetric Margins
Minimize $\|w\|^2 + C \sum_i \xi_i$
Subject to
    $w \cdot x_i + \xi_i \ge R$   (positive examples)
    $-w \cdot x_i + \xi_i \ge 1$   (negative examples)
ROC Convex Hull
If we have two classifiers h1 and h2 with (fp1,fn1)
and (fp2,fn2), then we can construct a stochastic
classifier that interpolates between them. Given
a new data point x, we use classifier h1 with
probability p and h2 with probability (1-p). The
resulting classifier has an expected false positive
level of p fp1 + (1 – p) fp2 and an expected false
negative level of p fn1 + (1 – p) fn2.
This means that we can create a classifier that
matches any point on the convex hull of the
ROC curve
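A minimal sketch of such a stochastic classifier; the component classifiers h1 and h2 and the mixing probability p are illustrative placeholders.

    # Sketch: a stochastic classifier that uses h1 with probability p and h2
    # otherwise. Its expected (FP, FN) rates interpolate linearly between the
    # two operating points, tracing the segment between them on the ROC plot.
    import numpy as np

    def stochastic_classifier(h1, h2, p, rng=np.random.default_rng(0)):
        def h(x):
            return h1(x) if rng.random() < p else h2(x)
        return h

    # Two trivial threshold classifiers on a 1-D score, for illustration:
    h1 = lambda x: int(x > 0.3)   # aggressive: more FPs, fewer FNs
    h2 = lambda x: int(x > 0.7)   # conservative: fewer FPs, more FNs
    h = stochastic_classifier(h1, h2, p=0.25)
    print([h(x) for x in (0.2, 0.5, 0.9)])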
ROC Convex Hull
[Figure: the ROC convex hull overlaid on the original ROC curve.]
Maximizing AUC
At learning time, we may not know the cost ratio
R. In such cases, we can maximize the Area
Under the ROC Curve (AUC)
Efficient computation of AUC
– Assume h(x) returns a real quantity (larger values =>
class 1)
– Sort xi according to h(xi). Number the sorted points
from 1 to N such that r(i) = the rank of data point xi
– AUC = probability that a randomly chosen example
from class 1 ranks above a randomly chosen example
from class 0 = the Wilcoxon-Mann-Whitney statistic
Computing AUC
Let S1 = sum of r(i) for yi = 1 (sum of the
ranks of the positive examples)
$$\widehat{AUC} = \frac{S_1 - N_1(N_1 + 1)/2}{N_0 N_1}$$
where N0 is the number of negative
examples and N1 is the number of positive
examples
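A sketch of this rank-based computation in Python, assuming SciPy's rankdata (tied scores receive averaged ranks); the label and score vectors are illustrative.

    # Sketch: compute AUC from ranks (Wilcoxon-Mann-Whitney), following the
    # formula above.
    import numpy as np
    from scipy.stats import rankdata

    def auc_from_ranks(y, scores):
        ranks = rankdata(scores)              # ranks 1..N, ties get average rank
        n1 = np.sum(y == 1)                   # number of positive examples
        n0 = np.sum(y == 0)                   # number of negative examples
        s1 = ranks[y == 1].sum()              # sum of ranks of positive examples
        return (s1 - n1 * (n1 + 1) / 2) / (n0 * n1)

    y = np.array([0, 0, 1, 1, 0, 1])
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
    print(auc_from_ranks(y, scores))          # 8/9 for this toy example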
Optimizing AUC
A hot topic in machine learning right now
is developing algorithms for optimizing
AUC
RankBoost: A modification of AdaBoost.
The main idea is to define a “ranking loss”
function and then penalize a training
example x by the number of examples of
the other class that are misranked (relative
to x)
Rejection Curves
In most learning algorithms, we can
specify a threshold for making a rejection
decision
– Probabilistic classifiers: adjust cost of
rejecting versus cost of FP and FN
– Decision-boundary method: if a test point x is
within θ of the decision boundary, then reject
Equivalent to requiring that the “activation” of the
best class be larger than that of the second-best class
by at least θ
Rejection Curves (2)
Vary θ and plot fraction correct versus fraction
rejected
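A sketch of one way to trace such a curve for a probabilistic binary classifier, assuming the rejection rule "reject x if its predicted probability is within θ of 0.5"; the model, dataset, and grid of θ values are illustrative.

    # Sketch: rejection curve. For each theta, reject test points near the
    # decision boundary and report accuracy on the accepted points.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, flip_y=0.1, random_state=2)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

    proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)
    margin = np.abs(proba[:, 1] - 0.5)        # distance from the decision boundary
    y_hat = (proba[:, 1] >= 0.5).astype(int)

    for theta in np.linspace(0.0, 0.45, 10):
        accepted = margin >= theta
        frac_rejected = 1 - accepted.mean()
        acc = (y_hat[accepted] == y_te[accepted]).mean() if accepted.any() else float("nan")
        print(f"theta={theta:.2f}  rejected={frac_rejected:.2f}  accuracy={acc:.3f}")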
Precision versus Recall
Information Retrieval:
– y = 1: document is relevant to query
– y = 0: document is irrelevant to query
– K: number of documents retrieved
Precision:
– fraction of the K retrieved documents (ŷ=1) that are
actually relevant (y=1)
– TP / (TP + FP)
Recall:
– fraction of all relevant documents that are retrieved
– TP / (TP + FN) = true positive rate
Precision Recall Graph
Plot recall on horizontal axis; precision on
vertical axis; and vary the threshold for making
positive predictions (or vary K)
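A minimal sketch of this threshold sweep, assuming scikit-learn's precision_recall_curve; the model and dataset are placeholders.

    # Sketch: trace a precision-recall curve by varying the threshold on a
    # scoring classifier trained on an imbalanced synthetic dataset.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=3)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

    scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_te, scores)
    # Plot recall (x axis) against precision (y axis); each point is one threshold.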
The F1 Measure
Figure of merit that combines precision
and recall.
$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$
where P = precision and R = recall. This is the
harmonic mean of P and R.
We can plot F1 as a function of the
classification threshold θ
Summarizing a Single Operating
Point
WEKA and many other systems normally report
various measures for a single operating point
(e.g., θ = 0.5). Here is example output from
WEKA:
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class
0.854 0.1 0.899 0.854 0.876 0
0.9 0.146 0.854 0.9 0.876 1
Visualizing ROC and P/R Curves in
WEKA
Right-click on the result list and choose
“Visualize Threshold Curve”. Select “1” from the
popup window.
ROC:
– Plot False Positive Rate on X axis
– Plot True Positive Rate on Y axis
– WEKA will display the AUC also
Precision/Recall:
– Plot Recall on X axis
– Plot Precision on Y axis
WEKA does not support rejection curves
Sensitivity and Selectivity
In medical testing, the terms “sensitivity” and
“specificity” are used
– Sensitivity = TP/(TP + FN) = true positive rate = recall
– Specificity = TN/(FP + TN) = true negative rate = recall for the negative class = 1 – the false positive rate
The sensitivity versus specificity tradeoff is
identical to the ROC curve tradeoff
Estimating the Error Rate of a
Classifier
Compute the error rate on hold-out data
– suppose a classifier makes k errors on n holdout data
points
– the estimated error rate is ê = k / n.
Compute a confidence interval on this estimate
– the standard error of this estimate is
  $$SE = \sqrt{\frac{\hat{\epsilon}\,(1 - \hat{\epsilon})}{n}}$$
– A 1 – α confidence interval on the true error ε is
  $$\hat{\epsilon} - z_{\alpha/2}\,SE \le \epsilon \le \hat{\epsilon} + z_{\alpha/2}\,SE$$
– For a 95% confidence interval, $z_{0.025} = 1.96$, so we use
  $$\hat{\epsilon} - 1.96\,SE \le \epsilon \le \hat{\epsilon} + 1.96\,SE$$
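A minimal numeric sketch of these formulas; the counts k and n are made up.

    # Sketch: hold-out error estimate and its 95% confidence interval.
    import math

    k, n = 83, 1000                      # errors and hold-out size (illustrative)
    eps_hat = k / n                      # estimated error rate
    se = math.sqrt(eps_hat * (1 - eps_hat) / n)
    z = 1.96                             # z_{alpha/2} for a 95% interval
    print(f"error = {eps_hat:.3f}, "
          f"95% CI = [{eps_hat - z*se:.3f}, {eps_hat + z*se:.3f}]")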
Comparing Two Classifiers
Goal: decide which of two classifiers h1 and h2 has lower
error rate
Method: Run them both on the same test data set and
record the following information:
– n00: the number of examples correctly classified by both classifiers
– n01: the number of examples correctly classified by h1 but misclassified by h2
– n10: the number of examples misclassified by h1 but correctly classified by h2
– n11: the number of examples misclassified by both h1 and h2

      n00   n01
      n10   n11
McNemar’s Test
$$M = \frac{(|n_{01} - n_{10}| - 1)^2}{n_{01} + n_{10}} > \chi^2_{1,\alpha}$$
M is distributed approximately as $\chi^2$ with 1
degree of freedom. For a 95% confidence
test, $\chi^2_{1,0.95} = 3.84$. So if M is larger than
3.84, then with 95% confidence we can
reject the null hypothesis that the two
classifiers have the same error rate
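A minimal numeric sketch of the test; the disagreement counts n01 and n10 are made up.

    # Sketch: McNemar's test from the two disagreement counts, using the
    # continuity-corrected statistic above.
    n01, n10 = 15, 30                    # illustrative disagreement counts
    M = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    print("M =", M, "-> reject H0 at 95%" if M > 3.84 else "-> cannot reject H0")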
Confidence Interval on the
Difference Between Two Classifiers
Let pij = nij/n be the 2x2 contingency table
converted to probabilities
$$SE = \sqrt{\frac{p_{01} + p_{10} - (p_{01} - p_{10})^2}{n}}$$
$p_A = p_{10} + p_{11}$   (error rate of h1)
$p_B = p_{01} + p_{11}$   (error rate of h2)
A 95% confidence interval on the difference in
the true error between the two classifiers is
$$p_A - p_B - 1.96\left(SE + \frac{1}{2n}\right) \le \epsilon_A - \epsilon_B \le p_A - p_B + 1.96\left(SE + \frac{1}{2n}\right)$$
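A minimal numeric sketch of this interval; the 2x2 counts are made up, and the SE formula is the one given above.

    # Sketch: 95% confidence interval on the difference in error rates of two
    # classifiers from their 2x2 disagreement table.
    import math

    n00, n01, n10, n11 = 850, 15, 30, 105
    n = n00 + n01 + n10 + n11
    p01, p10, p11 = n01 / n, n10 / n, n11 / n
    pA, pB = p10 + p11, p01 + p11                      # error rates of h1 and h2
    se = math.sqrt((p01 + p10 - (p01 - p10) ** 2) / n)
    half = 1.96 * (se + 1 / (2 * n))
    print(f"diff = {pA - pB:.4f}, "
          f"95% CI = [{pA - pB - half:.4f}, {pA - pB + half:.4f}]")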
Cost-Sensitive Comparison of Two
Classifiers
Suppose we have a non-0/1 loss matrix L(ŷ,y) and we
have two classifiers h1 and h2. Goal: determine which
classifier has lower expected loss.
A method that does not work well:
– For each algorithm a and each test example (xi,yi) compute ℓa,i =
L(ha(xi),yi).
– Let δi = ℓ1,i – ℓ2,i
– Treat the δ’s as normally distributed and compute a normal
confidence interval
The problem is that there are only a finite number of
different possible values for δi. They are not normally
distributed, and the resulting confidence intervals are too
wide
A Better Method: BDeltaCost
Let $\Delta = \{\delta_i\}_{i=1}^N$ be the set of $\delta_i$'s computed as
above
For b from 1 to 1000 do
– Let Tb be a bootstrap replicate of ∆
– Let sb = average of the δ’s in Tb
Sort the sb’s and identify the 26th and 975th
items. These form a 95% confidence interval on
the average difference between the loss from h1
and the loss from h2.
The bootstrap confidence interval quantifies the
uncertainty due to the size of the test set. It
does not allow us to compare algorithms, only
classifiers.
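A sketch of this bootstrap procedure; the per-example loss differences δi are synthetic stand-ins.

    # Sketch of BDeltaCost: resample the loss differences, average each bootstrap
    # replicate, and read off the 2.5% and 97.5% points of the sorted averages.
    import numpy as np

    rng = np.random.default_rng(0)
    deltas = rng.choice([-1.0, 0.0, 0.0, 2.0], size=500)   # illustrative loss differences

    means = []
    for _ in range(1000):
        replicate = rng.choice(deltas, size=len(deltas), replace=True)  # bootstrap replicate
        means.append(replicate.mean())

    means.sort()
    lo, hi = means[25], means[974]       # 26th and 975th of the 1000 sorted averages
    print(f"95% CI on mean loss difference: [{lo:.3f}, {hi:.3f}]")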
Estimating the Error Rate of a
Learning Algorithm
Under the PAC model, training examples x are drawn
from an underlying distribution D and labeled according
to an unknown function f to give (x,y) pairs where y =
f(x).
The error rate of a classifier h is
error(h) = P_D(h(x) ≠ f(x))
Define the error rate of a learning algorithm A for sample
size m and distribution D as
error(A, m, D) = E_S[error(A(S))]
This is the expected error rate of h = A(S) for training
sets S of size m drawn according to D.
We could estimate this if we had several training sets S1,
…, SL all drawn from D. We could compute A(S1), A(S2),
…, A(SL), measure their error rates, and average them.
Unfortunately, we don’t have enough data to do this!
Two Practical Methods
k-fold Cross Validation
– This provides an unbiased estimate of error(A, (1 –
1/k)m, D) for training sets of size (1 – 1/k)m
Bootstrap error estimate (out-of-bag estimate)
– Construct L bootstrap replicates of S_train
– Train A on each of them
– Evaluate on the examples that did not appear in the
bootstrap replicate
– Average the resulting error rates
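A sketch of the k-fold cross-validation estimate, assuming scikit-learn; the learning algorithm (a decision tree) and the dataset are placeholders.

    # Sketch: k-fold cross-validation estimate of error(A, (1 - 1/k)m, D).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=4)
    errors = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                               random_state=4).split(X, y):
        h = DecisionTreeClassifier(random_state=4).fit(X[train_idx], y[train_idx])
        errors.append(np.mean(h.predict(X[test_idx]) != y[test_idx]))
    print("estimated error(A, 0.9*m, D) =", np.mean(errors))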
Estimating the Difference Between
Two Algorithms: the 5x2CV F test
for i from 1 to 5 do
    perform a 2-fold cross-validation:
    split S evenly and randomly into S1 and S2
    for j from 1 to 2 do
        Train algorithm A on S_j, measure error rate $p_A^{(i,j)}$
        Train algorithm B on S_j, measure error rate $p_B^{(i,j)}$
        $p_i^{(j)} := p_A^{(i,j)} - p_B^{(i,j)}$   (difference in error rates on fold j)
    end /* for j */
    $\bar{p}_i := \frac{p_i^{(1)} + p_i^{(2)}}{2}$   (average difference in error rates in iteration i)
    $s_i^2 := \left(p_i^{(1)} - \bar{p}_i\right)^2 + \left(p_i^{(2)} - \bar{p}_i\right)^2$   (variance in the difference, for iteration i)
end /* for i */
$$F := \frac{\sum_i \sum_j \left(p_i^{(j)}\right)^2}{2 \sum_i s_i^2}$$
5x2cv F test
    iteration i   fold j   error rates                 difference    per-iteration stats
    1             1        p_A^(1,1)   p_B^(1,1)       p_1^(1)       p̄_1,  s_1^2
    1             2        p_A^(1,2)   p_B^(1,2)       p_1^(2)
    2             1        p_A^(2,1)   p_B^(2,1)       p_2^(1)       p̄_2,  s_2^2
    2             2        p_A^(2,2)   p_B^(2,2)       p_2^(2)
    3             1        p_A^(3,1)   p_B^(3,1)       p_3^(1)       p̄_3,  s_3^2
    3             2        p_A^(3,2)   p_B^(3,2)       p_3^(2)
    4             1        p_A^(4,1)   p_B^(4,1)       p_4^(1)       p̄_4,  s_4^2
    4             2        p_A^(4,2)   p_B^(4,2)       p_4^(2)
    5             1        p_A^(5,1)   p_B^(5,1)       p_5^(1)       p̄_5,  s_5^2
    5             2        p_A^(5,2)   p_B^(5,2)       p_5^(2)
5x2CV F test
If F > 4.47, then with 95% confidence, we
can reject the null hypothesis that
algorithms A and B have the same error
rate when trained on data sets of size m/2.
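A sketch of the whole 5x2cv F test, assuming scikit-learn; the two algorithms (a decision tree and logistic regression) and the dataset are illustrative placeholders, and 4.47 is the threshold quoted above.

    # Sketch: 5x2cv F test comparing two learning algorithms A and B.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    def error_rate(algo, X_tr, y_tr, X_te, y_te):
        return np.mean(algo().fit(X_tr, y_tr).predict(X_te) != y_te)

    X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=5)
    A = lambda: DecisionTreeClassifier(random_state=0)
    B = lambda: LogisticRegression(max_iter=1000)

    sum_p_sq, sum_s2 = 0.0, 0.0
    for i in range(5):
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
        p = []
        for train_idx, test_idx in cv.split(X, y):
            pA = error_rate(A, X[train_idx], y[train_idx], X[test_idx], y[test_idx])
            pB = error_rate(B, X[train_idx], y[train_idx], X[test_idx], y[test_idx])
            p.append(pA - pB)                       # p_i^(j)
        p_bar = (p[0] + p[1]) / 2                   # average difference in iteration i
        sum_s2 += (p[0] - p_bar) ** 2 + (p[1] - p_bar) ** 2
        sum_p_sq += p[0] ** 2 + p[1] ** 2

    F = sum_p_sq / (2 * sum_s2)
    print("F =", F, "-> reject H0" if F > 4.47 else "-> cannot reject H0")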
Summary
ROC Curves
Reject Curves
Precision-Recall Curves
Statistical Tests
– Estimating error rate of classifier
– Comparing two classifiers
– Estimating error rate of a learning algorithm
– Comparing two algorithms