Holte Slides

The document discusses the evaluation of cost-sensitive classifiers, focusing on the importance of understanding false positives and false negatives in classification tasks. It critiques scalar performance measures for their inability to convey comprehensive information about classifier performance and advocates for the use of visualization techniques like ROC and cost curves to better represent classifier effectiveness under varying conditions. The conclusion emphasizes that cost curves provide a clearer understanding of average performance, operating ranges, and the significance of performance differences.

Cost-Sensitive Classifier Evaluation
Robert Holte
Computing Science Dept.
University of Alberta
Co-author
Chris Drummond
IIT, National Research Council, Ottawa
Classifiers
• A classifier assigns an object to one of a
predefined set of categories or classes.
• Examples:
– A metal detector either sounds an alarm or
stays quiet when someone walks through.
– A credit card application is either approved or
denied.
– A medical test’s outcome is either positive or
negative.
• This talk: only two classes, “positive” and
“negative”.
Two Types of Error

False positive (“false alarm”), FP

alarm sounds but person is not carrying metal

False negative (“miss”), FN

alarm doesn’t sound but person is carrying metal
2-class Confusion Matrix
                     Predicted class
True class           positive    negative
positive (#P)        #TP         #P - #TP
negative (#N)        #FP         #N - #FP
• Reduce the 4 numbers to two rates
true positive rate = TP = (#TP)/(#P)
false positive rate = FP = (#FP)/(#N)
• Rates are independent of class ratio*

* subject to certain conditions

Example: 3 classifiers

Classifier 1            Classifier 2            Classifier 3
      Predicted               Predicted               Predicted
True  pos  neg          True  pos  neg          True  pos  neg
pos    40   60          pos    70   30          pos    60   40
neg    30   70          neg    50   50          neg    20   80
TP = 0.4                TP = 0.7                TP = 0.6
FP = 0.3                FP = 0.5                FP = 0.2
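The two rates can be recomputed from the counts in the three matrices. The following is a minimal sketch; the function name `rates` and the tuple layout (TP, FN, FP, TN) are choices made for this illustration, not part of the slides.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from the four counts."""
    n_pos = tp + fn   # #P: all truly positive examples
    n_neg = fp + tn   # #N: all truly negative examples
    return tp / n_pos, fp / n_neg

# Counts (TP, FN, FP, TN) read off the three example matrices.
classifiers = {
    "Classifier 1": (40, 60, 30, 70),
    "Classifier 2": (70, 30, 50, 50),
    "Classifier 3": (60, 40, 20, 80),
}

for name, counts in classifiers.items():
    tp_rate, fp_rate = rates(*counts)
    print(f"{name}: TP = {tp_rate:.1f}, FP = {fp_rate:.1f}")
```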
Assumptions
• Standard Cost Model
– correct classification costs 0
– cost of misclassification depends only on the class, not
on the individual example
– over a set of examples costs are additive

• Costs or Class Distributions:

– are not known precisely at evaluation time
– may vary with time
– may depend on where the classifier is deployed

• True FP and TP do not vary with time or location, and are accurately estimated.
How to Evaluate Performance ?
• Scalar Measures
– Accuracy
– Expected cost
– Area under the ROC curve

• Visualization Techniques
– ROC curves
– Cost Curves
What’s Wrong with Scalars ?
• A scalar does not tell the whole story.
– There are fundamentally two numbers of interest (FP and TP); a single number invariably loses some information.
– How are errors distributed across the classes ?
– How will each classifier perform in different testing conditions (costs or class ratios other than those measured in the experiment) ?

• A scalar imposes a linear ordering on classifiers.
– What we want is to identify the conditions under which each is better.
What’s Wrong with Scalars ?
• A table of scalars is just a mass of numbers.
– No immediate impact
– Poor way to present results in a paper
– Equally poor way for an experimenter to analyze results

• Some scalars (accuracy, expected cost) require precise knowledge of costs and class distributions.
– Often these are not known precisely and might vary with time or location of deployment.
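To make the dependence on class distribution concrete, the sketch below computes accuracy for Classifiers 2 and 3 from the earlier example at two class ratios. It assumes equal misclassification costs and uses the identity accuracy = TP•p(+) + (1-FP)•(1-p(+)); the function name `accuracy` is an illustrative choice.

```python
def accuracy(tp_rate, fp_rate, p_pos):
    """Accuracy when a fraction p_pos of the examples is positive."""
    return tp_rate * p_pos + (1.0 - fp_rate) * (1.0 - p_pos)

c2 = (0.7, 0.5)  # (TP, FP) for Classifier 2
c3 = (0.6, 0.2)  # (TP, FP) for Classifier 3

for p_pos in (0.5, 0.9):
    a2 = accuracy(*c2, p_pos)
    a3 = accuracy(*c3, p_pos)
    better = "Classifier 2" if a2 > a3 else "Classifier 3"
    print(f"P(+) = {p_pos}: acc2 = {a2:.2f}, acc3 = {a3:.2f}, better: {better}")
```

With these numbers the ranking flips: Classifier 3 has the higher accuracy at P(+) = 0.5, Classifier 2 at P(+) = 0.9, so an accuracy figure measured at one class ratio does not settle which classifier to deploy at another.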
Why visualize performance ?
• Shape of curves more informative than a single
number
• Curve informs about
– all possible misclassification costs*
– all possible class ratios*
– under what conditions C1 outperforms C2
• Immediate impact (if done well)

* subject to certain conditions

ROC plot for the 3 Classifiers
[Figure: ROC plot of the three classifiers, with the ideal classifier, the always-positive and always-negative classifiers, and the chance line marked.]
Dominance
Operating Range
Slope indicates the class distributions and misclassification costs for which the classifier is better than always-negative; the same holds for always-positive.
Convex Hull
Slope indicates the class distributions and misclassification costs for which the red classifier is the same as the blue one.
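As an illustration of the convex hull idea, the sketch below computes the upper convex hull of the three example ROC points together with the two trivial classifiers at (0, 0) and (1, 1). It uses a standard monotone-chain scan; the function names are hypothetical and this is not taken from the authors' software.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive = counter-clockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of ROC points given as (FP rate, TP rate) pairs."""
    # Always include the trivial classifiers: always-negative and always-positive.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the new segment.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# (FP, TP) for Classifiers 1, 2 and 3 from the example.
example = [(0.3, 0.4), (0.5, 0.7), (0.2, 0.6)]
print(roc_convex_hull(example))
```

For the example data only Classifier 3 survives on the hull between the two trivial classifiers; Classifiers 1 and 2 lie below it.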
Creating an ROC Curve
• A classifier produces a single ROC point.
• If the classifier has a “sensitivity”
parameter, varying it produces a series of
ROC points (confusion matrices).
• Alternatively, if the classifier is produced
by a learning algorithm, a series of ROC
points can be generated by varying the
class ratio in the training set.
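A minimal sketch of the first approach, assuming the "sensitivity" parameter is a threshold on a numeric score: each threshold yields one confusion matrix and therefore one ROC point. The scores, labels, and the name `roc_points` are invented for illustration.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FP rate, TP rate) point per threshold.

    An example is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
thresholds = (0.75, 0.5)

for t, (fp_rate, tp_rate) in zip(thresholds, roc_points(scores, labels, thresholds)):
    print(f"threshold {t}: FP = {fp_rate:.2f}, TP = {tp_rate:.2f}")
```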
ROC Curve
What’s Wrong with ROC Curves ?
ROC curves for two classifiers.
When to switch from C4.5 to IB1 ?
What is the performance difference ?
How to tell if two ROC curves’ difference is statistically significant ?
When to use the default classifiers ?

ROC curves from two cross-validation runs.
How to average them ?
How to compute a confidence interval for the average ROC curve ?
And we would like to be able to answer all these questions by visual inspection …
Cost Curves
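Cost Curves (1)
[Figure: Error Rate (y-axis, 0.0 to 1.0) against Probability of Positive P(+) (x-axis, 0.0 to 1.0). Each classifier is drawn as a straight line running from FP at P(+) = 0 to FN = 1-TP at P(+) = 1. Lines shown: Classifier 1 (TP = 0.4, FP = 0.3), Classifier 2 (TP = 0.7, FP = 0.5), Classifier 3 (TP = 0.6, FP = 0.2).]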
Cost Curves (2)
[Figure: Error Rate (y-axis, 0.0 to 1.0) against Probability of Positive P(+) (x-axis, 0.0 to 1.0), with the lines for the “always positive” and “always negative” classifiers and the Operating Range of a classifier marked.]
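The operating range can be worked out from the line equation Y = FN•X + FP•(1-X) by finding where a classifier's line crosses the two trivial classifiers, whose lines are Y = X (always negative) and Y = 1-X (always positive). The sketch below is a derivation under that reading of the figure; the function name is hypothetical.

```python
def operating_range(tp_rate, fp_rate):
    """Interval of X in which the classifier beats both trivial classifiers.

    Assumes tp_rate > fp_rate; otherwise the interval is empty or reversed.
    """
    fn_rate = 1.0 - tp_rate
    tn_rate = 1.0 - fp_rate
    lower = fp_rate / (fp_rate + tp_rate)   # crossing with always-negative, Y = X
    upper = tn_rate / (tn_rate + fn_rate)   # crossing with always-positive, Y = 1 - X
    return lower, upper

for name, (tp, fp) in {"Classifier 1": (0.4, 0.3),
                       "Classifier 2": (0.7, 0.5),
                       "Classifier 3": (0.6, 0.2)}.items():
    lo, hi = operating_range(tp, fp)
    print(f"{name}: operating range {lo:.3f} to {hi:.3f}")
```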
Lower Envelope
[Figure: Error Rate (y-axis, 0.0 to 1.0) against Probability of Positive P(+) (x-axis, 0.0 to 1.0), showing the lower envelope of the cost lines.]
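One way to read the lower envelope: at each value of P(+), pick whichever line is lowest. The sketch below evaluates this on a few grid points for the three example classifiers plus the two trivial ones; the names `cost_line` and `lower_envelope` are illustrative assumptions.

```python
def cost_line(tp_rate, fp_rate, x):
    """Error rate at P(+) = x: FN*x + FP*(1 - x), with FN = 1 - TP."""
    return (1.0 - tp_rate) * x + fp_rate * (1.0 - x)

# (TP, FP) per classifier; the trivial classifiers are included explicitly.
lines = {
    "always negative": (0.0, 0.0),
    "always positive": (1.0, 1.0),
    "Classifier 1": (0.4, 0.3),
    "Classifier 2": (0.7, 0.5),
    "Classifier 3": (0.6, 0.2),
}

def lower_envelope(lines, xs):
    """For each x, return (x, name of the lowest line, its error rate)."""
    envelope = []
    for x in xs:
        best = min(lines, key=lambda name: cost_line(*lines[name], x))
        envelope.append((x, best, cost_line(*lines[best], x)))
    return envelope

for x, name, y in lower_envelope(lines, (0.1, 0.5, 0.9)):
    print(f"P(+) = {x}: {name} (error rate {y:.2f})")
```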
Cost Curves
[Figure: Error Rate (y-axis, 0.0 to 1.0) against Probability of Positive P(+) (x-axis, 0.0 to 1.0), with the “always positive” and “always negative” lines marked.]
Taking Costs Into Account
Y = FN•X + FP•(1-X)
So far, X = p(+), making Y = error rate

X = p(+)•C(-|+) / [ p(+)•C(-|+) + (1-p(+))•C(+|-) ]

Y = expected cost normalized to [0,1]
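The two axis definitions can be checked numerically. In the sketch below, `cost_fn` stands for C(-|+) and `cost_fp` for C(+|-); the class probability and cost values are made up for illustration, and the function names are not from the slides.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """X axis: p(+)*C(-|+) / (p(+)*C(-|+) + (1 - p(+))*C(+|-))."""
    weighted_pos = p_pos * cost_fn
    weighted_neg = (1.0 - p_pos) * cost_fp
    return weighted_pos / (weighted_pos + weighted_neg)

def normalized_expected_cost(tp_rate, fp_rate, x):
    """Y axis: FN*X + FP*(1 - X), with FN = 1 - TP."""
    return (1.0 - tp_rate) * x + fp_rate * (1.0 - x)

# Hypothetical setting: 20% positives, a miss costs 4 times a false alarm.
p_pos, cost_fn, cost_fp = 0.2, 4.0, 1.0
x = probability_cost(p_pos, cost_fn, cost_fp)
y = normalized_expected_cost(tp_rate=0.6, fp_rate=0.2, x=x)  # Classifier 3

# Cross-check: raw expected cost divided by its largest possible value.
raw = p_pos * (1.0 - 0.6) * cost_fn + (1.0 - p_pos) * 0.2 * cost_fp
worst = p_pos * cost_fn + (1.0 - p_pos) * cost_fp
print(f"X = {x:.2f}, Y = {y:.2f}, raw/worst = {raw / worst:.2f}")
```

With equal costs the function reduces to X = p(+), matching the earlier slides where Y is the error rate.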

Comparing Cost Curves
Averaging ROC Curves
Averaging Cost Curves
Cost Curve Avg. in ROC Space
Confidence Intervals

Original                Resample #1             Resample #2
      Predicted               Predicted               Predicted
True  pos  neg          True  pos  neg          True  pos  neg
pos    78   22          pos    75   25          pos    83   17
neg    40   60          neg    45   55          neg    38   62
TP = 0.78               TP = 0.75               TP = 0.83
FP = 0.4                FP = 0.45               FP = 0.38

Resample confusion matrix 10000 times and take 95% envelope
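The resampling step might look like the sketch below. It assumes each row of the matrix is resampled separately with its size held fixed (as in the two resamples shown, whose rows still sum to 100), and that the envelope is taken pointwise as the 2.5th and 97.5th percentiles of Y at a given X. The slides do not spell out either detail, so both are assumptions, as are the function names.

```python
import random

def resample_rates(n_pos, tp_rate, n_neg, fp_rate, rng):
    """Draw one bootstrap confusion matrix and return its (TP, FP) rates."""
    tp = sum(rng.random() < tp_rate for _ in range(n_pos))
    fp = sum(rng.random() < fp_rate for _ in range(n_neg))
    return tp / n_pos, fp / n_neg

def envelope_at(x, n_pos, tp_rate, n_neg, fp_rate, n_resamples=10000, seed=0):
    """Pointwise 95% interval of Y = FN*x + FP*(1 - x) over the resamples."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_resamples):
        tp, fp = resample_rates(n_pos, tp_rate, n_neg, fp_rate, rng)
        ys.append((1.0 - tp) * x + fp * (1.0 - x))
    ys.sort()
    lo = ys[int(0.025 * n_resamples)]
    hi = ys[int(0.975 * n_resamples) - 1]
    return lo, hi

# Original matrix: 78 of 100 positives and 40 of 100 negatives predicted positive.
lo, hi = envelope_at(0.5, n_pos=100, tp_rate=0.78, n_neg=100, fp_rate=0.40)
print(f"95% interval at X = 0.5: {lo:.3f} to {hi:.3f}")
```

Repeating `envelope_at` over a grid of X values traces out the band around the cost line.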

Confidence Interval Example
Paired Resampling to Test Statistical Significance
For the 100 test examples in the negative class:

Predicted by       Predicted by Classifier2
Classifier1        pos    neg
pos                 30     10
neg                  0     60
FP for classifier1: (30+10)/100 = 0.40
FP for classifier2: (30+0)/100 = 0.30
FP2 – FP1 = -0.10
Resample this matrix 10000 times to get (FP2-FP1) values.
Do the same for the matrix based on positive test examples.
Plot and take 95% envelope as before.
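A minimal sketch of the paired resampling for the negative-class matrix above. It assumes the 100 examples are redrawn with replacement from the four cells, so each resample keeps the pairing between the two classifiers' predictions; the cell ordering and the name `paired_differences` are choices made for this illustration.

```python
import random

def paired_differences(cells, n_resamples=10000, seed=0):
    """Sorted bootstrap values of FP2 - FP1.

    cells = (both pos, only classifier1 pos, only classifier2 pos, both neg),
    counted over the negative test examples.
    """
    rng = random.Random(seed)
    n = sum(cells)
    diffs = []
    for _ in range(n_resamples):
        # Redraw n examples; each lands in a cell with probability count / n.
        draw = rng.choices(range(4), weights=cells, k=n)
        only_c1 = draw.count(1)
        only_c2 = draw.count(2)
        # The "both pos" cell cancels out of FP2 - FP1.
        diffs.append((only_c2 - only_c1) / n)
    diffs.sort()
    return diffs

diffs = paired_differences((30, 10, 0, 60))
lo = diffs[int(0.025 * len(diffs))]
hi = diffs[int(0.975 * len(diffs)) - 1]
print(f"FP2 - FP1: 95% interval {lo:.2f} to {hi:.2f}")
```

If the interval excludes zero, the difference in FP is significant at that level under this resampling scheme.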
Paired Resampling to Test Statistical Significance
[Figure: classifier1 and classifier2, with FP2-FP1 and FN2-FN1 marked.]
Correlation between Classifiers
High Correlation

Predicted by       Predicted by Classifier2
Classifier1        pos    neg
pos                 30     10
neg                  0     60

Low Correlation (same FP1 and FP2 as above)

Predicted by       Predicted by Classifier2
Classifier1        pos    neg
pos                  0     40
neg                 30     30
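Why correlation matters can be seen without resampling. If the 100 examples are treated as draws from the four cells (a multinomial model, which is an assumption of this sketch rather than something stated on the slides), the standard deviation of FP2 - FP1 depends only on the two disagreement cells. The function name `diff_std` is hypothetical.

```python
import math

def diff_std(cells):
    """Mean and standard deviation of FP2 - FP1 under a multinomial model.

    cells = (both pos, only classifier1 pos, only classifier2 pos, both neg).
    """
    n = sum(cells)
    p_only_c1 = cells[1] / n
    p_only_c2 = cells[2] / n
    mean = p_only_c2 - p_only_c1
    # Var(FP2 - FP1) = (p_only_c1 + p_only_c2 - mean^2) / n
    var = (p_only_c1 + p_only_c2 - mean ** 2) / n
    return mean, math.sqrt(var)

for label, cells in (("high correlation", (30, 10, 0, 60)),
                     ("low correlation", (0, 40, 30, 30))):
    mean, std = diff_std(cells)
    print(f"{label}: FP2 - FP1 = {mean:.2f}, std = {std:.3f}")
```

Both matrices give the same mean difference of -0.10, but the low-correlation matrix has many more disagreements and so a much larger spread, which makes the same difference harder to distinguish from zero.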
Low correlation = Low significance
[Figure: classifier1 and classifier2, with FP2-FP1 and FN2-FN1 marked.]
Limited Range of Significance
Better Data Analysis
ROC, C4.5 Splitting Criteria
Cost Curve, C4.5 Splitting Criteria
ROC, Selection procedure

Suppose this classifier was produced by a training set with a class ratio of 10:1, and was used whenever the deployment situation had a 10:1 class ratio.
Cost Curves, Selection Procedure
ROC, Many Points
Cost Curves, Many Lines
Conclusions
• Scalar performance measures should not
be used if costs and class distributions are
not exactly known or might vary with time or
location.
• Cost curves enable easy visualization of
– Average performance (expected cost)
– operating range
– confidence intervals on performance
– difference in performance and its significance
Fin
• Cost curve software is available.
Contact: [email protected]

• Thanks to
Alberta Ingenuity Centre for Machine Learning
(www.aicml.ca)
