
Evaluation of Classifiers

ROC, PR Curves, AUC, EER

Agha Ali Raza

CS535/EE514 – Machine Learning


Gold Labels vs Predicted Labels (the confusion matrix):

|                    | Gold Positive        | Gold Negative        |
|--------------------|----------------------|----------------------|
| Predicted Positive | True Positives (tp)  | False Positives (fp) |
| Predicted Negative | False Negatives (fn) | True Negatives (tn)  |

• "Precision" aka "Positive Predictive Value" = tp / (tp + fp)
• "Negative Predictive Value" = tn / (fn + tn)
• "Recall" aka "Sensitivity" aka "True Positive Rate" aka "True Acceptance Rate" = tp / (tp + fn)
• "Specificity" aka "True Negative Rate" aka "True Rejection Rate" = tn / (fp + tn)
• 1 - Sensitivity = "False Negative Rate" aka "False Rejection Rate" = fn / (tp + fn)
• 1 - Specificity = "False Positive Rate" aka "False Acceptance Rate" = fp / (fp + tn)
• Accuracy = (tp + tn) / (tp + fp + tn + fn)

• Sensitivity, specificity, FNR and FPR are not influenced by real-world class imbalances, because each is computed within a single gold class (its denominator is one column total of the table above).
• Precision and Negative Predictive Value mix both gold classes in their denominators, so they are impacted by these imbalances and are sensitive to them.
• E.g. for a rare disease, or a rare phenomenon (like a fraud email):

|        | Actual + | Actual - |
|--------|----------|----------|
| Test + | 5        | 5,000    |
| Test - | 5        | 5,000    |

  Sensitivity = 5 / 10 = 0.5, but Precision = 5 / (5 + 5,000) ≈ 0.001
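A minimal sketch in plain Python of the formulas above (the function name is illustrative), applied to these counts, shows why the within-class rates ignore the imbalance while precision collapses:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Rates from the confusion-matrix formulas above."""
    return {
        "sensitivity (TPR)": tp / (tp + fn),
        "specificity (TNR)": tn / (fp + tn),
        "precision (PPV)":   tp / (tp + fp),
        "NPV":               tn / (fn + tn),
        "accuracy":          (tp + tn) / (tp + fp + fn + tn),
    }

# Rare-disease example: 10 actual positives vs 10,000 actual negatives
m = confusion_metrics(tp=5, fp=5_000, fn=5, tn=5_000)
print(m["sensitivity (TPR)"])   # 0.5    -- unchanged by the 1:1000 imbalance
print(m["precision (PPV)"])     # ~0.001 -- swamped by the 5,000 false positives
```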

The Thresholding Problem in Classification
• Pinocchio’s nose
• Assume that liars have longer noses (wooden dummies only! ☺)
• We need to set a threshold on the nose length (input feature) above which we
classify the subject as a liar (label = yes).
• The same principle applies to classification of:
o A growth as a tumor based on size
o Blood pressure levels as hypertension
o An email as SPAM/Not-SPAM, Misinfo/Not-Misinfo based on probability scores
o A search item as match/not-match based on similarity scores, e.g.
▪ spoken term detection
▪ keyword spotting
▪ voice biometrics

• So, the question is where to place the cutoff


• Can we exhaustively try all cutoffs over a bounded score?
o Yes, but what do we track?
▪ Precision/Recall?
▪ False acceptances/False rejections?
▪ True positives/False positives?

• Say hello to ROC curves!


Receiver Operating Characteristic (ROC)
• A graphical plot that illustrates the diagnostic ability of a binary
classifier as its discrimination threshold is varied
• The method was originally developed for operators of military
radar receivers starting in 1941, which led to its name.
• Plot the true positive rate (TPR) – sensitivity – against the
false positive rate (FPR) – (1 - specificity) at various threshold
settings

https://en.wikipedia.org/wiki/Receiver_operating_characteristic
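A minimal sketch of producing such a plot, assuming scikit-learn and matplotlib are available, using the height/adult scores worked out on the next slide (note that roc_curve picks its own thresholds from the distinct scores rather than a fixed 0.1–1.0 grid):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]                    # adult (gold): n=0, y=1
y_score = [0.12, 0.82, 0.18, 0.60, 0.72, 0.55, 0.48, 0.24, 0.26, 0.68]

fpr, tpr, thresholds = roc_curve(y_true, y_score)           # one (FPR, TPR) point per threshold
plt.plot(fpr, tpr, marker="o", label="classifier")
plt.plot([0, 1], [0, 1], "r--", label="no discrimination")  # random-guess diagonal
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (sensitivity)")
plt.legend()
plt.show()
```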
Thresholds
Predicted label and outcome for each subject at each decision threshold (0.1–1.0):

| Height h (inches) | Output Score (probability) | Adult (gold) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 0.12 | n | y (fp) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) |
| 82 | 0.82 | y | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | n (fn) | n (fn) |
| 18 | 0.18 | n | y (fp) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) |
| 60 | 0.60 | y | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | n (fn) | n (fn) | n (fn) | n (fn) | n (fn) |
| 72 | 0.72 | y | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | n (fn) | n (fn) | n (fn) |
| 55 | 0.55 | n | y (fp) | y (fp) | y (fp) | y (fp) | y (fp) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) |
| 48 | 0.48 | y | y (tp) | y (tp) | y (tp) | y (tp) | n (fn) | n (fn) | n (fn) | n (fn) | n (fn) | n (fn) |
| 24 | 0.24 | n | y (fp) | y (fp) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) |
| 26 | 0.26 | n | y (fp) | y (fp) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) | n (tn) |
| 68 | 0.68 | y | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | y (tp) | n (fn) | n (fn) | n (fn) | n (fn) |
| tp |  |  | 5 | 5 | 5 | 5 | 4 | 3 | 2 | 1 | 0 | 0 |
| fn |  |  | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 5 |
| fp |  |  | 5 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| tn |  |  | 0 | 2 | 4 | 4 | 4 | 5 | 5 | 5 | 5 | 5 |
| 1 - Specificity (FPR) |  |  | 1 | 0.6 | 0.2 | 0.2 | 0.2 | 0 | 0 | 0 | 0 | 0 |
| Sensitivity (Recall, TPR) |  |  | 1 | 1 | 1 | 1 | 0.8 | 0.6 | 0.4 | 0.2 | 0 | 0 |
| Precision |  |  | 0.5 | 0.625 | 0.8333 | 0.8333 | 0.8 | 1 | 1 | 1 | NaN | NaN |
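The summary rows can be reproduced with a short threshold sweep; a minimal sketch in plain Python, assuming (as the table implies) that a subject is predicted to be an adult when the score is strictly greater than the threshold:

```python
scores = [0.12, 0.82, 0.18, 0.60, 0.72, 0.55, 0.48, 0.24, 0.26, 0.68]
gold   = ["n",  "y",  "n",  "y",  "y",  "n",  "y",  "n",  "n",  "y"]

for t in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    pred = ["y" if s > t else "n" for s in scores]             # predict adult if score > threshold
    tp = sum(p == "y" and g == "y" for p, g in zip(pred, gold))
    fp = sum(p == "y" and g == "n" for p, g in zip(pred, gold))
    fn = sum(p == "n" and g == "y" for p, g in zip(pred, gold))
    tn = sum(p == "n" and g == "n" for p, g in zip(pred, gold))
    tpr  = tp / (tp + fn)                                      # sensitivity / recall
    fpr  = fp / (fp + tn)                                      # 1 - specificity
    prec = tp / (tp + fp) if (tp + fp) else float("nan")
    print(f"t={t:.1f}  tp={tp} fp={fp} fn={fn} tn={tn}  "
          f"TPR={tpr:.1f} FPR={fpr:.1f} Precision={prec:.4f}")
```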

Characteristics
• The best possible prediction method would
yield a point in the upper left corner (0,1)
i.e. 100% sensitivity (no false negatives) and
100% specificity (no false positives)
• A random guess would give a point along the
diagonal line (the line of no-discrimination)
from the bottom-left to the top-right corner
(TPR = FPR)
• The red diagonal divides the ROC space.
• Points above the diagonal represent good
classification (better than random)
• Points below the line represent bad results (worse
than random)
• The output of a consistently bad predictor could
simply be inverted to obtain a good predictor.
• The blue diagonal is the Equal Error Diagonal
o Along it FPR = FNR (where FNR = 1 - TPR); its intersection with the ROC curve is the Equal Error Point
• The Equal Error Point is a viable way to locate the desired threshold
• A smaller equal error rate is better (in the graph: the point sits higher and further to the left)
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
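A minimal sketch of locating the equal error point numerically, assuming scikit-learn and NumPy and reusing the height/adult scores from the Thresholds slide; picking the swept threshold whose FPR and FNR are closest is a rough approximation (no interpolation between ROC points):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true  = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
y_score = np.array([0.12, 0.82, 0.18, 0.60, 0.72, 0.55, 0.48, 0.24, 0.26, 0.68])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
fnr = 1 - tpr                              # false negative / false rejection rate

i = np.argmin(np.abs(fpr - fnr))           # ROC point closest to the equal error diagonal
eer = (fpr[i] + fnr[i]) / 2                # report the midpoint at the crossover
print(f"EER ~= {eer:.2f} at threshold {thresholds[i]:.2f}")
```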
More Characteristics
• It is hard to compare classifiers directly from their ROC curves
• A way around that is to use AUC – the Area Under the ROC Curve, also written A' (pronounced "a-prime") or called the "c-statistic" ("concordance statistic")
• Larger is better
• “AUC ROC can be interpreted as the probability
that the scores given by a classifier will rank a
randomly chosen positive instance higher than a
randomly chosen negative one.” (Page 54,
Learning from Imbalanced Data Sets, 2018)
• For imbalanced datasets: “ROC analysis does not
have any bias toward models that perform well on
the minority class at the expense of the majority
class—a property that is quite attractive when
dealing with imbalanced data.” (Page 27,
Imbalanced Learning: Foundations, Algorithms,
and Applications, 2013)

Tronci, Roberto, Giorgio Giacinto, and Fabio Roli. "Dynamic score combination: A supervised and unsupervised score combination method." In International
Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 163-177. Springer, Berlin, Heidelberg, 2009.
https://www.researchgate.net/publication/225180361_Dynamic_Score_Combination_A_Supervised_and_Unsupervised_Score_Combination_Method/figures?lo=1
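That probabilistic reading can be checked directly with a pairwise count; a minimal sketch (ties scored as 1/2, the usual convention) compared against scikit-learn's roc_auc_score, on the height/adult scores from earlier:

```python
from itertools import product
from sklearn.metrics import roc_auc_score

y_true  = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
y_score = [0.12, 0.82, 0.18, 0.60, 0.72, 0.55, 0.48, 0.24, 0.26, 0.68]

pos = [s for s, y in zip(y_score, y_true) if y == 1]   # scores of positive instances
neg = [s for s, y in zip(y_score, y_true) if y == 0]   # scores of negative instances

# P(score of a random positive > score of a random negative), ties counted as 1/2
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
auc_pairwise = wins / (len(pos) * len(neg))

print(auc_pairwise, roc_auc_score(y_true, y_score))    # the two values agree
```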

Examples
Example ROC curves ranging from AUC → 1 (perfect separation), through AUC ≈ 0.7 and AUC ≈ 0.5 (no better than chance), down to AUC → 0 (the ranking is completely inverted).
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
Precision-Recall Curve
• The precision-recall curve shows the tradeoff
between precision and recall for different
thresholds. A high area under the curve
represents both high recall and high precision
• High scores for both show that the classifier is
returning accurate results (high precision), as
well as returning a majority of all positive
results (high recall).

https://www.vlfeat.org/overview/plots-rank.html, https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
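A minimal sketch of drawing the curve with scikit-learn and matplotlib, again on the height/adult scores; average precision is used here only as a convenient summary of the area under the curve:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true  = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
y_score = [0.12, 0.82, 0.18, 0.60, 0.72, 0.55, 0.48, 0.24, 0.26, 0.68]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)      # summary of the area under the curve

plt.step(recall, precision, where="post")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title(f"Precision-Recall curve (AP = {ap:.2f})")
plt.show()
```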

Discussion: PR vs ROC
Assume a "positive" class 1 and a "negative" class 0; ŷ is our estimate of the true class label y.

Precision = P(y = 1 | ŷ = 1)
Recall = Sensitivity = P(ŷ = 1 | y = 1)
Specificity = P(ŷ = 0 | y = 0)

• P(y = 1) is the baseline probability, which depends on how common the event is in the real world
• P(ŷ = 1) is the probability that our classifier will classify a sample as positive

• ROC curves will be the same regardless of P(y = 1)


• PR curves may be more useful in practice for needle-in-haystack type problems or problems
where the "positive" class is more interesting than the negative class.
Bottom line:
• Use precision and recall to focus on a small positive class
o When P(y = 1) is low and correctly detecting positive samples is our main focus (correct detection of negative examples is less important to the problem).
• Use ROC when the detection of both classes is equally important
o When we want to give equal weight to the prediction ability on both classes.
• Use ROC when the positives are the majority, or switch the labels and use precision and recall
o When the positive class is larger, use ROC, because precision and recall would mostly reflect the ability to predict the positive class and not the negative class, which is naturally harder to detect due to its smaller number of samples.
o If the negative class (the minority in this case) is more important, we can switch the labels and use precision and recall.
https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves, https://towardsdatascience.com/what-metrics-should-we-
use-on-imbalanced-data-set-precision-recall-roc-e2e79252aeba
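A small simulation (made-up Gaussian scores; the classifier itself never changes) illustrates the bottom line: adding many more negatives leaves the ROC AUC essentially unchanged but drags the area under the PR curve down:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

def sample(n_pos, n_neg):
    # Positives score higher on average; only the class balance varies
    scores = np.concatenate([rng.normal(1.0, 1.0, n_pos), rng.normal(0.0, 1.0, n_neg)])
    labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    return labels, scores

for n_neg in (1_000, 100_000):                     # 1:1 versus 1:100 imbalance
    y, s = sample(1_000, n_neg)
    print(n_neg, roc_auc_score(y, s), average_precision_score(y, s))
# ROC AUC stays roughly the same in both runs; average precision drops sharply
# once the negatives outnumber the positives 100 to 1.
```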
A word on Biometric Systems
• The crossover error rate describes the point where
the false reject rate (FRR) and false accept rate
(FAR) are equal. CER is also known as the equal
error rate (EER). The crossover error rate describes
the overall accuracy of a biometric system.
• As the sensitivity of a biometric system increases, FRRs will rise and FARs will drop. Conversely, as the sensitivity is lowered, FRRs will drop and FARs will rise. (Note: this "sensitivity" is not recall; it is the calibration sensitivity of the biometric device.)
• CER is better when lower.
• Authentication algorithms need to simultaneously minimize permeability to intruders, which requires them to be demanding, and maximize the comfort level, which requires them to be permissive.
• This contradiction is the basis of the optimization problem in authentication algorithms, and the measure of success for the overall precision of an algorithm and its usability is the CER.

https://www.sciencedirect.com/topics/computer-science/crossover-error-rate
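A minimal sketch of the FAR/FRR trade-off described here, with made-up genuine and impostor score distributions and a swept acceptance threshold; the crossover of the two curves is the CER/EER:

```python
import numpy as np

rng = np.random.default_rng(1)
genuine  = rng.normal(2.0, 1.0, 5_000)       # matcher scores for genuine users
impostor = rng.normal(0.0, 1.0, 5_000)       # matcher scores for impostors

best = None
for t in np.linspace(-3.0, 5.0, 801):        # sweep the acceptance threshold
    frr = np.mean(genuine  < t)              # genuine users rejected (false rejects)
    far = np.mean(impostor >= t)             # impostors accepted (false accepts)
    if best is None or abs(far - frr) < abs(best[1] - best[2]):
        best = (t, far, frr)                 # keep the point closest to FAR == FRR

t, far, frr = best
print(f"CER/EER ~= {(far + frr) / 2:.3f} at threshold {t:.2f}")
# A stricter (higher) threshold raises FRR and lowers FAR; a laxer one does the opposite.
```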
For more details please visit

http://aghaaliraza.com

Thank you!