Evaluating Binary Classification
Binary classification is the task of classifying elements into two groups based on a
classification rule.
The observed response (output) y has two possible values, e.g., +/− or True/False.
Requires defining the relationship between the classifier's prediction h(x) and the true label y.
Uses a decision rule.
Examples:
Medical test: Determining if a patient has a disease.
Fitness test: Determining if a person is fit.
Spam email classification.
Definitions
Instances: The objects of interest in machine learning.
Instance Space: The set of all possible instances. For example, the set of all
possible e-mails.
Label Space: The set of all possible labels; used in supervised learning to label examples.
Model: A mapping from the instance space to the output space.
In classification, the output space is a set of classes.
In regression, it is the set of real numbers.
To learn a model, a training set of labeled instances (x, l(x)), also called
examples, is needed.
Assessing Classification Performance
The outputs of learning algorithms must be assessed and analyzed carefully in order to
evaluate and compare them. The performance of classifiers can be summarized using a
contingency table or confusion matrix.
1. Contingency Table or Confusion Matrix
A confusion matrix is a table that describes the performance of a
classification model on a set of test data where the true values are known.
It summarizes prediction results on a classification problem.
It contains counts of correct and incorrect predictions, broken down by each
class.
It shows how the classification model is confused when it makes predictions.
It contains information about actual and predicted classifications.
Key terms:
True Positive (TP): The classifier correctly predicts a spam email as spam.
False Negative (FN): The classifier incorrectly predicts a spam email as non-
spam (a miss).
False Positive (FP): The classifier incorrectly predicts a non-spam email as
spam (a false alarm).
True Negative (TN): The classifier correctly predicts a non-spam email as non-
spam.
Example: Confusion matrix of email classification
Classification problem: spam and non-spam classes.
Dataset: 100 examples, 65 are spam and 35 are non-spam.

                    Predicted spam    Predicted non-spam    Total
Actual spam              45 (TP)             20 (FN)           65
Actual non-spam           5 (FP)             30 (TN)           35
Total                    50                  50                100
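A minimal sketch of how the four counts are obtained, assuming the actual and predicted labels are available as Python lists (the short lists below are illustrative, not the 100-email dataset):

```python
# Count TP, FN, FP, TN for a binary (spam vs. non-spam) problem.
# "spam" is treated as the positive class.
y_true = ["spam", "spam", "non-spam", "spam", "non-spam"]
y_pred = ["spam", "non-spam", "non-spam", "spam", "spam"]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == "spam" and p == "spam")
fn = sum(1 for t, p in zip(y_true, y_pred) if t == "spam" and p == "non-spam")
fp = sum(1 for t, p in zip(y_true, y_pred) if t == "non-spam" and p == "spam")
tn = sum(1 for t, p in zip(y_true, y_pred) if t == "non-spam" and p == "non-spam")

print(tp, fn, fp, tn)  # 2 1 1 1 for the toy lists above
```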
Key Metrics Derived from the Confusion Matrix
Sensitivity (True Positive Rate or Recall): Measure of positive examples
labeled as positive by the classifier. Should be as high as possible.
For instance, the proportion of spam emails that are correctly classified as spam,
out of all spam emails.
Of all the actual positive examples, how many were predicted correctly.
Sensitivity = TP / (TP + FN)
Example: Sensitivity = 45 / (45 + 20) ≈ 0.6923 (69.23% of spam emails are
correctly classified).
Specificity (True Negative Rate): Measure of negative examples labeled as
negative by the classifier. Should be as high as possible.
For instance, the proportion of non-spam emails that are correctly classified as
non-spam, out of all non-spam emails.
Specificity = TN / (TN + FP)
Example: Specificity = 30 / (30 + 5) ≈ 0.8571 (85.71% of non-spam emails are
accurately classified).
Accuracy: Proportion of the total number of predictions that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example: Accuracy = (45 + 30) / (45 + 30 + 20 + 5) = 0.75 (75% of examples are
correctly classified).
Precision: Ratio of correctly classified positive examples to the total number of
predicted positive examples.
Shows the correctness achieved in positive prediction (out of all the instances
predicted as positive, how many are actually positive).
High precision indicates that an example labeled as positive is indeed
positive (small number of FPs).
Precision = TP / (TP + FP)
Example: Precision = 45 / (45 + 5) = 0.90 (90% of examples classified as spam
are actually spam).
Recall: Ratio of correctly classified positive examples to the total number of
positive examples.
Of all the actual positive examples, how many were predicted correctly.
Should be as high as possible.
High recall indicates the class is correctly recognized (small number of
FNs).
F-measure (F1 score): Balances precision and recall.
Combines recall and precision in one equation, which helps when comparing
models where one has low recall and high precision, or vice versa.
F-measure = (2 · Precision · Recall) / (Precision + Recall)
In the confusion matrix, the last column and the last row give the marginals (i.e., the column and row sums).
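A small sketch that reproduces the metric values above from the example counts (TP = 45, FN = 20, FP = 5, TN = 30); the variable names are illustrative:

```python
# Metrics derived from the e-mail confusion matrix: TP=45, FN=20, FP=5, TN=30.
tp, fn, fp, tn = 45, 20, 5, 30

sensitivity = tp / (tp + fn)                 # recall / true positive rate
specificity = tn / (tn + fp)                 # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"sensitivity = {sensitivity:.4f}")  # 0.6923
print(f"specificity = {specificity:.4f}")  # 0.8571
print(f"accuracy    = {accuracy:.4f}")     # 0.7500
print(f"precision   = {precision:.4f}")    # 0.9000
print(f"F1          = {f1:.4f}")           # 0.7826
```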
Visualizing Classification Performance
1. Coverage Plot
A coverage plot visualizes the four numbers Pos (number of positives), Neg (number of
negatives), TP (number of true positives), and FP (number of false positives) in a
rectangular coordinate system: each classifier is drawn as a single point, with FP on the
x-axis (from 0 to Neg) and TP on the y-axis (from 0 to Pos). In a coverage plot, classifiers
with the same accuracy are connected by line segments with slope 1.
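A quick check of the slope-1 claim, written as a math block and using the coordinates above together with TN = Neg − FP:

```latex
\mathrm{accuracy} = \frac{TP + TN}{Pos + Neg} = \frac{TP + (Neg - FP)}{Pos + Neg}
\;\;\Longrightarrow\;\;
TP - FP = \mathrm{accuracy}\cdot(Pos + Neg) - Neg .
```

For a fixed dataset (fixed Pos and Neg), equal accuracy therefore means equal TP − FP, i.e., the classifiers lie on a line TP = FP + c of slope 1.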
2. ROC Curves
An ROC curve (receiver operating characteristic curve) is a graph showing
the performance of a classification model at all classification thresholds.
This curve plots two parameters:
True Positive Rate (TPR)
False Positive Rate (FPR)
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False
Positives and True Positives.
Example:
Hypothetical Data:
True Labels: [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
Predicted Probabilities: [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]
In each case below, an instance is predicted positive when its predicted probability is
greater than or equal to the threshold.
Case 1: Threshold = 0.5
TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 1 / (1 + 4) = 0.2
Case 2: Threshold = 0.7
TPR = TP / (TP + FN) = 4 / (4 + 1) = 0.8
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0
Case 3: Threshold = 0.4
TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 2 / (2 + 3) = 0.4
Case 4: Threshold = 0.2
TPR = TP / (TP + FN) = 5 / (5 + 0) = 1.0
FPR = FP / (FP + TN) = 4 / (4 + 1) = 0.8
Case 5: Threshold = 0.85
TPR = TP / (TP + FN) = 1 / (1 + 4) = 0.2
FPR = FP / (FP + TN) = 0 / (0 + 5) = 0
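A sketch that reproduces the five (FPR, TPR) points above from the hypothetical data; the helper function and variable names are illustrative:

```python
# TPR and FPR at several thresholds for the hypothetical data above.
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

def roc_point(threshold):
    # Predict positive when the score is >= threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)  # (FPR, TPR)

for thr in [0.85, 0.7, 0.5, 0.4, 0.2]:
    fpr, tpr = roc_point(thr)
    print(f"threshold={thr}: FPR={fpr:.1f}, TPR={tpr:.1f}")
# threshold=0.85: FPR=0.0, TPR=0.2
# threshold=0.7: FPR=0.0, TPR=0.8
# threshold=0.5: FPR=0.2, TPR=1.0
# threshold=0.4: FPR=0.4, TPR=1.0
# threshold=0.2: FPR=0.8, TPR=1.0
```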
AUC (Area Under the ROC Curve)
AUC stands for "Area Under the ROC Curve." AUC measures the entire
two-dimensional area underneath the ROC curve, from (0,0) to (1,1).
AUC ranges in value from 0 to 1.
A model whose predictions are 100% wrong has an AUC of 0.0.
One whose predictions are 100% correct has an AUC of 1.0.
AUC ROC indicates how well the probabilities from the positive classes are
separated from the negative classes.
Plotting the ROC curves of different classifiers for a given dataset makes it easy to
compare them: pick the classifier whose curve has the larger AUC, i.e., a good TP rate
at a low FP rate.
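A sketch of the ranking interpretation mentioned above: AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score (ties counted as one half). For the hypothetical data from the ROC example, every positive is scored above every negative, so the AUC is 1.0:

```python
# AUC as the fraction of (positive, negative) pairs that are ranked correctly.
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.2, 0.7, 0.9, 0.4, 0.1, 0.75, 0.55]

pos = [p for t, p in zip(y_true, y_prob) if t == 1]  # scores of positives
neg = [p for t, p in zip(y_true, y_prob) if t == 0]  # scores of negatives

pairs = [(sp, sn) for sp in pos for sn in neg]
auc = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
          for sp, sn in pairs) / len(pairs)
print(auc)  # 1.0 -- the positive scores are perfectly separated from the negatives
```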
Class Probability Estimation
The probability of an event is the likelihood that the event will happen.
Probability-based classifiers produce the class probability estimation (the
probability that a test instance belongs to the predicted class).
Involves not only predicting the class label but also obtaining a probability of
the respective label for decision-making.
Definition:
A probabilistic classifier is a classifier that is able to predict, given an
observation of an input, a probability distribution over a set of classes.
An ordinary (binary) classifier uses a function that assigns to a sample x a class
label ŷ:
ŷ = f(x)
Probabilistic classifiers: instead of a single-valued function, they model a conditional
distribution Pr(Y | X); for a given x ∈ X, they assign a probability to every y ∈ Y (and
these probabilities sum to one).
Examples: Naive Bayes, logistic regression, and multilayer perceptrons are
naturally probabilistic.
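A minimal sketch of a probabilistic classifier, assuming scikit-learn is available; LogisticRegression.predict_proba returns one probability per class for each input, and the probabilities in each row sum to one (the tiny one-feature dataset is illustrative):

```python
# Logistic regression used as a probabilistic classifier (illustrative data).
from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]]  # one feature per instance
y = [0, 0, 0, 1, 1, 1]                          # binary labels

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[0.45], [1.1]])      # class probabilities for two test points
print(proba)                         # each row: [P(class 0), P(class 1)], rows sum to 1
print(clf.predict([[0.45], [1.1]]))  # hard labels, i.e. the most probable class
```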
Assessing Class Probability Estimates
1. Sum Squared Error (SSE): Square the individual error terms (difference
between the estimated values and the actual value), which results in a positive
number for all values.
2. Mean Squared Error (MSE): Measures the average of the squares of the errors.
The average squared difference between the estimated values and the
actual value (take the average, or the mean, of the individual squared error
terms).
3. Brier Score:
Definition of error in probability estimates, used in forecasting theory:
Brier score = (1/N) · Σ_{t=1}^{N} (f_t − o_t)²
f_t – the probability that was forecast for instance t.
o_t – the actual outcome of the event at instance t (0 if it does not happen
and 1 if it does happen).
N – the number of forecasting instances.
In effect, it is the mean squared error of the forecast.
The Brier score is a proper scoring rule; in this form it applies to binary events
(for example, "rain" or "no rain").
Example: Suppose one is forecasting the probability P that it will rain on a
given day. Then the Brier score is calculated as follows:
If the forecast is 100% (P = 1) and it rains, then the Brier Score is 0
(best score).
If the forecast is 100% and it does not rain, then the Brier Score is 1
(worst score).
If the forecast is 70% (P = 0.70) and it rains, then the Brier Score is
(0.70 − 1)² = 0.09.
If the forecast is 70% (P = 0.70) and it does not rain, then the Brier
Score is (0.70 − 0)² = 0.49.
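A small sketch that computes SSE, MSE, and the Brier score for a short run of rain forecasts; the forecast probabilities and outcomes below are illustrative and include the two 70% cases worked out above:

```python
# SSE, MSE, and Brier score for probability forecasts.
# The Brier score is the mean squared error of the forecast probabilities.
forecasts = [1.0, 1.0, 0.70, 0.70]  # forecast probability of rain, f_t
outcomes  = [1,   0,   1,    0]     # o_t: 1 = it rained, 0 = it did not

squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
sse = sum(squared_errors)        # 0 + 1 + 0.09 + 0.49 = 1.58
mse = sse / len(squared_errors)  # 0.395
brier = mse                      # identical to the MSE of the forecasts

print(round(sse, 3), round(mse, 3), round(brier, 3))  # 1.58 0.395 0.395
```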
Empirical Probability
Empirical probability uses the number of occurrences of an outcome
within a sample set as a basis for determining the probability of that
outcome.
For example, the number of times "event X" happens out of 100 trials gives an estimate
of the probability of event X happening.
The empirical probability of an event is the ratio of the number of outcomes in
which a specified event occurs to the total number of trials.
Empirical probability (experimental probability) estimates probabilities from
experience and observation.
Example: In a buffet, 95 out of 100 people chose to order coffee over tea. What
is the empirical probability of someone ordering tea?
Answer: The empirical probability of someone ordering tea is 5/100 = 0.05 (5%).