21-General approach to classification, classification by decision tree induction-17-02-2025


Performance metrics

1. Accuracy:
Accuracy in classification problems is the number of correct predictions made by the model divided by the total number of predictions made.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In the numerator are our correct predictions (True Positives and True Negatives), and in the denominator are all predictions made by the algorithm, right as well as wrong ones.

When to use Accuracy:

Accuracy is a good measure when the target variable classes in the data are nearly balanced.

Ex: 60% of the images in our fruit image data are apples and 40% are oranges. A model that predicts whether a new image is an apple or an orange correctly 97% of the time is well described by accuracy in this example.

When NOT to use Accuracy:

Accuracy should NOT be used as a measure when the target variable classes in the data are heavily imbalanced, i.e. dominated by one class.

Ex: In our cancer detection example with 100 people, only 5 people have cancer. Let's say our model is very bad and predicts every case as No Cancer. In doing so, it classifies the 95 non-cancer patients correctly and the 5 cancer patients as non-cancerous. Even though the model is terrible at detecting cancer, its accuracy is still 95%.
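A minimal sketch of this pitfall, assuming the 95/5 split above and scikit-learn's accuracy_score:

# imbalanced-accuracy pitfall: 5 cancer (1) and 95 non-cancer (0) patients
from sklearn.metrics import accuracy_score

actual = [1]*5 + [0]*95      # 1: cancer, 0: no cancer
predicted = [0]*100          # the "bad" model predicts No Cancer for everyone

print(accuracy_score(actual, predicted))  # 0.95, despite missing every cancer case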

2. Confusion Matrix
The confusion matrix is one of the most intuitive and easiest metrics used for finding the correctness and accuracy of a model. It is used for classification problems where the output can be of two or more classes.
Before diving into what the confusion matrix is all about and what it conveys, let's say we are solving a classification problem where we predict whether a person has cancer or not.

Let's give labels to our target variable:

1: the person has cancer

0: the person does NOT have cancer.

Now that we have set up the problem, the confusion matrix is a table with two dimensions ("Actual" and "Predicted") and the set of classes along both dimensions. Our actual classifications are the columns and the predicted ones are the rows.
Confusion Matrix (predicted classes as rows, actual classes as columns):

                Actual: 1   Actual: 0
Predicted: 1    TP          FP
Predicted: 0    FN          TN

The confusion matrix is not a performance measure in itself, but almost all performance metrics are based on the confusion matrix and the numbers inside it.

Terms associated with the Confusion matrix:
1. True Positives (TP): True positives are the cases when the actual class of the data point was 1 (True) and the predicted class is also 1 (True).

Ex: A person actually having cancer (1) and the model classifying the case as cancer (1) comes under True Positives.

2. True Negatives (TN): True negatives are the cases when the actual class of the data point was 0 (False) and the predicted class is also 0 (False).

Ex: A person NOT having cancer and the model classifying the case as not cancer comes under True Negatives.

3. False Positives (FP): False positives are the cases when the actual class of the data point was 0 (False) and the predicted class is 1 (True). False because the model has predicted incorrectly, and positive because the predicted class was the positive one (1).

Ex: A person NOT having cancer and the model classifying the case as cancer comes under False Positives.

4. False Negatives (FN): False negatives are the cases when the actual class of the data point was 1 (True) and the predicted class is 0 (False). False because the model has predicted incorrectly, and negative because the predicted class was the negative one (0).

Ex: A person having cancer and the model classifying the case as no cancer comes under False Negatives.

The ideal scenario that we all want is for the model to give 0 False Positives and 0 False Negatives. But that's not the case in real life, as no model is 100% accurate most of the time.
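These four counts can be tallied directly from label lists; a minimal sketch with hypothetical toy labels:

# counting TP, TN, FP, FN from toy label lists (hypothetical data)
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(tp, tn, fp, fn)  # 2 3 2 1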

When to minimise what?

We know that there will be some error associated with every model we use to predict the true class of the target variable, resulting in False Positives and False Negatives (i.e. the model classifying things incorrectly compared to the actual class).
There's no hard rule that says what should be minimised in all situations. It purely depends on the business needs and the context of the problem you are trying to solve. Based on that, we might want to minimise either False Positives or False Negatives.

1. Minimising False Negatives:

Let's say in our cancer detection example, out of 100 people only 5 have cancer. In this case, we want to correctly classify all the cancer patients, since even a very BAD model (predicting everyone as non-cancerous) already gives 95% accuracy. But in order to capture all cancer cases, we might end up classifying a person who does NOT actually have cancer as cancerous. That can be acceptable, as it is less dangerous than NOT identifying a cancer patient: the predicted cancer cases will anyway be sent for further examination and reports, whereas missing a cancer patient is a huge mistake because no further examination will be done on them.

2. Minimising False Positives:

For a better understanding of False Positives, let's use a different example where the model classifies whether an email is spam or not.

Let's say you are expecting an important email, like hearing back from a recruiter or awaiting an admit letter from a university. Let's assign labels to the target variable: 1: "Email is spam" and 0: "Email is not spam".
Suppose the model classifies the important email you are desperately waiting for as spam (a case of a False Positive). This is far worse than classifying a spam email as important or not spam, since in that case we can still go ahead and delete it manually, and it's not much of a pain if it happens once in a while. So in spam email classification, minimising False Positives is more important than minimising False Negatives.

3. Precision:
Let’s use the same confusion matrix as the one we used
before for our cancer detection example.

Precision = TP / (TP + FP)

Precision is a measure that tells us what proportion of the patients we diagnosed as having cancer actually had cancer. The predicted positives (people predicted as cancerous) are TP and FP, and the people among them who actually have cancer are TP.

Ex: In our cancer example with 100 people, only 5 people have cancer. Let's say our model is very bad and predicts every case as cancer. Since we are predicting everyone as having cancer, our denominator (True Positives plus False Positives) is 100, and the numerator (people having cancer whom the model predicts as cancer) is 5. So in this example, the precision of such a model is 5%.
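A quick sketch of this calculation with scikit-learn's precision_score, assuming the same 5-in-100 setup and the "predict everyone as cancer" model:

# precision of a model that flags all 100 people as cancer (only 5 actually have it)
from sklearn.metrics import precision_score

actual = [1]*5 + [0]*95
predicted = [1]*100

print(precision_score(actual, predicted))  # 0.05, i.e. 5%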

4. Recall or Sensitivity:
Recall = TP / (TP + FN)

Recall is a measure that tells us what proportion of the patients who actually had cancer were diagnosed by the algorithm as having cancer. The actual positives (people having cancer) are TP and FN, and the people among them diagnosed by the model as having cancer are TP. (Note: FN is included because the person actually had cancer even though the model predicted otherwise.)

Ex: In our cancer example with 100 people, 5 people actually have cancer. Let's say that the model predicts every case as cancer.

So our denominator (True Positives plus False Negatives) is 5, and the numerator (people having cancer whom the model predicts as cancer) is also 5, since we predicted all 5 cancer cases correctly. So in this example, the recall of such a model is 100%, while its precision (as we saw above) is 5%.
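A matching sketch with scikit-learn's recall_score, under the same assumptions:

# recall of the same "everyone is cancer" model
from sklearn.metrics import recall_score

actual = [1]*5 + [0]*95
predicted = [1]*100

print(recall_score(actual, predicted))  # 1.0, i.e. 100%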

When to use Precision and When to use Recall?:

It is clear that recall gives us information about a classifier's performance with respect to False Negatives (how many positives we missed), while precision gives us information about its performance with respect to False Positives (how many of the cases we flagged were actually positive).

Precision is about being precise. So even if we managed to capture only one cancer case, and we captured it correctly, we are 100% precise.
Recall is not so much about capturing cases correctly, but about capturing all the cases that are "cancer". So if we simply label every case as "cancer", we have 100% recall.

So basically, if we want to focus more on minimising False Negatives, we would want recall to be as close to 100% as possible without precision being too bad; and if we want to focus on minimising False Positives, then our focus should be on making precision as close to 100% as possible.

5. Specificity:

Specificity is a measure that tells us what proportion of the patients who did NOT have cancer were predicted by the model as non-cancerous. The actual negatives (people NOT having cancer) are FP and TN, and the people among them diagnosed by us as not having cancer are TN. (Note: FP is included because the person did NOT actually have cancer even though the model predicted otherwise.)

Specificity = TN / (TN + FP)

Specificity is the mirror image of recall: it is the same calculation applied to the negative class.

Ex: In our cancer example with 100 people, 5 people actually have cancer. Let's say that the model predicts every case as cancer.

So our denominator (False Positives plus True Negatives) is 95, and the numerator (people not having cancer whom the model predicts as not having cancer) is 0, since we predicted every case as cancer. So in this example, we can say that the specificity of such a model is 0%.
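scikit-learn has no dedicated specificity function, so a minimal sketch derives it from the confusion matrix, assuming the same setup:

# specificity = TN / (TN + FP), derived from the confusion matrix
from sklearn.metrics import confusion_matrix

actual = [1]*5 + [0]*95
predicted = [1]*100          # the "everyone is cancer" model

tn, fp, fn, tp = confusion_matrix(actual, predicted, labels=[0, 1]).ravel()
print(tn / (tn + fp))        # 0.0, i.e. 0%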

6. F1 Score:
We don't really want to carry both Precision and Recall in our pockets every time we build a model for a classification problem. So it's best if we can get a single score that represents both Precision (P) and Recall (R).

One way to do that is to simply take their arithmetic mean, i.e. (P + R) / 2, where P is Precision and R is Recall. But that's pretty bad in some situations.

Suppose we have 100 credit card transactions, of which 97 are legitimate and 3 are fraudulent, and let's say we came up with a model that predicts everything as fraud. (Horrendous, right!?)

Precision and Recall for the credit card example: Precision = 3/100 = 3% (only 3 of the 100 flagged transactions are actually fraud) and Recall = 3/3 = 100% (all 3 fraudulent transactions are caught).

Now, if we simply take the arithmetic mean of the two, it comes out to be nearly 51%. We shouldn't give such a moderate score to a terrible model that is just predicting every transaction as fraud.

So we need something more balanced than the arithmetic mean, and that is the harmonic mean:

Harmonic Mean(x, y) = 2xy / (x + y)

The harmonic mean equals the ordinary average when x and y are equal. But when x and y are different, it is closer to the smaller number than to the larger one.

For our previous example, F1 Score = Harmonic Mean(Precision, Recall)

F1 Score = 2 * Precision * Recall / (Precision + Recall) = 2 * 3 * 100 / (3 + 100) ≈ 5.8%

So if one of precision and recall is really small, the F1 score raises a flag by staying closer to the smaller number than to the bigger one, giving the model a more appropriate score than a plain arithmetic mean would.
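A short sketch contrasting the arithmetic mean with the F1 score for the credit card example, using scikit-learn (the labels below reconstruct the 3-fraud-in-100 setup):

# arithmetic mean vs F1 for a model that flags every transaction as fraud
from sklearn.metrics import f1_score, precision_score, recall_score

actual = [1]*3 + [0]*97      # 1: fraud, 0: legitimate
predicted = [1]*100

p = precision_score(actual, predicted)   # 0.03
r = recall_score(actual, predicted)      # 1.0
print((p + r) / 2)                       # ~0.515, the misleading arithmetic mean
print(f1_score(actual, predicted))       # ~0.058, the harmonic mean (F1)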

Confusion Matrix using scikit-learn in Python

# confusion matrix in sklearn
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# actual values
actual = [1,0,0,1,0,0,1,0,0,1]
# predicted values
predicted = [1,0,0,1,0,0,0,1,0,0]

# confusion matrix
matrix = confusion_matrix(actual, predicted, labels=[1,0])
print('Confusion matrix : \n', matrix)

# outcome values order in sklearn
tp, fn, fp, tn = confusion_matrix(actual, predicted, labels=[1,0]).reshape(-1)
print('Outcome values : \n', tp, fn, fp, tn)

# classification report for precision, recall, f1-score and accuracy
matrix = classification_report(actual, predicted, labels=[1,0])
print('Classification report : \n', matrix)
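For these toy lists, this should print the matrix [[2, 2], [1, 5]] and the outcome values 2 2 1 5: with labels=[1, 0], scikit-learn puts the positive class first and uses actual classes as rows and predicted classes as columns, so reshape(-1) unpacks the counts in the order TP, FN, FP, TN.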
