
Journal of Engineering Research
Volume 6, Issue 5, Article 1
2022

Confusion Matrix in Binary Classification Problems: A Step-by-Step Tutorial
Mahmoud Fahmy Amin

Recommended Citation
Fahmy Amin, Mahmoud (2022) "Confusion Matrix in Binary Classification Problems: A Step-by-Step Tutorial," Journal of Engineering Research: Vol. 6: Iss. 5, Article 1.
Available at: https://digitalcommons.aaru.edu.jo/erjeng/vol6/iss5/1


Confusion Matrix in Binary Classification Problems: A Step-by-Step Tutorial

Mahmoud M. Fahmy
Professor, Computer and Control Engineering Department, Faculty of Engineering, Tanta University, Tanta, Egypt
e-mail: [email protected]

Abstract: In the field of machine learning, the confusion matrix is a specific table adopted to describe and assess the performance of a classification model (e.g. an artificial neural network) for a set of test data whose actual distinguishing features are known. The learning algorithm is thus of the supervised learning category. For an n-class classification problem, the confusion matrix is square, with n rows and n columns. The rows represent the actual class samples (instances), which are the inputs to the classifier, while the columns represent the predicted class samples, the classifier outputs. (The converse is also valid, i.e. the two dimensions 'actual' and 'predicted' can be assigned to columns and rows, respectively.) Binary as well as multiple-class classifiers can be dealt with. It is worth noting that the term 'matrix' here has nothing to do with the theorems of matrix algebra; it is regarded just as an information-conveying table. The descriptive word 'confusion' stems from the fact that the matrix clarifies to what extent the model confuses the classes, i.e. mislabels one as another. The essential concept was introduced in 1904 by the British statistician Karl Pearson (1857–1936).

Keywords: Machine Learning; Confusion Matrix; Accuracy; Recall; Specificity; Precision; True Negative; False Positive; Balanced Accuracy.

1. BINARY CLASSIFICATION

We begin with the basic and relatively simple situation of a binary classifier, where we have two classes (n = 2) and a 2×2 confusion matrix. See Fig. 1. Let the matrix in this figure, as an illustrative example, belong to a medical test conducted on a number of persons (patients) for the presence or absence of a certain disease. The labels 'positive' (+ve) and 'negative' (-ve) are used to identify these two distinct cases, respectively, which are treated as two classes in a classification problem. (Other labels such as '1' and '0', 'yes' and 'no', or 'event' and 'not event' can likewise be used.) With such labeling, attention is sometimes focused on the positive class, and its classification outcomes are considered the decisive characteristics of the classifier.

                 Predicted
               +ve    -ve
  Actual +ve   100      5
         -ve    10     90

Fig. 1. Confusion matrix for binary classifier

The confusion matrix of Fig. 1 tells us that:
 The label 'positive' means the person has the disease, and the label 'negative' means the person does not.
 A total of 205 (= 100 + 5 + 10 + 90) persons were tested.
 Out of the 205 persons, the classifier predicted 'positive' 110 (= 100 + 10) times and 'negative' 95 (= 5 + 90) times (regardless of whether the predictions are correct or not).
 In actuality, 105 (= 100 + 5) persons in the test set have the disease and 100 (= 10 + 90) persons do not.

More conclusive information can be drawn from the confusion matrix, as elucidated below.

A. Building blocks: TP, TN, FP, and FN

Formally, a comparison of the actual classifications with the predicted classifications reveals that four well-defined outcomes emerge:
 The actual classification is positive and the predicted classification is positive. This outcome is referred to as 'true positive', abbreviated TP, because the positive sample is correctly identified by the classifier.
 The actual classification is negative and the predicted classification is negative. This is a 'true negative' (TN) outcome because the negative sample is correctly identified by the classifier.
 The actual classification is negative and the predicted classification is positive. This is a 'false positive' (FP) outcome because the negative sample is incorrectly identified by the classifier as positive.
 The actual classification is positive and the predicted classification is negative. This is a 'false negative' (FN) outcome because the positive sample is incorrectly identified by the classifier as negative.

These four outcomes, with the above interpretation, pertain in fact to the positive class, provided this class is particularly important and deserves emphasis; it accommodates what can be called 'relevant' samples, while the negative class is regarded as 'irrelevant'.

The outcomes TP, TN, FP, and FN are of prime significance and are termed the 'building blocks', since they are employed to formulate all performance measures, as will be evident in Section 2.

The building blocks appear naturally as the elements of the confusion matrix, as shown in Figs. 2 and 3. Note that the true outcomes TP and TN occupy the two diagonal cells of the matrix, while the false outcomes FP and FN, occupying the two off-diagonal cells, imply errors; FP is a type I error and FN is a type II error. In our example of ill and healthy persons, FP represents persons who are healthy and classified as ill while, on the contrary, FN represents persons who are ill and classified as healthy. The latter case (type II error) is normally more dangerous than the former (type I error).

Returning to Fig. 1, the building blocks are seen to be TP = 100, TN = 90, FP = 10, FN = 5. Ideally, FP and FN would both be of zero value, representing a perfect classifier.
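For readers who like to see the definitions in executable form, the four outcomes can be tallied directly from paired actual/predicted labels. The sketch below (Python, not part of the original article) assumes the labels 1 and 0 denote the positive and negative classes, respectively.

```python
# Tally the four building blocks from paired actual/predicted labels.
# Assumption: label 1 = positive class, label 0 = negative class.
def building_blocks(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

# A small toy test set (illustrative values only).
actual    = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
print(building_blocks(actual, predicted))  # -> (3, 1, 1, 1)
```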


                 Predicted
               +ve    -ve
  Actual +ve    TP     FN
         -ve    FP     TN

Fig. 2. Building blocks of positive class as elements of 2×2 confusion matrix

                 Predicted
               +ve    -ve
  Actual +ve    TP     FN   <- type II error
         -ve    FP     TN
                ^ type I error

Fig. 3. True outcomes in diagonal cells and false outcomes in off-diagonal cells

But, in practice, there is the challenge of how to minimize FP and FN (i.e. maximize TP and TN). Bear in mind that the building blocks are all whole numbers (counts); they cannot be fractions or percentages.

It is to be noted that the positive and negative classes can be interchanged, so that the confusion matrix appears as in Fig. 4. In comparison with Fig. 2, we find that TP and TN are merely interchanged, and so are FP and FN.

                 Predicted
               -ve    +ve
  Actual -ve    TN     FP
         +ve    FN     TP

Fig. 4. Interchanging positive and negative classes

Furthermore, we can write:

Number of positive samples in the test set,
    N+ = TP + FN                           (1)
Number of negative samples in the test set,
    N- = FP + TN                           (2)
Total number of tested samples,
    N = TP + FN + FP + TN = N+ + N-        (3)
Number of samples predicted as positive,
    P+ = TP + FP                           (4)
Number of samples predicted as negative,
    P- = FN + TN = N - P+                  (5)

Example 1

Consider a set of 12 persons, numbered 1 through 12. Persons 1 through 8 suffer from the covid disease and belong to class 1, while persons 9 through 12 are covid-free and belong to class 0. A binary classifier for this set made 9 correct predictions and 3 incorrect ones. Persons 1 and 2 were predicted as covid-free and person 9 was predicted as having covid.
(a) Determine the building blocks for class 1.
(b) Construct the confusion matrix of the classifier.

Solution

The classification situation is illustrated in Fig. 5. The labels '1' and '0' for the two classes correspond to 'positive' and 'negative', respectively.

Person's number           1   2   3   4   5   6   7   8   9   10  11  12
Actual classification     1   1   1   1   1   1   1   1   0   0   0   0
Predicted classification  0   0   1   1   1   1   1   1   1   0   0   0
Outcome                   FN  FN  TP  TP  TP  TP  TP  TP  FP  TN  TN  TN

Fig. 5. Outcomes for Example 1

From Fig. 5, the building blocks for class 1 are

    TP = 6, TN = 3, FP = 1, FN = 2

The confusion matrix of the classifier is shown in Fig. 6.

                 Predicted
                1      0
  Actual  1     6      2
          0     1      3

Fig. 6. Confusion matrix for Example 1

Example 2

A set of 1000 pens contains 650 pens of the Parker brand, and the remaining pens are of other brands. A binary classifier correctly identified the 650 Parker pens and incorrectly identified 57 non-Parker pens as Parker.
(a) How many non-Parker pens were correctly identified?
(b) Construct the confusion matrix of the classifier.

Solution

There are two classes: the Parker class (positive) and the non-Parker class (negative). We also have

    N = 1000, N+ = 650
    TP = 650, FP = 57

From Eq. (3), N- = N - N+ = 1000 - 650 = 350
From Eq. (1), FN = N+ - TP = 650 - 650 = 0
From Eq. (2), TN = N- - FP = 350 - 57 = 293

That is, the number of non-Parker pens correctly identified is 293. The confusion matrix, based on the building blocks for the Parker class, is shown in Fig. 7.

                      Predicted
                  Parker   Non-Parker
  Actual Parker     650        0
     Non-Parker      57      293

Fig. 7. Confusion matrix for Example 2
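Example 1 can also be reproduced programmatically. One option is scikit-learn's confusion_matrix (a sketch, assuming scikit-learn is installed); passing labels=[1, 0] places class 1 in the first row and column, matching the layout of Fig. 6.

```python
from sklearn.metrics import confusion_matrix

# Example 1: persons 1-8 actually belong to class 1, persons 9-12 to class 0.
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Persons 1 and 2 were predicted as class 0; person 9 as class 1.
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(actual, predicted, labels=[1, 0]))
# [[6 2]
#  [1 3]]   i.e. TP = 6, FN = 2, FP = 1, TN = 3, as in Fig. 6
```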


B. Building blocks for individual classes

When the two classes handled by a binary classifier are of nearly the same importance, the two sets of their building blocks, with foreseen interrelations, are to be studied equally. We identify the individual classes with arbitrary labels, and no preference is given to one class over the other. Figure 8 shows a confusion matrix for two classes labeled A and B.

                 Predicted
                A      B
  Actual  A    215     25    NA = 240
          B     40    190    NB = 230

Fig. 8. A confusion matrix of a binary classifier with classes A and B

From Fig. 8:

Number of tested class-A samples,
    NA = 215 + 25 = 240
Number of tested class-B samples,
    NB = 40 + 190 = 230
Total number of tested samples,
    N = NA + NB = 240 + 230 = 470

Here, the descriptors 'positive' and 'negative' do not appear, but their intended meanings are implicit. If we consider class A, we understand that:
 Class-A samples correctly classified are TPA, true positives for class A; TPA = 215.
 Class-B samples correctly classified are TNA, true negatives for class A; TNA = 190.
 Class-B samples incorrectly classified as class A are FPA, false positives for class A; FPA = 40.
 Class-A samples incorrectly classified as class B are FNA, false negatives for class A; FNA = 25.

Considering class B, on the other hand, we understand that:
 Class-B samples correctly classified are TPB, true positives for class B; TPB = 190.
 Class-A samples correctly classified are TNB, true negatives for class B; TNB = 215.
 Class-A samples incorrectly classified as class B are FPB, false positives for class B; FPB = 25.
 Class-B samples incorrectly classified as class A are FNB, false negatives for class B; FNB = 40.

For easy reference and remembrance, the building blocks for classes A and B are represented symbolically in Fig. 9. The directed symbol A -> A, for example, means that when the input to the classifier is A, the classifier output is A.

  Class A                Class B
  TPA : A -> A           TPB : B -> B
  TNA : B -> B           TNB : A -> A
  FPA : B -> A           FPB : A -> B
  FNA : A -> B           FNB : B -> A

Fig. 9. Symbolic representation of building blocks for two classes

A little thought ensures that:

    TPA = TNB
    TNA = TPB
    FPA = FNB
    FNA = FPB        (6)

which are intrinsic relationships between the building blocks of class A and those of class B. Note that the symbols P and N are just interchanged when transferring from class A to class B and vice versa. This implies an interesting result: once the building blocks of one class are determined, the building blocks of the other class are readily known with no additional calculations. In Fig. 8, we already have

    TPA = TNB = 215,  TNA = TPB = 190
    FPA = FNB = 40,   FNA = FPB = 25

It is also obvious from relationships (6) that, for classes A and B, the sum of true positives is equal to the sum of true negatives,

    TPA + TPB = TNA + TNB        (7)

and the sum of false positives is equal to the sum of false negatives,

    FPA + FPB = FNA + FNB        (8)

From another perspective, under conditions (6), the confusion matrix of a binary classifier with classes A and B can take either of the two forms shown in Fig. 10. In Fig. 10a, the first row (column) is assigned to class A and, in Fig. 10b, the first row (column) is assigned to class B. The two forms are of course equivalent; they convey the same pieces of information.

              Predicted                            Predicted
             A          B                         B          A
  Actual A   TPA = TNB  FNA = FPB      Actual B   TPB = TNA  FNB = FPA
         B   FPA = FNB  TNA = TPB             A   FPB = FNA  TNB = TPA
            (a)                                  (b)

Fig. 10. Two forms for confusion matrix of binary classifier through interchange of classes

The confusion matrix in Fig. 8 can thus take the alternative (equivalent) form of Fig. 11, by interchanging classes A and B.

                 Predicted
                B      A
  Actual  B    190     40    NB = 230
          A     25    215    NA = 240

Fig. 11. Another form for the confusion matrix of Fig. 8

From either form, we immediately realize that:
 215 class-A samples are correctly classified.
 190 class-B samples are correctly classified.
 40 class-B samples are incorrectly classified as class A.
 25 class-A samples are incorrectly classified as class B.
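Relationships (6)-(8) are easy to confirm numerically. A minimal sketch, assuming the Fig. 8 counts:

```python
# Building blocks of class A, read directly off Fig. 8.
TP_A, FN_A, FP_A, TN_A = 215, 25, 40, 190

# Relationships (6): the class-B blocks follow with no extra counting.
TP_B, TN_B, FP_B, FN_B = TN_A, TP_A, FN_A, FP_A

assert TP_A + TP_B == TN_A + TN_B  # Eq. (7)
assert FP_A + FP_B == FN_A + FN_B  # Eq. (8)
print(TP_B, TN_B, FP_B, FN_B)      # -> 190 215 25 40
```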


Example 3

A binary classifier has the confusion matrix of Fig. 12 for classes K and L.
(a) For class K, how many samples are correctly classified and how many are incorrectly classified?
(b) Repeat part (a) for class L.
(c) How many class-K samples are tested?
(d) How many class-L samples are tested?
(e) Determine the building blocks for classes K and L.
(f) Construct another equivalent form for the classifier confusion matrix.

                 Predicted
                K      L
  Actual  K    510     70
          L    100    660

Fig. 12. Confusion matrix for Example 3

Solution

Number of class-K samples correctly classified,
    TPK = 510 (= TNL)
Number of class-K samples incorrectly classified,
    FNK = 70 (= FPL)
Number of class-L samples correctly classified,
    TPL = 660 (= TNK)
Number of class-L samples incorrectly classified,
    FNL = 100 (= FPK)
Number of tested class-K samples,
    NK = 510 + 70 = 580
Number of tested class-L samples,
    NL = 100 + 660 = 760

The building blocks for classes K and L are given in Fig. 13.

              TP    TN    FP    FN
  Class K    510   660   100    70
  Class L    660   510    70   100

Fig. 13. Building blocks for classes K and L in Example 3

Another equivalent form for the confusion matrix is shown in Fig. 14, obtained from Fig. 12 by interchanging classes K and L.

                 Predicted
                L      K
  Actual  L    660    100
          K     70    510

Fig. 14. Another form for the confusion matrix in Example 3

2. PERFORMANCE MEASURES FOR BINARY CLASSIFICATION

Based on the confusion matrix, we define a group of different performance measures (metrics) for the evaluation of binary classification models. The most widely used measures are discussed in Subsections A through F. Generally, as the value of a measure gets larger, the classifier becomes better.

A. Accuracy

The accuracy of a binary classification model is the ratio of the number of correctly classified samples (true outcomes) to the total number of tested samples. Referring to Fig. 2, the model accuracy is

    Accuracy = (TP + TN)/N = (TP + TN)/(TP + TN + FP + FN)        (9)

In Fig. 1, for example, since TP = 100, TN = 90, and N = 205, then

    Accuracy = (100 + 90)/205 = 0.927 (92.7%)

This indicates that 92.7% of the tested samples are correctly classified or, equivalently, that the classification error is 7.3%.

In terms of two classes A and B, the model accuracy takes the forms

    Accuracy = (TPA + TNA)/N = (TPB + TNB)/N        (10a)

which can also be written as

    Accuracy = (TPA + TPB)/N = (TNA + TNB)/N        (10b)

In view of relationships (6) and Fig. 10, the four (apparently different) forms of Eqs. (10) are the same in value. It is interesting to think in a like manner of the accuracy of the individual classes. For class A, AccuracyA = (TPA + TNA)/N and, for class B, AccuracyB = (TPB + TNB)/N. This implies that the model accuracy is the same as the accuracy of either of the two classes.

In Fig. 8, TPA = TNB = 215, TNA = TPB = 190, and N = 470. Therefore,

    Accuracy = AccuracyA = AccuracyB = (215 + 190)/470 = 0.862

In spite of the formality of the accuracy measure, it is unfortunately reliable only if the two classes have balanced datasets. Two datasets are said to be balanced when they have nearly the same number of samples. Otherwise, the datasets are imbalanced and the accuracy measure can be misleading. To demonstrate, suppose class A has NA = 1000 samples and class B has NB = 50 samples (only 5% of class A). Here the classification model will be 'biased' toward class A, which has the majority of samples. The confusion matrix in this case may have the form of Fig. 15.

                 Predicted
                A      B
  Actual  A    990     10    NA = 1000
          B     48      2    NB = 50

Fig. 15. Confusion matrix for two imbalanced datasets

The model accuracy, by Eq. (10), is calculated as

    Accuracy = (990 + 2)/1050 = 0.945 (94.5%)

which can be judged as an acceptably high level of accuracy; 992 samples are correctly classified out of 1050 samples. However, a closer look at the outcomes of the individual classes tells a different story.


While 990 class-A samples are correctly classified out of 1000 samples, with a percentage as high as 99% (taken as TPA/NA), only two class-B samples are correctly classified out of 50 samples, with a very low percentage of 4% (TPB/NB). These results warn us that the 94.5% accuracy cannot be relied upon; it deceptively describes the classification reliability of the individual classes under dataset imbalance. Another measure, called balanced accuracy, will be specified in Subsection E for imbalanced datasets.
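The arithmetic of this warning is compact enough to script. A sketch using the Fig. 15 counts:

```python
# Fig. 15: class A has 1000 samples, class B only 50 (imbalanced datasets).
TP_A, FN_A, FP_A, TN_A = 990, 10, 48, 2
N = TP_A + FN_A + FP_A + TN_A        # 1050, by Eq. (3)

accuracy = (TP_A + TN_A) / N         # Eq. (10a)
rate_A   = TP_A / (TP_A + FN_A)      # fraction of class A correctly classified
rate_B   = TN_A / (TN_A + FP_A)      # fraction of class B correctly classified

print(round(accuracy, 3))  # 0.945 -- looks impressive...
print(rate_A, rate_B)      # 0.99 0.04 -- ...but class B is almost never right
```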
Example 4

A binary classification model is used for two classes A and B. The number of samples classified as class A is 169 and the number of samples classified as class B is 157. The type I and type II errors are recorded to be 39 samples and 46 samples, respectively.
(a) What is the percentage of correctly classified samples in class A? In class B?
(b) Determine the accuracy of the model.

Solution

The given data is represented in the confusion matrix of Fig. 16.

                 Predicted
                A           B
  Actual  A    TPA         FNA = 46    NA
          B    FPA = 39    TNA         NB
               PA = 169    PB = 157

Fig. 16. Confusion matrix for Example 4

We have

    N = PA + PB = 169 + 157 = 326
    TPA = PA - FPA = 169 - 39 = 130
    TNA = PB - FNA = 157 - 46 = 111 (= TPB)
    NA = TPA + FNA = 130 + 46 = 176
    NB = FPA + TNA = 39 + 111 = 150

Percentage of correctly classified samples in class A,

    (TPA/NA) × 100 = (130 × 100)/176 = 73.9%        (11)

Percentage of correctly classified samples in class B,

    (TPB/NB) × 100 = (111 × 100)/150 = 74%        (12)

Model accuracy, by Eq. (10),

    (TPA + TNA)/N = (130 + 111)/326 = 0.739 (73.9%)        (13)

The difference in the values of (11), (12), and (13) is really slight. The reason is that the datasets of the two classes are balanced.

B. Precision

The precision is the ratio of the number of samples correctly classified as positive to the number of all samples classified as positive. Considering the first column of Fig. 2, we have

    Precision = TP/P+ = TP/(TP + FP)        (14)

For example, in Fig. 1, where TP = 100 and FP = 10,

    Precision = 100/(100 + 10) = 0.909 (90.9%)

In an ideal case when FP = 0, the precision reaches its maximum value of 1.0. This means that all samples predicted as positive actually belong to the positive class (TP = P+); the type I error is of zero value. See Fig. 17. Strictly speaking, expression (14) is the precision of the positive class.

                 Predicted
               +ve          -ve
  Actual +ve   TP = P+      FN
         -ve   FP = 0       TN
               P+           P-

Fig. 17. Maximum precision

For two classes A and B, we write

    PrecisionA = TPA/PA = TPA/(TPA + FPA)        (15)
    PrecisionB = TPB/PB = TPB/(TPB + FPB)        (16a)

See the two forms of the confusion matrix in Fig. 18. In expression (15), two class-A building blocks, TPA and FPA (first column in Fig. 18a), are used and, similarly, two class-B building blocks, TPB and FPB (first column in Fig. 18b), are used in expression (16a). In words, the precision of a certain class is the ratio of the number of samples of the class correctly classified as belonging to this class (true positives) to the number of all samples classified, correctly or incorrectly, as belonging to the same class (true positives plus false positives).

              Predicted                     Predicted
             A      B                      B      A
  Actual A   TPA    FNA         Actual B   TPB    FNB
         B   FPA    TNA                A   FPB    TNB
             PA     PB                     PB     PA
  (a) Class A: PrecisionA = TPA/(TPA + FPA)
  (b) Class B: PrecisionB = TPB/(TPB + FPB)

Fig. 18. Precisions of classes A and B as obtained from two forms of confusion matrix

By virtue of relationships (6), PrecisionB in (16a) can also be expressed in terms of two class-A building blocks, TNA and FNA (second column in Fig. 18a), as

    PrecisionB = TNA/(TNA + FNA)        (16b)

That is, one form of the confusion matrix, such as that in Fig. 18a, can give us both PrecisionA and PrecisionB by considering the two columns of the matrix, respectively, as illustrated in Fig. 19. Similar arguments apply to the other form in Fig. 18b.


                 Predicted
                A      B
  Actual  A    TPA    FNA        PrecisionA = TPA/(TPA + FPA)  (first column)
          B    FPA    TNA        PrecisionB = TNA/(TNA + FNA)  (second column)

Fig. 19. Precisions of classes A and B as obtained from one form of confusion matrix

In Fig. 8, TPA = 215, FPA = 40, FNA = 25, and TNA = 190. Therefore,

    PrecisionA = 215/(215 + 40) = 0.843
    PrecisionB = 190/(190 + 25) = 0.884

A crucial question is: what is the precision of the binary classification model as a whole? This is determined through some sort of averaging of the precisions of the two individual classes. There are three methods to define an average precision; namely:
 Macro-average
 Micro-average
 Weighted-average

The values calculated from these methods generally differ from each other, especially for imbalanced datasets, depending on the individual class precisions.

The macro-average precision of a model with classes A and B is

    Precision_macro = (PrecisionA + PrecisionB)/2        (17)

i.e. the arithmetic average (mean) of the two precisions, with equal weights of unity.

The micro-average precision is

    Precision_micro = (TPA + TPB)/(TPA + TPB + FPA + FPB)        (18a)

where the true positives and false positives for class A are amalgamated with their counterparts for class B. Since the four-term sum in the denominator of expression (18a) is equal to N, we can also write

    Precision_micro = (TPA + TPB)/N        (18b)

It is to be noted in the meantime that expression (18b) is the same as the model accuracy defined in (10), and thus

    Precision_micro = Accuracy        (18c)

The weighted-average precision is

    Precision_weighted = [NA(PrecisionA) + NB(PrecisionB)]/N        (19a)

or, by Eqs. (15) and (16a),

    Precision_weighted = [(NA/PA)(TPA) + (NB/PB)(TPB)]/N        (19b)

where PrecisionA and PrecisionB are weighted by NA and NB, respectively.

In Fig. 8, classes A and B have balanced datasets. Since PrecisionA = 0.843 and PrecisionB = 0.884,

    Precision_macro = (0.843 + 0.884)/2 = 0.864

Since TPA = 215, TPB = 190, and N = 470,

    Precision_micro = (215 + 190)/470 = 0.862 (= Accuracy)

Since NA = 240 and NB = 230,

    Precision_weighted = [240(0.843) + 230(0.884)]/470 = 0.863

Example 5

From the confusion matrix of Fig. 15, determine the macro-, micro-, and weighted-average precisions of the classification model.

Solution

The classes A and B in Fig. 15 have imbalanced datasets. We have

    PrecisionA = 990/(990 + 48) = 0.954
    PrecisionB = 2/(2 + 10) = 0.167

Using Eqs. (17), (18b), and (19), we obtain

    Precision_macro = (0.954 + 0.167)/2 = 0.561
    Precision_micro = (990 + 2)/1050 = 0.945
    Precision_weighted = [1000(0.954) + 50(0.167)]/1050 = 0.917
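The three averaging schemes can be written out in a few lines. A sketch reproducing Example 5 (up to rounding):

```python
# Fig. 15 building blocks for class A; class-B blocks follow from Eq. (6).
TP_A, FN_A, FP_A, TN_A = 990, 10, 48, 2
TP_B, FP_B = TN_A, FN_A
N_A, N_B = TP_A + FN_A, FP_A + TN_A
N = N_A + N_B

prec_A = TP_A / (TP_A + FP_A)                      # Eq. (15)  -> ~0.954
prec_B = TP_B / (TP_B + FP_B)                      # Eq. (16a) -> ~0.167
prec_macro = (prec_A + prec_B) / 2                 # Eq. (17)  -> ~0.561
prec_micro = (TP_A + TP_B) / N                     # Eq. (18b) -> 0.945 (= accuracy)
prec_weighted = (N_A * prec_A + N_B * prec_B) / N  # Eq. (19a) -> ~0.917
```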
C. Recall (Sensitivity)

The recall (also termed sensitivity) is the ratio of the number of samples correctly classified as positive to the number of all actual positive samples. From the first row of Fig. 2, we have

    Recall = TP/N+ = TP/(TP + FN)        (20)

In Fig. 1, where TP = 100 and FN = 5,

    Recall = 100/(100 + 5) = 0.952 (95.2%)

In an ideal case when FN = 0, the recall attains its maximum value of 1.0, meaning that all actual samples of the positive class are correctly classified (TP = N+), with zero type II error. See Fig. 20. Specifically, expression (20) is the recall of the positive class.

                 Predicted
               +ve         -ve
  Actual +ve   TP = N+     FN = 0     N+
         -ve   FP          TN         N-

Fig. 20. Maximum recall

For two classes A and B, we write

    RecallA = TPA/NA = TPA/(TPA + FNA)        (21)
    RecallB = TPB/NB = TPB/(TPB + FNB)        (22a)

See Fig. 21. In expression (21), two class-A building blocks, TPA and FNA (first row in Fig. 21a), are used and, similarly, two class-B building blocks, TPB and FNB (first row in Fig. 21b), are used in expression (22a). The recall of a certain class is thus the ratio of the number of samples of the class correctly classified as belonging to this class (true positives) to the number of all actual samples of the same class (true positives plus false negatives).

By relationships (6), RecallB in (22a) can also be expressed in terms of two class-A building blocks, TNA and FPA (second row in Fig. 21a), as

    RecallB = TNA/(TNA + FPA)        (22b)


              Predicted                          Predicted
             A      B                           B      A
  Actual A   TPA    FNA    NA        Actual B   TPB    FNB    NB
         B   FPA    TNA    NB               A   FPB    TNB    NA
  (a) Class A: RecallA = TPA/(TPA + FNA)
  (b) Class B: RecallB = TPB/(TPB + FNB)

Fig. 21. Recalls of classes A and B as obtained from two forms of confusion matrix

That is, both RecallA and RecallB can be obtained from the form of the confusion matrix in Fig. 21a alone, by considering the two rows of the matrix, respectively, as Fig. 22 illustrates. Similar arguments apply to Fig. 21b.

                 Predicted
                A      B
  Actual  A    TPA    FNA        RecallA = TPA/(TPA + FNA)  (first row)
          B    FPA    TNA        RecallB = TNA/(TNA + FPA)  (second row)

Fig. 22. Recalls of classes A and B as obtained from one form of confusion matrix

In Fig. 8, TPA = 215, FNA = 25, TNA = 190, and FPA = 40, and therefore

    RecallA = 215/(215 + 25) = 0.896
    RecallB = 190/(190 + 40) = 0.826

Often, there exists an inverse relationship between precision and recall, in the sense that it is possible to increase one at the cost of decreasing the other. Brain surgery provides a comprehensible illustration of the implied trade-off. Consider a surgeon removing a cancer tumour from a patient's brain. The surgeon is keen to remove all tumour cells, because any such cells left behind would regenerate the tumour. At the same time, the surgeon should avoid removing any healthy cells, so as not to cause the patient to suffer from impaired brain functions. Nevertheless, in the careful endeavor to ensure that all tumour cells have been removed, the surgeon may mistakenly remove some (a small number ϵ1) of healthy cells. This is a case of decreasing precision and increasing recall.

On the other hand, the surgeon may be keen to ensure that no healthy cells are removed, but, by mistake, some (ϵ2) tumour cells may not be removed. This is a case of decreasing recall and increasing precision. That is to say, low precision (high recall) guarantees the removal of all tumour cells but gives an opportunity for some healthy cells to be removed as well. By contrast, high precision (low recall) guarantees that no healthy cells are removed, but some tumour cells may not be removed. See Fig. 23 for the corresponding confusion matrices, where two classes are identified: a Tumour (positive) class, which has the tumour cells, and a Healthy (negative) class, which has the healthy cells.

                    Predicted
                  Removed     Not removed
  Actual Tumour     TP          FN = 0
         Healthy    FP = ϵ1     TN

  (a) Case 1: low precision, high recall

                    Predicted
                  Removed     Not removed
  Actual Tumour     TP          FN = ϵ2
         Healthy    FP = 0      TN

  (b) Case 2: high precision, low recall

Fig. 23. Trade-off between precision and recall

Here, TP is the number of tumour cells correctly removed, FP is the number of healthy cells incorrectly removed, FN is the number of tumour cells incorrectly not removed, and TN is the number of healthy cells correctly not removed. Figure 23a represents case 1, that of low precision and high recall (FP = ϵ1, FN = 0; non-zero type I error), while Fig. 23b represents case 2, that of high precision and low recall (FP = 0, FN = ϵ2; non-zero type II error).

Paying attention to the Tumour class, Fig. 23a gives

    Precision(case 1)_tumour = TP/(TP + ϵ1)  (< 1)
    Recall(case 1)_tumour = 1  (maximum)

and Fig. 23b gives

    Precision(case 2)_tumour = 1  (maximum)
    Recall(case 2)_tumour = TP/(TP + ϵ2)  (< 1)

It is conceivable that precision is an indication of 'quality' and recall is an indication of 'quantity', as implied by the definitions of precision in Eq. (14) and recall in Eq. (20). In the example of brain surgery, Fig. 23a shows that the precision is the number of tumour cells removed out of the total number of cells removed; this indicates the quality of the surgery's success. The recall, on the other hand, is the number of tumour cells removed out of the total number of tumour cells; this is the quantity of successful surgery results.

Moving on to the recall of the classification model, we define the macro-, micro-, and weighted-average recalls. In line with Eqs. (17), (18), and (19) for the average precisions of two classes A and B, we have

    Recall_macro = (RecallA + RecallB)/2        (23)
    Recall_micro = (TPA + TPB)/(TPA + TPB + FNA + FNB) = (TPA + TPB)/N        (24)
    Recall_weighted = [NA(RecallA) + NB(RecallB)]/N        (25)

It is, however, to be noted that

    Recall_micro = Recall_weighted        (26)

as is deduced by substituting for RecallA and RecallB from Eqs. (21) and (22a), respectively, into Eq. (25):

    Recall_weighted = [NA(TPA/NA) + NB(TPB/NB)]/N = (TPA + TPB)/N = Recall_micro
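The same numbers let us confirm Eq. (26) directly. A sketch, assuming the Fig. 8 counts:

```python
# Fig. 8 building blocks.
TP_A, FN_A, N_A = 215, 25, 240
TP_B, FN_B, N_B = 190, 40, 230
N = N_A + N_B

rec_A = TP_A / (TP_A + FN_A)                    # Eq. (21)  -> ~0.896
rec_B = TP_B / (TP_B + FN_B)                    # Eq. (22a) -> ~0.826
rec_macro = (rec_A + rec_B) / 2                 # Eq. (23)  -> ~0.861
rec_micro = (TP_A + TP_B) / N                   # Eq. (24)  -> ~0.862
rec_weighted = (N_A * rec_A + N_B * rec_B) / N  # Eq. (25)

# Eq. (26): the weighted average collapses to the micro average.
assert abs(rec_weighted - rec_micro) < 1e-12
```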


It is also evident from Eqs. (24) and (18b) that

    Recall_micro = Precision_micro        (27)

and moreover, by Eq. (18c),

    Recall_micro = Accuracy        (28)

Combining Eqs. (26), (27), and (28), we can write

    Accuracy = Precision_micro = Recall_micro = Recall_weighted        (29)

In Fig. 8, since RecallA = 0.896 and RecallB = 0.826,

    Recall_macro = (0.896 + 0.826)/2 = 0.861

Since TPA = 215, TPB = 190, and N = 470,

    Recall_micro = (215 + 190)/470 = 0.862

Since NA = 240 and NB = 230,

    Recall_weighted = [240(0.896) + 230(0.826)]/470 = 0.862

Equation (29) is already satisfied, where

    Accuracy = Precision_micro = Recall_micro = Recall_weighted = 0.862
Example 6

For the confusion matrix of Fig. 15, determine the macro-, micro-, and weighted-average recalls of the classification model.

Solution

From Eqs. (21) and (22a),

    RecallA = 990/(990 + 10) = 0.99
    RecallB = 2/(2 + 48) = 0.04

From Eqs. (23), (24), and (27),

    Recall_macro = (0.99 + 0.04)/2 = 0.515
    Recall_micro = (990 + 2)/1050 = 0.945
    Recall_weighted = Recall_micro = 0.945

We remark that two other expressions, related to recall, are used as performance measures. These are the TPR (true positive rate) and FNR (false negative rate). The TPR is the same thing as recall, Eq. (20),

    TPR = TP/N+ = TP/(TP + FN) = Recall        (30)

and the FNR is

    FNR = 1 - TPR = FN/N+ = FN/(TP + FN)        (31)

i.e. the ratio of the number of samples incorrectly classified as negative to the number of all actual positive samples.

Example 7

In Example 6, determine
(a) the TPR and FNR of each of the two classes A and B;
(b) TPR_macro and FNR_macro of the classification model.

Solution

Using definitions (30) and (31) and the results of Example 6, we obtain for class A,

    TPRA = RecallA = 0.99
    FNRA = 1 - TPRA = 1 - 0.99 = 0.01

and for class B,

    TPRB = RecallB = 0.04
    FNRB = 1 - TPRB = 1 - 0.04 = 0.96

For the classification model,

    TPR_macro = Recall_macro = 0.515
    FNR_macro = 1 - TPR_macro = 1 - 0.515 = 0.485
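Since TPR and FNR are mere renamings, Example 7 reduces to a couple of subtractions (a sketch using the recalls of Example 6):

```python
# Recalls of classes A and B, from Example 6 (Fig. 15).
rec_A, rec_B = 0.99, 0.04

TPR_A, FNR_A = rec_A, 1 - rec_A   # Eqs. (30), (31) -> 0.99, 0.01
TPR_B, FNR_B = rec_B, 1 - rec_B   #                 -> 0.04, 0.96

TPR_macro = (TPR_A + TPR_B) / 2   # -> 0.515
FNR_macro = 1 - TPR_macro         # -> 0.485
```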
D. Specificity

The specificity is the ratio of the number of samples correctly classified as negative to the number of all actual negative samples. From the second row of Fig. 2, we have

    Specificity = TN/N- = TN/(TN + FP)        (32)

In Fig. 1, where TN = 90 and FP = 10,

    Specificity = 90/(90 + 10) = 0.9

In an ideal case when FP = 0 (zero type I error), the specificity has its maximum value of 1.0, meaning that all actual samples of the negative class are correctly classified (TN = N-). Remember that the same condition, FP = 0 (Fig. 17), also makes the precision attain its maximum value of 1.0. See Fig. 24. Expression (32) is in fact the specificity of the positive class.

                 Predicted
               +ve        -ve
  Actual +ve   TP         FN          N+
         -ve   FP = 0     TN = N-     N-

Fig. 24. Maximum specificity (maximum precision)

For two classes A and B,

    SpecificityA = TNA/NB = TNA/(TNA + FPA)        (33)
    SpecificityB = TNB/NA = TNB/(TNB + FPB)        (34a)

See Fig. 25. In expression (33), two class-A building blocks, TNA and FPA (second row in Fig. 25a), are used, and two class-B building blocks, TNB and FPB (second row in Fig. 25b), are used in expression (34a). The specificity of one class is thus the ratio of the number of samples of the other class correctly classified as belonging to the other class (true negatives) to the number of all actual samples of the other class (true negatives plus false positives).

By relationships (6), SpecificityB in (34a) can also be expressed in terms of two class-A building blocks, TPA and FNA (first row in Fig. 25a), as

    SpecificityB = TPA/NA = TPA/(TPA + FNA)        (34b)

Therefore, both SpecificityA and SpecificityB can be obtained from one form of the confusion matrix, that of Fig. 25a, by considering the two rows of the matrix, respectively. See Fig. 26. Similar arguments apply to Fig. 25b.


              Predicted                          Predicted
             A      B                           B      A
  Actual A   TPA    FNA    NA        Actual B   TPB    FNB    NB
         B   FPA    TNA    NB               A   FPB    TNB    NA
  (a) Class A: SpecificityA = TNA/(TNA + FPA)
  (b) Class B: SpecificityB = TNB/(TNB + FPB)

Fig. 25. Specificities of classes A and B as obtained from two forms of confusion matrix

                 Predicted
                A      B
  Actual  A    TPA    FNA        SpecificityB = TPA/(TPA + FNA)  (first row)
          B    FPA    TNA        SpecificityA = TNA/(TNA + FPA)  (second row)

Fig. 26. Specificities of classes A and B as obtained from one form of confusion matrix

By comparison, it is clear that Eqs. (33) and (22b) are identical, and so are Eqs. (34b) and (21), providing the results

    SpecificityA = RecallB        (35)
    SpecificityB = RecallA        (36)

i.e. the specificity of one class is nothing but the recall of the other class.

In Fig. 8, RecallA = 0.896 and RecallB = 0.826, and therefore

    SpecificityA = 0.826,  SpecificityB = 0.896

The macro-, micro-, and weighted-average specificities are defined in analogy to both the average precisions and the average recalls. The first two average specificities take several forms, based on previously derived relationships. We have

    Specificity_macro = (SpecificityA + SpecificityB)/2 = (RecallA + RecallB)/2        (37)

implying that

    Specificity_macro = Recall_macro        (38)

    Specificity_micro = (TNA + TNB)/(TNA + TNB + FPA + FPB) = (TNA + TNB)/N = (TPA + TNA)/N        (39)

implying that

    Specificity_micro = Recall_micro = Precision_micro        (40)

It turns out that Specificity_micro is to be incorporated in Eq. (29), so that we can write

    Accuracy = Precision_micro = Recall_micro = Recall_weighted = Specificity_micro        (41)

The weighted-average specificity is

    Specificity_weighted = [NA(SpecificityA) + NB(SpecificityB)]/N        (42a)

or, by Eqs. (33) and (34a),

    Specificity_weighted = [(NA/NB)(TNA) + (NB/NA)(TNB)]/N        (42b)

Example 8

For the confusion matrix of Fig. 15, determine the macro-, micro-, and weighted-average specificities of the classification model.

Solution

Using the results of Example 6, we obtain

    SpecificityA = RecallB = 0.04
    SpecificityB = RecallA = 0.99
    Specificity_macro = Recall_macro = 0.515
    Specificity_micro = Recall_micro = 0.945

From Eq. (42),

    Specificity_weighted = [1000(0.04) + 50(0.99)]/1050 = 0.085

In addition to the TPR and FNR expressed along with recall at the end of Subsection C, we here define the TNR (true negative rate) and FPR (false positive rate). The TNR is the same thing as specificity, Eq. (32),

    TNR = TN/N- = TN/(TN + FP) = Specificity        (43)

and the FPR is

    FPR = 1 - TNR = FP/N- = FP/(TN + FP)        (44)

i.e. the ratio of the number of samples incorrectly classified as positive to the number of all actual negative samples.

Example 9

In Example 6, determine
(a) the TNR and FPR of each of the two classes A and B;
(b) TNR_micro and FPR_micro of the classification model.

Solution

Using definitions (43) and (44) and the results of Example 8, we obtain for class A,

    TNRA = SpecificityA = 0.04
    FPRA = 1 - TNRA = 1 - 0.04 = 0.96

and for class B,

    TNRB = SpecificityB = 0.99
    FPRB = 1 - TNRB = 1 - 0.99 = 0.01

For the classification model,

    TNR_micro = Specificity_micro = 0.945
    FPR_micro = 1 - TNR_micro = 1 - 0.945 = 0.055
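Examples 8 and 9 can be checked the same way; the sketch below also verifies Eqs. (35)-(36) on the Fig. 15 counts:

```python
# Fig. 15 building blocks for class A.
TP_A, FN_A, FP_A, TN_A = 990, 10, 48, 2
N_A, N_B = TP_A + FN_A, FP_A + TN_A
N = N_A + N_B

spec_A = TN_A / (TN_A + FP_A)   # Eq. (33)  -> 0.04
spec_B = TP_A / (TP_A + FN_A)   # Eq. (34b) -> 0.99

rec_A, rec_B = 0.99, 0.04       # from Example 6
assert spec_A == rec_B and spec_B == rec_A   # Eqs. (35), (36)

TNR_A, FPR_A = spec_A, 1 - spec_A                  # Eqs. (43), (44) -> 0.04, 0.96
spec_weighted = (N_A * spec_A + N_B * spec_B) / N  # Eq. (42a) -> ~0.085
```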
E. Balanced accuracy

In Subsection A, we emphasized the fact that the accuracy measure of a binary classifier can be misleading when the datasets of the two classes are imbalanced. A performance measure, known as balanced accuracy, is thus introduced. It combines recall and specificity in the form


    Balanced accuracy = (Recall + Specificity)/2        (45)

i.e. the arithmetic average of recall and specificity. Remember that recall, TP/N+, deals with only the positive class, while specificity, TN/N-, deals with only the negative class. A combination of these two measures proves advantageous, especially for imbalanced datasets.

In Fig. 1, where Recall = 0.952 and Specificity = 0.9,

    Balanced accuracy = (0.952 + 0.9)/2 = 0.926

For two classes A and B, the balanced accuracy of the model is the arithmetic average of the recall and specificity of either class A or class B:

    Balanced accuracy = (RecallA + SpecificityA)/2 = (RecallB + SpecificityB)/2        (46)

Make sure that the two expressions in Eq. (46) are, by Eqs. (35) and (36), identical.

In Fig. 8, where RecallA = SpecificityB = 0.896 and RecallB = SpecificityA = 0.826,

    Balanced accuracy = (0.896 + 0.826)/2 = 0.861

Equation (46) can alternatively be written as

    Balanced accuracy = (RecallA + RecallB)/2 = (SpecificityA + SpecificityB)/2        (47)

This provides the noticeable result that the balanced accuracy is the same as the macro-average recall of classes A and B, or the macro-average specificity of the two classes:

    Balanced accuracy = Recall_macro = Specificity_macro        (48)

Example 10

For the confusion matrix of Fig. 15, determine the balanced accuracy of the classification model.

Solution

Using the value of the macro-average recall, or the macro-average specificity, from the solution of Example 8, Eq. (48) yields

    Balanced accuracy = 0.515

A comparison between balanced accuracy and accuracy is in order. Consider a binary classifier with the confusion matrix of Fig. 27, where the datasets of classes A and B are balanced (NA = 195, NB = 192).

                 Predicted
                A      B
  Actual  A    185     10    NA = 195
          B     12    180    NB = 192

Fig. 27. Confusion matrix with balanced datasets

The accuracy, by Eq. (10a), is

    Accuracy = (185 + 180)/(195 + 192) = 0.943

The recalls of classes A and B, by Eqs. (21) and (22b), are

    RecallA = 185/195 = 0.949
    RecallB = 180/192 = 0.938

Therefore, the balanced accuracy, by Eq. (47), is

    Balanced accuracy = (0.949 + 0.938)/2 = 0.944

The values of accuracy and balanced accuracy are seen to be approximately the same. The reason is that the datasets of the two classes are balanced.

However, for a binary classifier with the confusion matrix of Fig. 28, where the datasets of the two classes are imbalanced (NA = 10, NB = 190), we have

    Accuracy = (0 + 190)/(10 + 190) = 0.95 (95%)

                 Predicted
                A      B
  Actual  A      0     10    NA = 10
          B      0    190    NB = 190

Fig. 28. Confusion matrix with imbalanced datasets

The accuracy is calculated to be of a high value (95%), giving the impression that the classifier performs quite properly. But this is far from reality. Although the classifier correctly predicts all samples of class B (FNB = 0), it does not correctly predict any sample of class A (TPA = 0). The classifier has a deficiency in performance, not detected by accuracy. In other words, the 95% accuracy is misleading and cannot be relied upon. Balanced accuracy can be taken into account instead. Since

    RecallA = 0/10 = 0
    RecallB = 190/190 = 1

then

    Balanced accuracy = (0 + 1)/2 = 0.5 (50%)

which is considerably less than the value of the accuracy and may thus be relied upon. The difference in the values of accuracy and balanced accuracy is due to the imbalance of the datasets.
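The contrast between the two measures is a two-line computation. A sketch using the degenerate classifier of Fig. 28:

```python
# Fig. 28: the classifier predicts class B for every sample.
TP_A, FN_A, FP_A, TN_A = 0, 10, 0, 190
N = TP_A + FN_A + FP_A + TN_A

accuracy = (TP_A + TN_A) / N    # 0.95 -- deceptively high
rec_A = TP_A / (TP_A + FN_A)    # 0.0  -- class A is never detected
rec_B = TN_A / (TN_A + FP_A)    # 1.0
balanced = (rec_A + rec_B) / 2  # Eq. (47) -> 0.5, exposing the deficiency
```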
F. Fβ measure and F1 score

The precision and recall are commonly combined to provide a performance measure called the Fβ measure, defined as

    Fβ = 1/[β/Precision + (1 - β)/Recall] = (Recall × Precision)/[β(Recall) + (1 - β)Precision]        (49)

This means that Fβ is the weighted harmonic average of precision and recall. Here, β is a positive fractional factor, 0 < β < 1, which reflects the importance of precision and recall with respect to each other. The greater β is, the greater the importance given to precision and, conversely, the smaller β is, the greater the importance given to recall. Indeed, there should be a trade-off between precision and recall, relying on the particulars of the classification problem at hand; cf. the example of brain surgery in Subsection C.

Substituting for precision and recall from Eqs. (14) and (20), respectively, into Eq. (49), Fβ is formulated as

    Fβ = TP/[TP + β(FP) + (1 - β)FN]        (50)

Note the similarity in form among the expressions of precision in Eq. (14), recall in Eq. (20), and Fβ in Eq. (50), where in the respective denominators FP is replaced by FN, and both (FP and FN) are replaced by the weighted sum of FP and FN.


The special case

    β = 0.5        (51)

is of particular interest, where precision and recall are of equal weight (importance). The Fβ measure under condition (51) is referred to as the F1 score which, by Eq. (49), takes the form

    F1 = 2/[1/Precision + 1/Recall] = (2 × Precision × Recall)/(Precision + Recall)        (52)

That is, F1 is the harmonic average of precision and recall. See Fig. 29 for a graphical representation: the distance h is equal to 0.5 F1 and is less than the smaller of the precision and recall. The proof is a simple geometric exercise.

Fig. 29. Harmonic average of precision and recall (the height h, half the harmonic average, falls below both the precision and recall bars)

Equation (52), in view of Eq. (50) with β = 0.5, becomes

    F1 = TP/[TP + 0.5(FP + FN)]        (53)

where the arithmetic average of FP and FN replaces FP in Eq. (14) and FN in Eq. (20).

In Fig. 1, where Precision = 0.909 and Recall = 0.952, Eq. (49) for β = 0.8 (as an example) and Eq. (52) yield

    Fβ=0.8 = (0.952 × 0.909)/[0.8(0.952) + (1 - 0.8)0.909] = 0.917
    F1 = (2 × 0.909 × 0.952)/(0.909 + 0.952) = 0.93

The same results are of course produced by the equivalent expressions (50) and (53).

For two classes A and B, we have for class A

    FβA = (PrecisionA × RecallA)/[β(RecallA) + (1 - β)PrecisionA] = TPA/[TPA + β(FPA) + (1 - β)FNA]        (54)

    F1A = (2 × PrecisionA × RecallA)/(PrecisionA + RecallA) = TPA/[TPA + 0.5(FPA + FNA)]        (55)

and similar expressions apply to class B. In certain classification problems, FβA and FβB, as well as F1A and F1B, pertaining to the individual classes, can be useful in their own right.

For the classification model, we have

    Fβ_model = (Precision_model × Recall_model)/[β(Recall_model) + (1 - β)Precision_model]        (56)

    F1_model = (2 × Precision_model × Recall_model)/(Precision_model + Recall_model)        (57)

Equations (56) and (57) represent the macro-, micro-, or weighted-average Fβ and F1 of the model, respectively, where Precision_model is correspondingly the macro-, micro-, or weighted-average precision of the model, and Recall_model is defined in a similar way. We should always take Eqs. (56) and (57) into account when we calculate the average Fβ and F1 for the model. For example, F1_macro is not the arithmetic average of F1A and F1B; it is, by definition, the harmonic average of Precision_macro and Recall_macro.

For the micro-average, Fβ_micro reduces to F1_micro,

    Fβ_micro = F1_micro        (58)

and the effect of β ceases to exist. In this case,

    Fβ_micro = F1_micro = Precision_micro = Recall_micro        (59)

which follows in view of Eq. (27). Remember the fact that the harmonic average of two equal values is the same as either value.

Aggregating Eqs. (41) and (59) leads to

    Accuracy = Precision_micro = Recall_micro = Recall_weighted = Specificity_micro = Fβ_micro = F1_micro = (TPA + TPB)/N        (60)

and we find out (remarkably) that seven measures are defined by one and the same expression, (TPA + TPB)/N.

Example 11

For the confusion matrix of Fig. 15, determine
(a) the F1 score of class A and that of class B;
(b) the macro-, micro-, and weighted-average F1 scores of the classification model.

Solution

Collecting the results of Examples 5 and 6,

    PrecisionA = 0.954, PrecisionB = 0.167, RecallA = 0.99, RecallB = 0.04
    Precision_macro = 0.561, Recall_macro = 0.515
    Precision_micro = Recall_micro = Recall_weighted = 0.945
    Precision_weighted = 0.917

From Eq. (55),

    F1A = (2 × 0.954 × 0.99)/(0.954 + 0.99) = 0.972
    F1B = (2 × 0.167 × 0.04)/(0.167 + 0.04) = 0.065

From Eqs. (57) and (59),

    F1_macro = (2 × Precision_macro × Recall_macro)/(Precision_macro + Recall_macro)
             = (2 × 0.561 × 0.515)/(0.561 + 0.515) = 0.537

    F1_micro = Precision_micro = 0.945

    F1_weighted = (2 × Precision_weighted × Recall_weighted)/(Precision_weighted + Recall_weighted)
                = (2 × 0.917 × 0.945)/(0.917 + 0.945) = 0.931
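The article's Fβ of Eq. (49) is a weighted harmonic average with 0 < β < 1; note that this differs from the also-common (1 + β²) convention found elsewhere. A sketch reproducing the Fig. 1 values:

```python
def f_beta(precision, recall, beta=0.5):
    # Eq. (49); beta = 0.5 gives the F1 score of Eq. (52).
    return (recall * precision) / (beta * recall + (1 - beta) * precision)

# Fig. 1: Precision = 0.909, Recall = 0.952.
print(round(f_beta(0.909, 0.952, beta=0.8), 3))  # 0.917
print(round(f_beta(0.909, 0.952), 2))            # 0.93  (the F1 score)
```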
3. SUMMARY OF RESULTS FOR BINARY CLASSIFICATION

Table 1 lists the expressions of the performance measures and their interrelationships for binary classification with two classes A and B. The subscript 'class' in rows 5 and 6 symbolizes either class A or class B, and the subscript 'model' in rows 17 and 18 symbolizes either the macro-, micro-, or weighted-average.


Table 1. Performance measures for binary classification with two classes A and B

 1. Accuracy = (TP + TN)/N
 2. Precision = TP/(TP + FP)
 3. Recall (Sensitivity) = TP/(TP + FN)
 4. Specificity = TN/(TN + FP)
 5. Fβ_class = (Precision_class × Recall_class)/[β(Recall_class) + (1 - β)Precision_class] = TP_class/[TP_class + β(FP_class) + (1 - β)FN_class]
 6. F1_class = (2 × Precision_class × Recall_class)/(Precision_class + Recall_class) = TP_class/[TP_class + 0.5(FP_class + FN_class)]
 7. Balanced accuracy = (Recall + Specificity)/2
 8. Precision_macro = (PrecisionA + PrecisionB)/2
 9. Recall_macro = (RecallA + RecallB)/2
10. Specificity_macro = (SpecificityA + SpecificityB)/2
11. Precision_micro = (TPA + TPB)/N = Accuracy
12. Recall_micro = Precision_micro
13. Specificity_micro = Precision_micro
14. Precision_weighted = [NA(PrecisionA) + NB(PrecisionB)]/N = [(NA/PA)(TPA) + (NB/PB)(TPB)]/N
15. Recall_weighted = [NA(RecallA) + NB(RecallB)]/N = (TPA + TPB)/N = Accuracy
16. Specificity_weighted = [NA(SpecificityA) + NB(SpecificityB)]/N = [(NA/NB)(TNA) + (NB/NA)(TNB)]/N
17. Fβ_model = (Precision_model × Recall_model)/[β(Recall_model) + (1 - β)Precision_model]
18. F1_model = (2 × Precision_model × Recall_model)/(Precision_model + Recall_model)
19. SpecificityA = RecallB;  SpecificityB = RecallA
20. Specificity_macro = Recall_macro
21. Balanced accuracy = (RecallA + SpecificityA)/2 = (RecallA + RecallB)/2 = Recall_macro = (SpecificityA + SpecificityB)/2 = Specificity_macro
22. Fβ_micro = F1_micro = Precision_micro
23. Accuracy = Precision_micro = Recall_micro = Recall_weighted = Specificity_micro = Fβ_micro = F1_micro = (TPA + TPB)/N
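As a wrap-up, every single-class measure in Table 1 can be driven from the four building blocks of that class. A sketch of such a helper (the function name and dictionary layout are choices of this sketch, not the article's):

```python
def binary_metrics(tp, fn, fp, tn, beta=0.5):
    """Per-class measures of Table 1 from the four building blocks."""
    n = tp + fn + fp + tn                                       # Eq. (3)
    precision = tp / (tp + fp)                                  # Eq. (14)
    recall = tp / (tp + fn)                                     # Eq. (20)
    specificity = tn / (tn + fp)                                # Eq. (32)
    return {
        "accuracy": (tp + tn) / n,                              # Eq. (9)
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,        # Eq. (45)
        "f_beta": tp / (tp + beta * fp + (1 - beta) * fn),      # Eq. (50)
    }

# Fig. 1: TP = 100, FN = 5, FP = 10, TN = 90.
for name, value in binary_metrics(100, 5, 10, 90).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.927, precision: 0.909, recall: 0.952,
# specificity: 0.900, balanced_accuracy: 0.926, f_beta: 0.930
```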
Acknowledgement

I would like to thank Prof. Dr. Amany Sarhan for her valuable help in writing and editing this tutorial and in incorporating the references used in it, hoping that this material can help in a deeper understanding of this important topic.

REFERENCES

1. David M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation," Journal of Machine Learning Technologies, vol. 2, issue 1, pp. 37-63, 2011.
2. Hasnae Zerouaoui and Ali Idri, "Deep hybrid architectures for binary classification of medical breast cancer images," Biomedical Signal Processing and Control, vol. 71, Part B, January 2022.
3. I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining, fourth edition, Morgan Kaufmann, 2017.
4. Kai Ming Ting, "Precision and Recall," in Sammut, C. and Webb, G. I. (eds), Encyclopedia of Machine Learning, Springer, Boston, MA, 2011.
5. Nandan Kanvinde, Abhishek Gupta, and Raunak Joshi, "Binary classification for high dimensional data using supervised non-parametric ensemble method," arXiv preprint arXiv:2202.07779, 2022.
6. W. Siblini, J. Fréry, L. He-Guelton, F. Oblé, and Y. Q. Wang, "Master your metrics with calibration," in Berthold, M., Feelders, A., and Krempl, G. (eds), Advances in Intelligent Data Analysis XVIII (IDA 2020), Lecture Notes in Computer Science, vol. 12080, Springer, Cham, 2020.
7. https://www.turing.com/kb/precision-recall-method
