Confusion Matrix in Binary Classification Problems: A Step-by-Step Tutorial
2022
Recommended Citation
Fahmy Amin, Mahmoud (2022) "Confusion Matrix in Binary Classification Problems: A Step-by-Step
Tutorial," Journal of Engineering Research: Vol. 6: Iss. 5, Article 1.
Available at: https://digitalcommons.aaru.edu.jo/erjeng/vol6/iss5/1
Fahmy Amin: Confusion Matrix in Binary Classification Problems: A Step-by-Step Tutorial
Vol. 6, No. 5 – 2022 Journal of Engineering Research (ERJ)
Abstract: In the field of machine learning, the confusion matrix is a specific table adopted to describe and assess the performance of a classification model (e.g. an artificial neural network) for a set of test data whose actual distinguishing features are known. The learning algorithm is thus of the supervised learning category. For an n-class classification problem, the confusion matrix is square, with n rows and n columns. The rows represent the actual class samples (instances), which are the inputs to the classifier, while the columns represent the predicted class samples, the classifier outputs. (The converse is also valid, i.e. the two dimensions 'actual' and 'predicted' can be assigned to columns and rows, respectively.) Binary as well as multiple-class classifiers can be dealt with. It is worth noting that the term 'matrix' here has nothing to do with the theorems of matrix algebra; it is regarded just as an information-conveying table. The descriptive word 'confusion' stems from the fact that the matrix clarifies to what extent the model confuses the classes, i.e. mislabels one as another. The essential concept was introduced in 1904 by the British statistician Karl Pearson (1857-1936).

Keywords: Machine Learning; Confusion Matrix; Accuracy; Recall; Specificity; Precision; True Negative; False Positive; Balanced Accuracy.

1. BINARY CLASSIFICATION

We begin with the basic and relatively simple situation of a binary classifier, where we have two classes (n = 2) and a 2x2 confusion matrix. See Fig. 1. Let the matrix in this figure, as an illustrative example, belong to a medical test conducted on a number of persons (patients) for the presence or absence of a certain disease. The labels 'positive' (+ve) and 'negative' (-ve) are used to identify these two distinct cases, respectively, which are treated as two classes in a classification problem. (Other labels such as '1' and '0', 'yes' and 'no', or 'event' and 'not event' can likewise be used.) With such labeling, attention is sometimes focused on the positive class, and its classification outcomes are considered the decisive characteristics of the classifier.

                 Predicted
                 +ve    -ve
    Actual +ve   100      5
           -ve    10     90

Fig. 1. Confusion matrix for binary classifier

The confusion matrix of Fig. 1 tells us that:
- The label 'positive' means the person has the disease, and the label 'negative' means the person does not.
- A total of 205 (= 100 + 5 + 10 + 90) persons were tested.
- Out of the 205 persons, the classifier predicted 'positive' 110 (= 100 + 10) times and 'negative' 95 (= 5 + 90) times (regardless of whether the predictions are correct or not).
- In actuality, 105 (= 100 + 5) persons in the test set have the disease and 100 (= 10 + 90) persons do not.

More conclusive information can be drawn from the confusion matrix, as elucidated below.

A. Building blocks: TP, TN, FP, and FN

Formally, a comparison of the actual classifications with the predicted classifications reveals that four well-defined outcomes emerge:
- The actual classification is positive and the predicted classification is positive. This outcome is referred to as 'true positive' (TP), because the positive sample is correctly identified by the classifier.
- The actual classification is negative and the predicted classification is negative. This is a 'true negative' (TN) outcome, because the negative sample is correctly identified by the classifier.
- The actual classification is negative and the predicted classification is positive. This is a 'false positive' (FP) outcome, because the negative sample is incorrectly identified by the classifier as positive.
- The actual classification is positive and the predicted classification is negative. This is a 'false negative' (FN) outcome, because the positive sample is incorrectly identified by the classifier as negative.

These four outcomes, with the above interpretation, pertain in fact to the positive class, provided this class is particularly important and deserves emphasis; it accommodates what can be called the 'relevant' samples, while the negative class is regarded as 'irrelevant'.

The outcomes TP, TN, FP, and FN are of prime significance and are termed the 'building blocks', since they are employed to formulate all performance measures, as will be evident in Section 3.

The building blocks appear naturally as the elements of the confusion matrix, as shown in Figs. 2 and 3. Note that the true outcomes TP and TN occupy the two diagonal cells of the matrix, while the false outcomes FP and FN, occupying the two off-diagonal cells, imply errors; FP is a type I error and FN is a type II error. In our example of ill and healthy persons, FP represents persons who are healthy but classified as ill while, on the contrary, FN represents persons who are ill but classified as healthy. The latter case (type II error) is normally more dangerous than the former (type I error).

Returning to Fig. 1, the building blocks are seen to be TP = 100, TN = 90, FP = 10, FN = 5. Ideally, FP and FN would both be zero, representing a perfect classifier.
                 Predicted
                 +ve    -ve
    Actual +ve   TP     FN
           -ve   FP     TN

Fig. 2. Building blocks of positive class as elements of 2x2 confusion matrix

                 Predicted
                 +ve                  -ve
    Actual +ve   TP                   FN  (type II error)
           -ve   FP  (type I error)   TN

Fig. 3. True outcomes in diagonal cells and false outcomes in off-diagonal cells

Person's number           1   2   3   4   5   6   7   8   9   10  11  12
Actual classification     1   1   1   1   1   1   1   1   0   0   0   0
Predicted classification  0   0   1   1   1   1   1   1   1   0   0   0
Outcome                   FN  FN  TP  TP  TP  TP  TP  TP  FP  TN  TN  TN

Fig. 5. Actual and predicted classifications of the twelve persons in Example 1

Solution

The classification situation is illustrated in Fig. 5. Labels '1' and '0' for the two classes correspond to 'positive' and 'negative', respectively. From Fig. 5, the building blocks for class 1 are

TP = 6, TN = 3, FP = 1, FN = 2

The confusion matrix of the classifier is shown in Fig. 6.

               Predicted
                1    0
    Actual 1    6    2
           0    1    3

Fig. 6. Confusion matrix for Example 1
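The per-person outcomes can be checked mechanically; a short Python snippet (variable names ours) over the twelve actual/predicted pairs recovers the same building blocks:

```python
# Actual and predicted classifications of the 12 persons (Fig. 5)
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
print(tp, tn, fp, fn)  # 6 3 1 2, the building blocks for class 1
```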
(a)              Predicted                    (b)              Predicted
                  A           B                                 B           A
    Actual A   TPA = TNB   FNA = FPB              Actual B   TPB = TNA   FNB = FPA
           B   FPA = FNB   TNA = TPB                     A   FPB = FNA   TNB = TPA

Fig. 10. Two forms for confusion matrix of binary classifier through interchange of classes
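The identities of Fig. 10 amount to saying that swapping which class is taken as 'positive' permutes the building blocks. A small Python check (the helper and the label sequences are ours, with arbitrary illustrative data):

```python
def blocks(actual, predicted, positive):
    """Return (TP, TN, FP, FN) with `positive` taken as the positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

# Illustrative data: 7 class-A samples, 5 class-B samples
actual    = ["A"] * 7 + ["B"] * 5
predicted = ["A"] * 6 + ["B"] * 1 + ["A"] * 2 + ["B"] * 3

tp_a, tn_a, fp_a, fn_a = blocks(actual, predicted, positive="A")
tp_b, tn_b, fp_b, fn_b = blocks(actual, predicted, positive="B")

# Interchange identities of Fig. 10: TPA = TNB, FNA = FPB, FPA = FNB, TNA = TPB
assert (tp_a, fn_a, fp_a, tn_a) == (tn_b, fp_b, fn_b, tp_b)
print(tp_a, tn_a, fp_a, fn_a)  # 6 3 2 1
```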
... correctly classified out of 1000 samples, with a percentage as high as 99% (taken as TPA/NA), only two class-B samples are correctly classified out of 50 samples, with a very low percentage of 4% (TPB/NB). These results warn us that the 94.5% accuracy cannot be relied upon; with dataset imbalance, it deceptively describes the classification reliability of the individual classes. Another measure, called balanced accuracy, will be specified in Subsection 3.5 for imbalanced datasets.

Example 4

A binary classification model is used for two classes A and B. The number of samples classified as class A is 169 and the number of samples classified as class B is 157. The type I and type II errors are recorded to be 39 samples and 46 samples, respectively.
(a) What is the percentage of correctly classified samples in class A? In class B?
(b) Determine the accuracy of the model.

In Fig. 1, where TP = 100 and FP = 10,

Precision = 100/(100 + 10) = 0.909 (90.9%)

In an ideal case when FP = 0, the precision reaches its maximum value of 1.0. This means that all samples predicted as positive actually belong to the positive class (TP = P+); the type I error is zero. See Fig. 17. Strictly speaking, expression (14) is the precision of the positive class.

                 Predicted
                 +ve        -ve
    Actual +ve   TP = P+    FN
           -ve   FP = 0     TN
                 P+         P-

Fig. 17. Maximum precision

For two classes A and B, we write

Precision_A = TP_A / P_A = TP_A / (TP_A + FP_A)    (15)

Precision_B = TP_B / P_B = TP_B / (TP_B + FP_B)    (16a)
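The precision computation is a one-liner; a Python sketch (function name ours) reproducing the Fig. 1 value:

```python
def precision(tp, fp):
    """Precision of the positive class, Eq. (14): TP / (TP + FP)."""
    return tp / (tp + fp)

# Fig. 1: TP = 100, FP = 10
print(round(precision(100, 10), 3))  # 0.909
```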
... i.e. the arithmetic average (mean) of the two precisions, with equal weights of unity.

The micro-average precision is

Precision_micro = (TP_A + TP_B) / (TP_A + TP_B + FP_A + FP_B)    (18a)

where the true positives and false positives for class A are amalgamated with their counterparts for class B. Since the four-term sum in the denominator of expression (18a) is equal to N, we can also write

Precision_micro = (TP_A + TP_B) / N    (18b)

It is to be noted in the meantime that expression (18b) is the same as the model accuracy defined in (10), and thus

Precision_micro = Accuracy    (18c)

The weighted-average precision is

Precision_weighted = [N_A(Precision_A) + N_B(Precision_B)] / N    (19a)

or, by Eqs. (15) and (16a),

Precision_weighted = [(N_A/P_A)(TP_A) + (N_B/P_B)(TP_B)] / N    (19b)

where Precision_A and Precision_B are weighted by N_A and N_B, respectively.

In Fig. 8, classes A and B have balanced datasets. Since Precision_A = 0.843 and Precision_B = 0.884,

Precision_macro = (0.843 + 0.884)/2 = 0.864

Since TP_A = 215, TP_B = 190, and N = 470,

Precision_micro = (215 + 190)/470 = 0.862 (= Accuracy)

Since N_A = 240 and N_B = 230,

Precision_weighted = [240(0.843) + 230(0.884)]/470 = 0.863

In Fig. 1, where TP = 100 and FN = 5,

Recall = 100/(100 + 5) = 0.952 (95.2%)

In an ideal case when FN = 0, the recall attains its maximum value of 1.0, meaning that all actual samples of the positive class are correctly classified (TP = N+), with zero type II error. See Fig. 20. Specifically, expression (20) is the recall of the positive class.

                 Predicted
                 +ve        -ve
    Actual +ve   TP = N+    FN = 0    N+
           -ve   FP         TN        N-

Fig. 20. Maximum recall

For two classes A and B, we write

Recall_A = TP_A / N_A = TP_A / (TP_A + FN_A)    (21)

Recall_B = TP_B / N_B = TP_B / (TP_B + FN_B)    (22a)

See Fig. 21. In expression (21), two class-A building blocks, TP_A and FN_A (first row in Fig. 21a), are used and, similarly, two class-B building blocks, TP_B and FN_B (first row in Fig. 21b), are used in expression (22a). The recall of a certain class is thus the ratio of the number of samples of the class correctly classified as belonging to this class (true positives) to the number of all actual samples of the same class (true positives plus false negatives).

By relationships (6), Recall_B in (22a) can also be expressed in terms of two class-A building blocks, TN_A and FP_A (second row in Fig. 21a), as

Recall_B = TN_A / (TN_A + FP_A)    (22b)
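With the Fig. 8 values quoted in the text, the three precision averages can be verified numerically; a Python sketch (variable names ours):

```python
# Per-class precisions and counts quoted in the text for Fig. 8
prec_a, prec_b = 0.843, 0.884
tp_a, tp_b = 215, 190
n_a, n_b = 240, 230
n = n_a + n_b  # 470

macro = (prec_a + prec_b) / 2                   # equal weights of unity
micro = (tp_a + tp_b) / n                       # Eq. (18b): equals accuracy
weighted = (n_a * prec_a + n_b * prec_b) / n    # Eq. (19a): class-size weights

print(f"{macro:.4f} {micro:.4f} {weighted:.4f}")  # 0.8635 0.8617 0.8631
```

Rounded to three figures these are 0.864, 0.862 (= accuracy, as Eq. (18c) requires), and 0.863.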
Recall_A = TP_A / (TP_A + FN_A)

                 Predicted
                  A       B
    Actual  A    TP_A    FN_A    N_A
            B    FP_A    TN_A    N_B
                 P_A     P_B

Fig. 21(a). Class-A form of the confusion matrix, with row sums N_A, N_B and column sums P_A, P_B

                    Predicted
                    Removed     Not removed
    Actual Tumour   TP          FN = 0
           Healthy  FP = ε1     TN

(Tumour-removal illustration: zero false negatives, at the cost of a small false-positive count ε1.)
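A corresponding Python sketch for recall (function name ours), including a check of the class-interchange identity behind Eq. (22b) with illustrative numbers:

```python
def recall(tp, fn):
    """Recall of a class, Eq. (20): TP / (TP + FN)."""
    return tp / (tp + fn)

# Fig. 1: TP = 100, FN = 5
print(round(recall(100, 5), 3))  # 0.952

# Eq. (22b): under class interchange (Fig. 10), TN_A = TP_B and FP_A = FN_B,
# so Recall_B can be computed from class-A building blocks alone.
tp_b, fn_b = 190, 40         # illustrative class-B blocks (not from the paper)
tn_a, fp_a = tp_b, fn_b      # identities from Fig. 10
assert recall(tp_b, fn_b) == tn_a / (tn_a + fp_a)
```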
It is also evident from Eqs. (24) and (18b) that

Recall_micro = Precision_micro    (27)

and moreover, by Eq. (8c),

Recall_micro = Accuracy    (28)

Combining Eqs. (26), (27), and (28), we can write

Accuracy = Precision_micro = Recall_micro = Recall_weighted    (29)

In Fig. 8, since Recall_A = 0.896 and Recall_B = 0.826,

Recall_macro = (0.896 + 0.826)/2 = 0.861

Since TP_A = 215, TP_B = 190, and N = 470,

Recall_micro = (215 + 190)/470 = 0.862

Since N_A = 240 and N_B = 230,

Recall_weighted = [240(0.896) + 230(0.826)]/470 = 0.862

Equation (29) is already satisfied, where

Accuracy = Precision_micro = Recall_micro = Recall_weighted = 0.862

Example 6

For the confusion matrix of Fig. 15, determine the macro-, micro-, and weighted-average recalls of the classification model.

Solution

From Eqs. (21) and (22a),

Recall_A = 990/(990 + 10) = 0.99
Recall_B = 2/(2 + 48) = 0.04

so that

Recall_macro = (0.99 + 0.04)/2 = 0.515
Recall_micro = (990 + 2)/1050 = 0.945
Recall_weighted = [1000(0.99) + 50(0.04)]/1050 = 0.945

Using definitions (30) and (31) and the results of Example 6, we obtain for class A,

TPR_A = Recall_A = 0.99
FNR_A = 1 - TPR_A = 1 - 0.99 = 0.01

and for class B,

TPR_B = Recall_B = 0.04
FNR_B = 1 - TPR_B = 1 - 0.04 = 0.96

For the classification model,

TPR_macro = Recall_macro = 0.515
FNR_macro = 1 - TPR_macro = 1 - 0.515 = 0.485

D. Specificity

The specificity is the ratio of the number of samples correctly classified as negative to the number of all actual negative samples. From the second row of Fig. 2, we have

Specificity = TN / N- = TN / (TN + FP)    (32)

In Fig. 1, where TN = 90 and FP = 10,

Specificity = 90/(90 + 10) = 0.9

In an ideal case when FP = 0 (zero type I error), the specificity has its maximum value of 1.0, meaning that all actual samples of the negative class are correctly classified (TN = N-). Remember that the same condition, FP = 0 (Fig. 17), also makes the precision attain its maximum value of 1.0. See Fig. 24. Expression (32) is in fact the specificity of the positive class.

For two classes A and B,

Specificity_A = TN_A / N_B = TN_A / (TN_A + FP_A)    (33)
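The Example 6 values and the TPR/FNR figures follow from the Fig. 15 counts quoted earlier (990 of 1000 class-A samples and 2 of 50 class-B samples correctly classified); a quick Python verification (variable names ours):

```python
recall_a = 990 / 1000                      # 0.99
recall_b = 2 / 50                          # 0.04
recall_macro = (recall_a + recall_b) / 2
recall_micro = (990 + 2) / (1000 + 50)     # equals the model accuracy

print(round(recall_macro, 3), round(recall_micro, 3))  # 0.515 0.945

# False negative rates, per definitions (30) and (31)
fnr_a = 1 - recall_a                       # ≈ 0.01
fnr_b = 1 - recall_b                       # ≈ 0.96
fnr_macro = 1 - recall_macro               # ≈ 0.485
```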
From the class-B form of the matrix ((b) Class B in Fig. 25),

Specificity_B = TN_B / (TN_B + FP_B)

                 Predicted
                  B       A
    Actual  B    TP_B    FN_B    N_B
            A    FP_B    TN_B    N_A

Fig. 25. Specificities of classes A and B as obtained from two forms of confusion matrix: (a) Class A, (b) Class B

From one form of the matrix (Fig. 26), both specificities can be read off:

Specificity_B = TP_A / (TP_A + FN_A)    (34b)

Specificity_A = TN_A / (TN_A + FP_A)

                 Predicted
                  A       B
    Actual  A    TP_A    FN_A
            B    FP_A    TN_A

Fig. 26. Specificities of classes A and B as obtained from one form of confusion matrix

By comparison, it is clear that Eqs. (33) and (22b) are identical, and so are Eqs. (34b) and (21), providing the results Specificity_A = Recall_B and Specificity_B = Recall_A.

Example 8

For the confusion matrix of Fig. 15, determine the macro-, micro-, and weighted-average specificities of the classification model.

Solution

Using the results of Example 6, we obtain

Specificity_A = Recall_B = 0.04
Specificity_B = Recall_A = 0.99
Specificity_macro = Recall_macro = 0.515
Specificity_micro = Recall_micro = 0.945

From Eq. (42),

Specificity_weighted = [1000(0.04) + 50(0.99)]/1050 = 0.085

In addition to the TPR and FNR expressed along with recall at the end of Subsection 3.3, we here define the TNR (true negative rate) and the FPR (false positive rate). The TNR is the same thing as the specificity, Eq. (32):

TNR = TN / N- = TN / (TN + FP) = Specificity    (43)
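Both the Fig. 1 specificity and the weighted average above are easy to verify; a Python sketch (names ours):

```python
def specificity(tn, fp):
    """Specificity, Eq. (32): TN / (TN + FP)."""
    return tn / (tn + fp)

# Fig. 1: TN = 90, FP = 10
print(specificity(90, 10))  # 0.9

# Weighted-average specificity for the Fig. 15 model, cf. Eq. (42)
spec_a, spec_b = 0.04, 0.99
n_a, n_b = 1000, 50
weighted = (n_a * spec_a + n_b * spec_b) / (n_a + n_b)
print(round(weighted, 3))  # 0.085
```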
Example 10

For the confusion matrix of Fig. 15, determine the balanced accuracy of the classification model.

Solution

Using the value of the macro-average recall, or the macro-average specificity, in the solution of Example 8, Eq. (48) yields

Balanced accuracy = 0.515

which is considerably less than the value of the accuracy and may thus be reliable. The difference in the values of accuracy and balanced accuracy is due to the imbalance of the datasets.

A comparison between balanced accuracy and accuracy is in order. Consider a binary classifier with the confusion matrix of Fig. 27, where the datasets of classes A and B are balanced (N_A = 195, N_B = 192). The accuracy, by Eq. (10a), is

Accuracy = (185 + 180)/(195 + 192) = 0.943

The recalls of classes A and B, by Eqs. (21) and (22b), are

Recall_A = 185/195 = 0.949
Recall_B = 180/192 = 0.938

Therefore, the balanced accuracy, by Eq. (47), is

Balanced accuracy = (0.949 + 0.938)/2 = 0.943

which coincides with the accuracy, as expected for balanced datasets.

                 Predicted
                  A       B
    Actual  A    185      10     N_A = 195
            B     12     180     N_B = 192

Fig. 27. Confusion matrix with balanced datasets

F. Fβ measure and F1 score

The precision and recall are commonly combined to provide a performance measure called the Fβ measure, defined as

Fβ = 1 / [β/Precision + (1 - β)/Recall] = (Recall × Precision) / [β(Recall) + (1 - β)(Precision)]    (49)

This means that Fβ is the weighted harmonic average of precision and recall. Here, β is a positive fractional factor, 0 < β < 1, which reflects the importance of precision and recall with respect to each other. The greater β is, the greater the importance given to precision and, conversely, the smaller β is, the greater the importance given to recall. Indeed, there should be a trade-off between precision and recall, depending on the particulars of the classification problem at hand; cf. the example of brain surgery in Subsection 3.3.

Substituting for precision and recall from Eqs. (14) and (20), respectively, into Eq. (49), Fβ is formulated as

Fβ = TP / [TP + β(FP) + (1 - β)(FN)]    (50)

Note the similarity in form among the expressions of precision in Eq. (14), recall in Eq. (20), and Fβ in Eq. (50): in the respective denominators, FP (in precision) is replaced by FN (in recall), and both are replaced by the weighted sum of FP and FN (in Fβ).
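Both measures can be computed directly from the building blocks. The sketch below follows the tutorial's Eq. (47) for balanced accuracy and Eq. (50) for Fβ; note that this 0 < β < 1 weighting is the paper's convention, which differs from the more common (1 + β²) definition of Fβ (function names ours):

```python
def balanced_accuracy(recall_a, recall_b):
    """Eq. (47): arithmetic mean of the per-class recalls."""
    return (recall_a + recall_b) / 2

def f_beta(tp, fp, fn, beta):
    """Eq. (50): F_beta = TP / (TP + beta*FP + (1 - beta)*FN), 0 < beta < 1."""
    return tp / (tp + beta * fp + (1 - beta) * fn)

# Fig. 27 (balanced datasets): accuracy and balanced accuracy agree
acc = (185 + 180) / (195 + 192)
bal = balanced_accuracy(185 / 195, 180 / 192)
print(round(acc, 3), round(bal, 3))  # 0.943 0.943

# Fig. 1 blocks: beta = 0.5 weighs precision and recall equally (F1 score)
print(round(f_beta(100, 10, 5, beta=0.5), 3))  # 0.93
```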