Understanding the Confusion Matrix in Machine Learning
Last Updated :
30 May, 2025
Confusion matrix is a simple table used to measure how well a classification model is performing. It compares the predictions made by the model with the actual results and shows where the model was right or wrong. This helps you understand where the model is making mistakes so you can improve it. It breaks down the predictions into four categories:
- True Positive (TP): The model correctly predicted a positive outcome i.e the actual outcome was positive.
- True Negative (TN): The model correctly predicted a negative outcome i.e the actual outcome was negative.
- False Positive (FP): The model incorrectly predicted a positive outcome i.e the actual outcome was negative. It is also known as a Type I error.
- False Negative (FN): The model incorrectly predicted a negative outcome i.e the actual outcome was positive. It is also known as a Type II error.
Confusion Matrix It also helps calculate key measures like accuracy, precision and recall which give a better idea of performance especially when the data is imbalanced.
Metrics based on Confusion Matrix Data
1. Accuracy
Accuracy shows how many predictions the model got right out of all the predictions. It gives idea of overall performance but it can be misleading when one class is more dominant over the other. For example a model that predicts the majority class correctly most of the time might have high accuracy but still fail to capture important details about other classes. It can be calculated using the below formula:
\text{Accuracy} = \frac {TP+TN}{TP+TN+FP+FN}
2. Precision
Precision focus on the quality of the model’s positive predictions. It tells us how many of the "positive" predictions were actually correct. It is important in situations where false positives need to be minimized such as detecting spam emails or fraud. The formula of precision is:
\text{Precision} = \frac{TP}{TP+FP}
3. Recall
Recall measures how how good the model is at predicting positives. It shows the proportion of true positives detected out of all the actual positive instances. High recall is essential when missing positive cases has significant consequences like in medical tests.
\text{Recall} = \frac{TP}{TP+FN}
4. F1-Score
F1-score combines precision and recall into a single metric to balance their trade-off. It provides a better sense of a model’s overall performance particularly for imbalanced datasets. It is helpful when both false positives and false negatives are important though it assumes precision and recall are equally important but in some situations one might matter more than the other.
\text{F1-Score} = \frac {2 \cdot Precision \cdot Recall}{Precision + Recall}
5. Specificity
Specificity is another important metric in the evaluation of classification models particularly in binary classification. It measures the ability of a model to correctly identify negative instances. Specificity is also known as the True Negative Rate Formula is given by:
\text{Specificity} = \frac{TN}{TN+FP}
6. Type 1 and Type 2 error
Type 1 and Type 2 error are:
- Type 1 error: It occurs when the model incorrectly predicts a positive instance but the actual instance is negative. This is also known as a false positive. Type 1 Errors affect the precision of a model which measures the accuracy of positive predictions.
\text{Type 1 Error} = \frac{\text{FP}}{\text{FP} + \text{TN}}
- Type 2 error: This occurs when the model fails to predict a positive instance even though it is actually positive. This is also known as a false negative. Type 2 Errors impact the recall of a model which measures how well the model identifies all actual positive cases.
\text{Type 2 Error} = \frac{FN}{TP+FN}
Example: A diagnostic test is used to detect a particular disease in patients.
- Type 1 Error (False Positive): This occurs when the test predicts a patient has the disease (positive result) but the patient is actually healthy (negative case).
- Type 2 Error (False Negative): This occurs when the test predicts the patient is healthy (negative result) but the patient actually has the disease (positive case).
Confusion Matrix For Binary Classification
A 2x2 Confusion matrix is shown below for the image recognition having a Dog image or Not Dog image:
| Predicted | Predicted |
---|
Actual | True Positive (TP) | False Negative (FN) |
---|
Actual | False Positive (FP) | True Negative (TN) |
---|
- True Positive (TP): It is the total counts having both predicted and actual values are Dog.
- True Negative (TN): It is the total counts having both predicted and actual values are Not Dog.
- False Positive (FP): It is the total counts having prediction is Dog while actually Not Dog.
- False Negative (FN): It is the total counts having prediction is Not Dog while actually, it is Dog.
Example: Confusion Matrix for Dog Image Recognition with Numbers
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|
Actual | Dog | Dog | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Not Dog | Not Dog |
---|
Predicted | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Dog | Dog | Not Dog | Not Dog |
---|
Result | TP | FN | TP | TN | TP | FP | TP | TP | TN | TN |
---|
- Actual Dog Counts = 6
- Actual Not Dog Counts = 4
- True Positive Counts = 5
- False Positive Counts = 1
- True Negative Counts = 3
- False Negative Counts = 1
| Predicted |
---|
Dog | Not Dog |
---|
Actual | Dog | True Positive (TP =5) | False Negative (FN =1) |
---|
Not Dog | False Positive (FP=1) | True Negative (TN=3) |
---|
Implementation of Confusion Matrix for Binary classification using Python
Step 1: Import the necessary libraries
Python
import numpy as np
from sklearn.metrics import confusion_matrix,classification_report
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Create the NumPy array for actual and predicted labels
- actual: represents the true labels or the actual classification of the items. In this case it's a list of 10 items where each entry is either 'Dog' or 'Not Dog'.
- predicted: represents the predicted labels or the classification made by the model.
Python
actual = np.array(
['Dog','Dog','Dog','Not Dog','Dog','Not Dog','Dog','Dog','Not Dog','Not Dog'])
predicted = np.array(
['Dog','Not Dog','Dog','Not Dog','Dog','Dog','Dog','Dog','Not Dog','Not Dog'])
Step 3: Compute the confusion matrix
- confusion_matrix: This function from sklearn.metrics computes the confusion matrix which is a table used to evaluate the performance of a classification algorithm. It compares actual and predicted to generate a matrix
Python
cm = confusion_matrix(actual,predicted)
Step 4: Plot the confusion matrix with the help of the seaborn heatmap
- sns.heatmap: This function from Seaborn is used to create a heatmap of the confusion matrix.
- annot=True: Display the numerical values in each cell of the heatmap.
Python
sns.heatmap(cm,
annot=True,
fmt='g',
xticklabels=['Dog','Not Dog'],
yticklabels=['Dog','Not Dog'])
plt.ylabel('Actual', fontsize=13)
plt.title('Confusion Matrix', fontsize=17, pad=20)
plt.gca().xaxis.set_label_position('top')
plt.xlabel('Prediction', fontsize=13)
plt.gca().xaxis.tick_top()
plt.gca().figure.subplots_adjust(bottom=0.2)
plt.gca().figure.text(0.5, 0.05, 'Prediction', ha='center', fontsize=13)
plt.show()
Output:
Visualizing the Confusion MatrixStep 5: Classifications Report based on Confusion Metrics
Python
print(classification_report(actual, predicted))
Output:
Classification ReportConfusion Matrix For Multi-class Classification
In multi-class classification the confusion matrix is expanded to account for multiple classes.
- Rows represent the actual classes (ground truth).
- Columns represent the predicted classes.
- Each cell in the matrix shows how often a specific actual class was predicted as another class.
For example in a 3-class problem the confusion matrix would be a 3x3 table where each row and column corresponds to one of the classes. It summarizes the model's performance across all classes in a compact format. Lets consider the below example:
Example: Confusion Matrix for Image Classification (Cat, Dog, Horse)
| Predicted Cat | Predicted Dog | Predicted Horse |
---|
Actual Cat | True Positive (TP) | False Negative (FN) | False Negative (FN) |
---|
Actual Dog | False Negative (FN) | True Positive (TP) | False Negative (FN) |
---|
Actual Horse | False Negative (FN) | False Negative (FN) | True Positive (TP) |
---|
The definitions of all the terms (TP, TN, FP and FN) are the same as described in the previous example.
Example with Numbers:
Let's consider the scenario where the model processed 30 images:
| Predicted Cat | Predicted Dog | Predicted Horse |
---|
Actual Cat | 8 | 1 | 1 |
---|
Actual Dog | 2 | 10 | 0 |
---|
Actual Horse | 0 | 2 | 8 |
---|
In this scenario:
- Cats: 8 were correctly identified, 1 was misidentified as a dog and 1 was misidentified as a horse.
- Dogs: 10 were correctly identified, 2 were misidentified as cats.
- Horses: 8 were correctly identified, 2 were misidentified as dogs.
To calculate true negatives, we need to know the total number of images that were NOT cats, dogs or horses. Let's assume there were 10 such images and the model correctly classified all of them as "not cat," "not dog," and "not horse." Therefore:
- True Negative (TN) Counts: 10 for each class as the model correctly identified each non-cat/dog/horse image as not belonging to that class
Implementation of Confusion Matrix for Multi-Class classification using Python
Step 1: Import the necessary libraries
Python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
import matplotlib.pyplot as plt
Step 2: Create the NumPy array for actual and predicted labels
- y_true: List of true labels.
- y_pred: List of predicted labels by the model.
- classes: A list of class names: 'Cat', 'Dog' and 'Horse'
Python
y_true = ['Cat'] * 10 + ['Dog'] * 12 + ['Horse'] * 10
y_pred = ['Cat'] * 8 + ['Dog'] + ['Horse'] + ['Cat'] * 2 + ['Dog'] * 10 + ['Horse'] * 8 + ['Dog'] * 2
classes = ['Cat', 'Dog', 'Horse']
Step 3: Generate and Visualize the Confusion Matrix
- ConfusionMatrixDisplay: Creates a display object for the confusion matrix.
- confusion_matrix=cm: Passes the confusion matrix (
cm
) to display. - display_labels=classes: Sets the labels (['Cat' , 'Dog' , 'Horse']) or the confusion matrix.
Python
cm = confusion_matrix(y_true, y_pred, labels=classes)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)
disp.plot(cmap=plt.cm.Blues)
plt.title('Confusion Matrix', fontsize=15, pad=20)
plt.xlabel('Prediction', fontsize=11)
plt.ylabel('Actual', fontsize=11)
plt.gca().xaxis.set_label_position('top')
plt.gca().xaxis.tick_top()
plt.gca().figure.subplots_adjust(bottom=0.2)
plt.gca().figure.text(0.5, 0.05, 'Prediction', ha='center', fontsize=13)
plt.show()
Output:
Display the confusion matrixStep 4: Print the Classification Report
Python
print(classification_report(y_true, y_pred, target_names=classes))
Output:
Classification ReportConfusion matrix provides clear insights into important metrics like accuracy, precision and recall by analyzing correct and incorrect predictions.
Similar Reads
Machine Learning Algorithms
Machine learning algorithms are essentially sets of instructions that allow computers to learn from data, make predictions, and improve their performance over time without being explicitly programmed. Machine learning algorithms are broadly categorized into three types: Supervised Learning: Algorith
8 min read
Top 15 Machine Learning Algorithms Every Data Scientist Should Know in 2025
Machine Learning (ML) Algorithms are the backbone of everything from Netflix recommendations to fraud detection in financial institutions. These algorithms form the core of intelligent systems, empowering organizations to analyze patterns, predict outcomes, and automate decision-making processes. Wi
14 min read
Linear Model Regression
Ordinary Least Squares (OLS) using statsmodels
Ordinary Least Squares (OLS) is a widely used statistical method for estimating the parameters of a linear regression model. It minimizes the sum of squared residuals between observed and predicted values. In this article we will learn how to implement Ordinary Least Squares (OLS) regression using P
3 min read
Linear Regression (Python Implementation)
Linear regression is a statistical method that is used to predict a continuous dependent variable i.e target variable based on one or more independent variables. This technique assumes a linear relationship between the dependent and independent variables which means the dependent variable changes pr
14 min read
Multiple Linear Regression using Python - ML
Linear regression is a statistical method used for predictive analysis. It models the relationship between a dependent variable and a single independent variable by fitting a linear equation to the data. Multiple Linear Regression extends this concept by modelling the relationship between a dependen
4 min read
Polynomial Regression ( From Scratch using Python )
Prerequisites Linear RegressionGradient DescentIntroductionLinear Regression finds the correlation between the dependent variable ( or target variable ) and independent variables ( or features ). In short, it is a linear model to fit the data linearly. But it fails to fit and catch the pattern in no
5 min read
Bayesian Linear Regression
Linear regression is based on the assumption that the underlying data is normally distributed and that all relevant predictor variables have a linear relationship with the outcome. But In the real world, this is not always possible, it will follows these assumptions, Bayesian regression could be the
10 min read
How to Perform Quantile Regression in Python
In this article, we are going to see how to perform quantile regression in Python. Linear regression is defined as the statistical method that constructs a relationship between a dependent variable and an independent variable as per the given set of variables. While performing linear regression we a
4 min read
Isotonic Regression in Scikit Learn
Isotonic regression is a regression technique in which the predictor variable is monotonically related to the target variable. This means that as the value of the predictor variable increases, the value of the target variable either increases or decreases in a consistent, non-oscillating manner. Mat
6 min read
Stepwise Regression in Python
Stepwise regression is a method of fitting a regression model by iteratively adding or removing variables. It is used to build a model that is accurate and parsimonious, meaning that it has the smallest number of variables that can explain the data. There are two main types of stepwise regression: F
6 min read
Least Angle Regression (LARS)
Regression is a supervised machine learning task that can predict continuous values (real numbers), as compared to classification, that can predict categorical or discrete values. Before we begin, if you are a beginner, I highly recommend this article. Least Angle Regression (LARS) is an algorithm u
3 min read
Linear Model Classification
K-Nearest Neighbors (KNN)
ML - Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is an optimization algorithm in machine learning, particularly when dealing with large datasets. It is a variant of the traditional gradient descent algorithm but offers several advantages in terms of efficiency and scalability, making it the go-to method for many d
8 min read