Computing Classification Evaluation Metrics in R
Classification evaluation metrics help us understand how well a model performs in assigning instances to predefined categories. These metrics provide both general and class-specific insights, guiding us in tuning models and interpreting their effectiveness.
Confusion Matrix
The confusion matrix summarizes a model's prediction results in a tabular format. One common convention puts actual classes in rows and predicted classes in columns; note that caret's confusionMatrix(), used later in this article, prints predictions as rows and reference (actual) labels as columns. Each cell contains the count of predictions for a particular actual/predicted combination, helping us identify errors like false positives and false negatives. From the matrix, we derive core metrics:
- True Positives (TP): Correctly predicted positives
- True Negatives (TN): Correctly predicted negatives
- False Positives (FP): Incorrectly predicted positives
- False Negatives (FN): Incorrectly predicted negatives
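As a minimal sketch with hypothetical binary labels (not part of the iris example later in the article), the four counts can be computed directly in R:
R
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0)   # true labels (1 = positive), toy data
predicted <- c(1, 0, 1, 0, 1, 0, 1, 0)   # hypothetical model predictions
TP <- sum(predicted == 1 & actual == 1)  # correctly predicted positives
TN <- sum(predicted == 0 & actual == 0)  # correctly predicted negatives
FP <- sum(predicted == 1 & actual == 0)  # incorrectly predicted positives
FN <- sum(predicted == 0 & actual == 1)  # incorrectly predicted negatives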
1. Accuracy
Formula: (TP + TN) / (TP + TN + FP + FN)
Accuracy measures the overall correctness of a model's predictions. It’s easy to interpret but can be misleading on imbalanced datasets. A model predicting only the majority class may still show high accuracy, even if it fails to capture minority classes. Accuracy is more reliable when class distribution is balanced and all types of errors are equally important.
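Using the toy counts from the sketch above, accuracy is a one-liner:
R
accuracy <- (TP + TN) / (TP + TN + FP + FN)  # 0.75 for the toy labels above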
2. Precision
Formula: TP / (TP + FP)
Precision focuses on the quality of positive predictions. It tells us how many of the predicted positives are actually correct. This is especially useful when false positives are costly—like in fraud detection or medical diagnoses, where incorrect alerts can cause significant disruptions or unnecessary treatments.
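Continuing the same toy counts:
R
precision <- TP / (TP + FP)  # fraction of predicted positives that are correct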
3. Recall (Sensitivity)
Formula: TP / (TP + FN)
Recall measures the model’s ability to detect all actual positives. It's critical in cases where missing a positive case (false negative) is riskier than a false alarm—like detecting diseases or rare faults in systems. High recall means fewer important cases are missed.
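With the same toy counts:
R
recall <- TP / (TP + FN)  # fraction of actual positives that were found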
4. F1 Score
Formula: 2 × (Precision × Recall) / (Precision + Recall)
The F1 score balances precision and recall into a single metric, which is especially helpful when dealing with class imbalance. It's useful when both false positives and false negatives are important and neither precision nor recall alone gives the full picture.
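Using the precision and recall values computed in the sketches above:
R
f1 <- 2 * (precision * recall) / (precision + recall)  # harmonic mean of the two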
5. Specificity
Formula: TN / (TN + FP)
Specificity, or true negative rate, tells us how well a model identifies actual negatives. It’s key in scenarios where false positives are costly, such as medical screenings, where we want to minimize mislabeling healthy individuals as sick.
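Again with the toy counts from above:
R
specificity <- TN / (TN + FP)  # fraction of actual negatives identified correctly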
6. Kappa Score
Formula: (Observed Accuracy - Expected Accuracy) / (1 - Expected Accuracy)
Kappa measures agreement between predicted and actual labels, adjusted for chance. It's particularly useful for imbalanced datasets, giving a more reliable view of model performance than accuracy alone. A score of 1 indicates perfect agreement, 0 means chance-level performance and negative values suggest worse-than-random predictions.
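Following the same toy counts, a sketch of the kappa computation, where expected accuracy is the chance agreement implied by the row and column totals of the confusion matrix:
R
n <- TP + TN + FP + FN
observed <- (TP + TN) / n  # plain accuracy
expected <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2  # chance agreement
kappa <- (observed - expected) / (1 - expected)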
Implementation of Evaluation Metrics in R
We will use the "iris" dataset for the programming example. It contains three species with 50 instances each. We will use a random forest to classify the data into species and then compute the evaluation metrics described above in R.
Step 1: Loading the necessary package
We will install the caret package, which stands for Classification And Regression Training. The "caret" package is a comprehensive framework in R for machine learning tasks, including data preprocessing, model training and evaluation. Load the "caret" package into the R environment so that its functions and capabilities can be used.
R
install.packages("caret")
library(caret)
Step 2: Loading Dataset
For this example, we use the "iris" dataset. Load the "iris" dataset, which is a famous dataset included in R. It contains measurements of different features of iris flowers (sepal length, sepal width, petal length, petal width) along with their corresponding species (setosa, versicolor, virginica).
Print a summary of the "iris" dataset, providing descriptive statistics such as mean, median, minimum, maximum and quartiles for each feature.
R
data(iris)     # load the built-in iris dataset
summary(iris)  # descriptive statistics for each column
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
This code block creates a scatter plot matrix of the "iris" dataset. It shows pairwise scatter plots of each feature against other features, allowing us to visualize the relationships and distributions between variables.
R
pairs(iris)  # scatter plot matrix of all pairwise feature relationships
Output:
[Figure: pair plot of the iris features]
Step 3: Splitting the Dataset
We will split the dataset into training and testing sets using createDataPartition() from the caret package. We reserve 80% of the data for training, creating two subsets: trainData to build the model and testData to evaluate it.
R
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)  # stratified 80% sample
trainData <- iris[trainIndex, ]   # rows used to fit the model
testData <- iris[-trainIndex, ]   # remaining rows held out for evaluation
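As a quick sanity check (not shown in the original walkthrough), we can confirm that the split preserved the class proportions, since createDataPartition() samples within each class:
R
table(trainData$Species)  # about 40 rows per species
table(testData$Species)   # about 10 rows per species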
Step 4: Model training
We train a classification model using the Random Forest algorithm with train()
from caret
. The formula Species ~ .
tells R to use all other columns to predict the Species
.
R
install.packages("randomForest")
library(randomForest)
model <- train(Species ~ ., data = trainData, method = "rf")
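By default, train() tunes the random forest's mtry parameter using bootstrap resampling. As a sketch of an alternative (assuming the same trainData), 5-fold cross-validation can be requested through trainControl():
R
ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation
model_cv <- train(Species ~ ., data = trainData,
                  method = "rf", trControl = ctrl)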
Step 5: Evaluating Metrics
We make predictions on the test data and use confusionMatrix()
to compute accuracy, precision, recall, F1 score and more.We are comparing the predicted class labels with the actual values in the test set. The result is stored in cm
, which contains detailed performance metrics.
R
predictions <- predict(model, newdata = testData)     # predicted species for the held-out rows
cm <- confusionMatrix(predictions, testData$Species)  # compare predictions with actual labels
cm
Output:
Confusion Matrix and Statistics
Reference
Prediction setosa versicolor virginica
setosa 10 0 0
versicolor 0 10 1
virginica 0 0 9
Overall Statistics
Accuracy : 0.9667
95% CI : (0.8278, 0.9992)
No Information Rate : 0.3333
P-Value [Acc > NIR] : 2.963e-13
Kappa : 0.95
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: setosa Class: versicolor Class: virginica
Sensitivity 1.0000 1.0000 0.9000
Specificity 1.0000 0.9500 1.0000
Pos Pred Value 1.0000 0.9091 1.0000
Neg Pred Value 1.0000 1.0000 0.9524
Prevalence 0.3333 0.3333 0.3333
Detection Rate 0.3333 0.3333 0.3000
Detection Prevalence 0.3333 0.3667 0.3000
Balanced Accuracy 1.0000 0.9750 0.9500
The confusionMatrix() results are now stored in the variable cm. The overall metrics measure correctness across the entire test set, aggregated over all classes rather than broken down by category. To see the overall evaluation metrics use cm$overall, and to see the class-wise evaluation metrics use cm$byClass.
R
cm$byClass
Output:
Sensitivity Specificity Pos Pred Value Neg Pred Value Precision
Class: setosa 1.0 1.00 1.0000000 1.000000 1.0000000
Class: versicolor 1.0 0.95 0.9090909 1.000000 0.9090909
Class: virginica 0.9 1.00 1.0000000 0.952381 1.0000000
Recall F1 Prevalence Detection Rate Detection Prevalence
Class: setosa 1.0 1.0000000 0.3333333 0.3333333 0.3333333
Class: versicolor 1.0 0.9523810 0.3333333 0.3333333 0.3666667
Class: virginica 0.9 0.9473684 0.3333333 0.3000000 0.3000000
Balanced Accuracy
Class: setosa 1.000
Class: versicolor 0.975
Class: virginica 0.950
We get all the class-wise evaluation scores. From this we can observe the sensitivity, specificity, precision, recall and F1 score for each class.
R
cm$overall
Output:
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
9.666667e-01 9.500000e-01 8.278305e-01 9.991564e-01 3.333333e-01
AccuracyPValue McnemarPValue
2.962731e-13 NaN
The cm$byClass line retrieves the performance metrics (such as sensitivity, specificity, precision, recall and F1 score) for each class separately, while the cm$overall line provides overall performance metrics, such as accuracy and kappa, across all classes.
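As a usage sketch, individual metrics can be pulled out of these objects by name:
R
cm$overall["Accuracy"]  # single overall accuracy value
cm$byClass[, "F1"]      # F1 score for each class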
In this article we used the caret package to load data, train a model, make predictions and calculate a full set of classification evaluation metrics.