
Root-Mean-Square Error in R Programming

Last Updated : 11 Oct, 2024

In this article, we will cover the theory behind RMSE, explain its significance, and walk through several ways to compute it in the R programming language.

What is RMSE?

Root mean squared error (RMSE) is the square root of the mean of the squared errors, where each error is the difference between an actual and a predicted value. RMSE is a widely used general-purpose error metric for numerical predictions. Because it is expressed in the units of the target variable, it is scale-dependent: it is useful for comparing the prediction errors of different models or model configurations for the same variable, but not for comparing errors across different variables. For a regression model, it measures how closely the fitted values match the observed data points. The formula for calculating RMSE is:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

where

  • $y_i$ = actual value of the i-th observation
  • $\hat{y}_i$ = predicted value of the i-th observation
  • $n$ = number of observations
Note: The differences between the actual and predicted values, $y_i - \hat{y}_i$, are known as residuals.
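Each piece of the formula maps directly onto a vectorized R operation. The sketch below breaks the calculation into its steps (residuals, squared residuals, their mean, which is the mean squared error or MSE, and finally the square root) using a small set of sample values; the intermediate variable names are our own.

```r
# Sample actual (y_i) and predicted (y_hat_i) values
actual    <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)

residuals_vec  <- actual - predicted  # y_i - y_hat_i
squared_errors <- residuals_vec^2     # (y_i - y_hat_i)^2
mse            <- mean(squared_errors) # (1/n) * sum of squared errors
rmse_value     <- sqrt(mse)           # square root of the MSE

print(residuals_vec)
print(squared_errors)
print(mse)        # about 0.12
print(rmse_value) # about 0.3464102
```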

Significance of RMSE

Here are the main reasons RMSE is significant:

  • Scale-Dependent: RMSE has the same units as the target variable. A lower RMSE indicates better model performance, but the value must be compared with the scale of the target variable to make sense.
  • Sensitive to Outliers: Since RMSE squares the error terms, larger errors have a disproportionately large effect, making RMSE sensitive to outliers.
  • Comparing Models: RMSE can be used to compare models. A model with a lower RMSE value is generally considered better at predicting the target variable.
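The first two properties are easy to check numerically. The sketch below compares RMSE with the mean absolute error (MAE), a metric not otherwise covered in this article and used here only as a reference point. The two error vectors are made up for illustration: they have the same MAE, but RMSE doubles when the total error is concentrated in one large miss. Rescaling the errors (for example, expressing them in metres instead of kilometres) rescales RMSE by the same factor.

```r
# Helper metrics written directly from their definitions
rmse_of <- function(errors) sqrt(mean(errors^2))
mae_of  <- function(errors) mean(abs(errors))

# Two hypothetical error vectors with the same total absolute error
even_errors    <- c(1, 1, 1, 1)  # errors spread evenly
outlier_errors <- c(0, 0, 0, 4)  # one large error

c(rmse = rmse_of(even_errors),    mae = mae_of(even_errors))    # rmse 1, mae 1
c(rmse = rmse_of(outlier_errors), mae = mae_of(outlier_errors)) # rmse 2, mae 1

# Scale dependence: multiplying every error by 1000
# (e.g. kilometres -> metres) multiplies RMSE by 1000
rmse_of(outlier_errors * 1000) # 2000
```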

Computing RMSE in R

Now we will discuss different methods for computing RMSE in R:

1: Simple RMSE Calculation

Let’s first compute the RMSE between two vectors (actual and predicted values) manually.

R
# Sample actual and predicted values
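actual <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)

# Calculate RMSE directly from the formula
# (a distinct name avoids confusion with the rmse() function from Metrics)
rmse_manual <- sqrt(mean((actual - predicted)^2))
rmse_manual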

Output:

[1] 0.3464102

The above code calculates the RMSE between the actual and predicted values manually by following the RMSE formula.
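If you compute RMSE often, the one-liner can be wrapped in a small function. The sketch below is a hypothetical helper (`compute_rmse` is our own name, not part of base R or any package). It checks that both vectors have the same length and optionally ignores pairs containing `NA`, since `mean()` otherwise returns `NA` as soon as one value is missing.

```r
# Hypothetical helper: RMSE with basic input checks
compute_rmse <- function(actual, predicted, na.rm = FALSE) {
  if (length(actual) != length(predicted)) {
    stop("`actual` and `predicted` must have the same length")
  }
  errors <- actual - predicted
  # With na.rm = TRUE, mean() skips pairs where either value is NA
  sqrt(mean(errors^2, na.rm = na.rm))
}

actual    <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)
compute_rmse(actual, predicted) # about 0.3464102

# A missing prediction: NA by default, ignored when na.rm = TRUE
predicted_na <- replace(predicted, 2, NA)
compute_rmse(actual, predicted_na)
compute_rmse(actual, predicted_na, na.rm = TRUE)
```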

2: Calculating RMSE Using the Metrics Package

The Metrics package offers a convenient rmse() function, which takes the actual values first: rmse(actual, predicted). First, install and load the package:

R
# Install the Metrics package (only needed once), then load it
if (!requireNamespace("Metrics", quietly = TRUE)) install.packages("Metrics")
library(Metrics)

# Calculate RMSE using the rmse() function
rmse_value <- rmse(actual, predicted)
rmse_value

Output:

[1] 0.3464102
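Calling the package function with its namespace prefix, `Metrics::rmse()`, makes it explicit where the function comes from and avoids clashes with any object named `rmse` in your session. The sketch below cross-checks the package result (and the square root of `Metrics::mse()`) against the manual formula. The `requireNamespace()` guard is our own addition so that the script still runs when Metrics is not installed.

```r
actual    <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)

# Reference value computed from the RMSE formula
manual_rmse <- sqrt(mean((actual - predicted)^2))

if (requireNamespace("Metrics", quietly = TRUE)) {
  pkg_rmse <- Metrics::rmse(actual, predicted) # root mean squared error
  pkg_mse  <- Metrics::mse(actual, predicted)  # mean squared error
  # Both should agree with the manual formula up to floating-point tolerance
  stopifnot(isTRUE(all.equal(pkg_rmse, manual_rmse)))
  stopifnot(isTRUE(all.equal(sqrt(pkg_mse), manual_rmse)))
  print(pkg_rmse)
} else {
  message("Metrics is not installed; manual RMSE = ", manual_rmse)
}
```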

3: Calculating RMSE Using the caret Package

The caret package is a popular package for machine learning and model evaluation. It provides a similar RMSE() function; note that its arguments are in the order RMSE(pred, obs), with the predictions first.

R
# Install the caret package (only needed once), then load it
if (!requireNamespace("caret", quietly = TRUE)) install.packages("caret")
library(caret)

# Calculate RMSE using the RMSE() function from caret
rmse_value <- RMSE(predicted, actual)
rmse_value

Output:

[1] 0.3464102
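caret also provides `postResample(pred, obs)`, which returns RMSE together with R-squared and MAE in one named vector. This is handy when you want several summary metrics at once. The sketch below uses it, again guarded with `requireNamespace()` (our own addition) so that it runs even without caret installed.

```r
actual    <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)
manual_rmse <- sqrt(mean((actual - predicted)^2))

if (requireNamespace("caret", quietly = TRUE)) {
  # postResample(pred, obs) returns a named vector: RMSE, Rsquared, MAE
  metrics <- caret::postResample(pred = predicted, obs = actual)
  print(metrics)
  stopifnot(isTRUE(all.equal(unname(metrics["RMSE"]), manual_rmse)))
} else {
  message("caret is not installed; manual RMSE = ", manual_rmse)
}
```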

4: Calculating RMSE for Regression Models

In regression models, RMSE is used to evaluate the performance of the model. Let’s fit a simple linear regression model in R and compute the RMSE for the predicted values.

R
# Load the dataset
data(mtcars)

# Fit a linear regression model
model <- lm(mpg ~ hp, data = mtcars)

# Get the predicted values
predicted_values <- predict(model, mtcars)

# Compute RMSE
actual_values <- mtcars$mpg
rmse_regression <- sqrt(mean((actual_values - predicted_values)^2))
rmse_regression

Output:

[1] 3.740297

This example fits a linear regression model predicting the mpg (miles per gallon) of cars based on horsepower (hp) and computes the RMSE to evaluate the model’s prediction accuracy.
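The same number can be obtained from the residuals that `lm()` stores with the model. Note that the "Residual standard error" printed by `summary(model)` is not the same as RMSE: it divides the residual sum of squares by the residual degrees of freedom (n - 2 here, for the intercept and slope) instead of by n, so it is slightly larger. The sketch below refits the same model and shows the relationship.

```r
model <- lm(mpg ~ hp, data = mtcars)

# RMSE from the stored residuals: divide the residual sum of squares by n
rss <- sum(residuals(model)^2)
n   <- length(residuals(model))  # 32 cars
rmse_resid <- sqrt(rss / n)

# Residual standard error: divides RSS by the residual degrees of freedom
rse      <- summary(model)$sigma
resid_df <- df.residual(model)   # n - 2 = 30

rmse_resid                                  # about 3.74
rse                                         # slightly larger than the RMSE
all.equal(rmse_resid, rse * sqrt(resid_df / n))
```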

Interpreting RMSE involves understanding its relationship with the data:

  • Low RMSE: Indicates that the model’s predictions are close to the actual values.
  • High RMSE: Indicates large errors in prediction.

However, the RMSE value should always be interpreted in the context of the data. For example, an RMSE of 10 might be considered good for a dataset where the target variable ranges between 100 and 500, but it could indicate poor performance if the target variable ranges between 0 and 20.
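One way to put an RMSE value in context is to compare it with a naive baseline that always predicts the mean of the target, and to divide it by the spread of the target. Dividing by the range is only one of several conventions for a "normalized RMSE" (dividing by the mean or the standard deviation is also common), so treat the sketch below, which reuses the mpg ~ hp model, as an illustration rather than a standard definition.

```r
model <- lm(mpg ~ hp, data = mtcars)
actual_values    <- mtcars$mpg
predicted_values <- predict(model, mtcars)

rmse_model <- sqrt(mean((actual_values - predicted_values)^2))

# Baseline: predict the overall mean mpg for every car
baseline_pred <- rep(mean(actual_values), length(actual_values))
rmse_baseline <- sqrt(mean((actual_values - baseline_pred)^2))

# Range-normalized RMSE (one convention among several)
nrmse_range <- rmse_model / diff(range(actual_values))

c(model = rmse_model, baseline = rmse_baseline, nrmse_range = nrmse_range)
```

Here the model's RMSE (about 3.7 mpg) is clearly below the mean-only baseline (about 5.9 mpg), which gives "low" a concrete meaning for this dataset.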

Visualizing RMSE

Visualizing the performance of your model can help in understanding where the model is underperforming. A scatter plot of actual vs. predicted values can provide insights into how well the model fits the data.

R
# Plot actual vs predicted values
plot(actual_values, predicted_values, xlab = "Actual", ylab = "Predicted",
     main = "Actual vs Predicted Values")
abline(0, 1, col = "red") # Add the y = x reference line (actual = predicted)

Output:

[Figure: Actual vs Predicted Values (scatter plot of actual vs. predicted mpg, with the red reference line where actual = predicted)]

The closer the points are to the red line (where actual = predicted), the better the model’s predictions.
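Because RMSE squares the errors, a handful of points can dominate it. A complementary check, sketched below as our own addition to the article's workflow, is a residual plot plus a ranking of each car's share of the total squared error. It refits the same mpg ~ hp model so that it runs on its own.

```r
model <- lm(mpg ~ hp, data = mtcars)
actual_values    <- mtcars$mpg
predicted_values <- predict(model, mtcars)
errors <- actual_values - predicted_values

# Residuals vs predicted values: points far from 0 push RMSE up
plot(predicted_values, errors, xlab = "Predicted", ylab = "Residual",
     main = "Residuals vs Predicted Values")
abline(h = 0, col = "red")

# Share of the total squared error contributed by each car
share <- errors^2 / sum(errors^2)
top_share <- head(sort(share, decreasing = TRUE), 5)
round(top_share, 3)
```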

Conclusion

Root Mean Square Error (RMSE) is a vital metric for assessing the accuracy of regression models in R. It summarizes, in the units of the target variable, how closely the predicted values align with the actual data. In this article, we discussed the concept of RMSE, how to calculate it manually and with the Metrics and caret packages, how to interpret it, and how to visualize a model's predictions. RMSE is easy to calculate and provides intuitive insights into model performance, making it a go-to metric for regression tasks, as long as it is read in the context of the target variable's scale and its sensitivity to outliers.
