Outlier Analysis in R

Last Updated : 10 Jul, 2025

Outliers are data points that differ significantly from the rest of the data. These values are often far removed from the general pattern of the dataset, disrupting its overall distribution. Outlier detection is an important statistical technique used to identify these unusual values, which could result from various factors like measurement errors, incorrect data entry or genuinely rare events.

Impact of Outliers on Models

Outliers can have several detrimental effects on the performance and accuracy of machine learning models:

  • Skewed Data Distribution: Outliers can distort the shape of the data, making it unrepresentative of the underlying trend.
  • Distorted Statistical Metrics: They can alter essential statistics, such as the mean, variance and standard deviation, leading to inaccurate conclusions.
  • Biased Model Accuracy: Outliers can bias the model, reducing its ability to generalize to new data and impacting overall prediction accuracy.
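As a rough illustration of the second point, the sketch below compares summary statistics with and without one extreme value. The vectors clean and with_outlier hold made-up numbers invented for this example; they are not data from the article.

```r
# Hypothetical measurements, invented for illustration
clean <- c(10, 12, 11, 13, 12, 11, 10, 12)

# The same values plus one extreme observation
with_outlier <- c(clean, 100)

# The mean and standard deviation shift noticeably...
mean(clean)
mean(with_outlier)
sd(clean)
sd(with_outlier)

# ...while the median barely moves
median(clean)
median(with_outlier)
```

The median is far less sensitive to the extreme value, which is one reason quartile-based rules are a common choice for flagging outliers.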

Implementation of Outlier Detection

We will explore different methods to detect and remove outliers present in a given dataset.

1. Create Data with Outliers

We create sample data by drawing 500 points from a standard normal distribution with the rnorm() function. We then overwrite the first 10 values with hand-picked extreme values that act as outliers.

R
data <- rnorm(500)
data[1:10] <- c(46,9,15,-90,
         42,50,-82,74,61,-32) 

2. Visualizing Outliers Using Boxplot

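We use the boxplot() function to visualize outliers. By default, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range of the box. Points beyond the whiskers are drawn individually and treated as outliers.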

Syntax:

boxplot(x, data, notch, varwidth, names, main)

Parameters:

  • x: A vector or a formula describing the data to plot.
  • data: The data frame from which the variables in the formula are taken.
  • notch: A logical value. Set to TRUE to draw a notch on each side of the boxes.
  • varwidth: A logical value. Set to TRUE to draw the width of each box in proportion to the sample size.
  • main: The title of the chart.
  • names: The group labels shown under each boxplot.
R
data <- rnorm(500)

data[1:10] <- c(46,9,15,-90,
         42,50,-82,74,61,-32) 
         
boxplot(data)

Output:

[Figure: Outlier Detection (boxplot of data)]
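To see the numbers behind the plot, one option is boxplot.stats(), which returns the five values the box and whiskers are drawn from, along with the points beyond the whiskers. This is a minimal sketch; the set.seed(42) call is an assumption added here only to make the random draw repeatable.

```r
set.seed(42)  # assumed seed, not part of the original example
data <- rnorm(500)
data[1:10] <- c(46, 9, 15, -90, 42, 50, -82, 74, 61, -32)

stats <- boxplot.stats(data)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Values drawn as individual points beyond the whiskers
stats$out
```

The ten injected values should all appear in stats$out, possibly together with a few ordinary draws from the tails of the normal distribution.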

3. Removing Outliers

We remove the outliers using the boxplot.stats() function, whose $out component holds the values that lie beyond the whiskers. The expression !data %in% boxplot.stats(data)$out keeps only the values that are not in that set.

R
newdata <- data[!data %in% boxplot.stats(data)$out]
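A quick sanity check, sketched here with a fixed seed that is assumed only for repeatability, is to count how many values were dropped and look at the range of what remains.

```r
set.seed(42)  # assumed seed, not part of the original example
data <- rnorm(500)
data[1:10] <- c(46, 9, 15, -90, 42, 50, -82, 74, 61, -32)

out_values <- boxplot.stats(data)$out
newdata <- data[!data %in% out_values]

# Number of values removed
length(data) - length(newdata)

# Range of the values that remain
range(newdata)
```

The count can be larger than 10, since ordinary draws from the tails of the distribution may also fall beyond the whiskers.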

4. Verifying Outlier Removal

We verify that the outliers have been removed by plotting the boxplot again.

R
boxplot(newdata)

Output:

[Figure: Outlier Detection (boxplot of newdata)]

The extreme values we injected no longer appear in the plot, so the outliers have been removed. Because the quartiles are recomputed on the trimmed data, a few points may still fall just outside the new whiskers.

5. Visualizing Outliers with a Histogram

Histograms are another way to detect outliers visually. Here, we create a dataset of 1000 normally distributed values, append four extreme values (10, 15, 12 and 100) and plot a histogram.

R
set.seed(123)
data <- c(rnorm(1000), 10, 15, 12, 100)

hist(data)

Output:

[Figure: Outlier Detection (histogram of data)]
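With the default bins, most of the data lands in a couple of bars and the extreme values are hard to see. One possible refinement is to request more bins with the breaks argument and inspect the bin counts directly; breaks = 50 is an arbitrary choice for this sketch.

```r
set.seed(123)
data <- c(rnorm(1000), 10, 15, 12, 100)

# Finer bins make isolated bars easier to spot
hist(data, breaks = 50)

# Compute the same histogram without plotting it
h <- hist(data, breaks = 50, plot = FALSE)

# Midpoints of the non-empty bins; isolated values far from
# the bulk of the data suggest outliers
h$mids[h$counts > 0]
```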

6. Detecting and Removing Outliers from Multiple Columns

To detect and remove outliers from a data frame, we use the interquartile range (IQR) method. An observation is considered an outlier if it lies more than 1.5 times the IQR above the third quartile or more than 1.5 times the IQR below the first quartile.
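As a worked example of this rule, the sketch below computes the two cut-offs for a small vector. The values are the same as the x column of the sample data frame used in this section; the names lower and upper are just labels chosen here.

```r
x <- c(1, 2, 3, 4, 3, 12, 3, 4, 4, 15, 0)

# First and third quartiles (default quantile method)
q1 <- quantile(x, probs = 0.25, names = FALSE)
q3 <- quantile(x, probs = 0.75, names = FALSE)
iqr <- q3 - q1

# Cut-offs at 1.5 * IQR beyond the quartiles
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Values outside [lower, upper] are flagged
x[x < lower | x > upper]
```

With R's default quantile method, the quartiles here are 2.5 and 4, so the cut-offs are 0.25 and 6.25 and the values 12, 15 and 0 are flagged.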

We create functions to detect and remove outliers using the IQR method.

  • detect_outlier() function calculates the IQR and identifies values outside the acceptable range.
  • remove_outlier() function iterates through the columns of the data frame and removes the rows that contain outliers based on the IQR method.
R
sample_data <- data.frame(x=c(1, 2, 3, 4, 3, 12, 3, 4, 4, 15, 0),
                           y=c(4, 3, 25, 7, 8, 5, 9, 77, 6, 5, 0),
                           z=c(1, 3, 2, 90, 8, 7, 0, 48, 7, 2, 3))

print("Display original dataframe")
print(sample_data)
boxplot(sample_data)

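detect_outlier <- function(x) {

  # First and third quartiles of the column
  Quantile1 <- quantile(x, probs = 0.25)
  Quantile3 <- quantile(x, probs = 0.75)

  # Interquartile range (named iqr to avoid masking the built-in IQR())
  iqr <- Quantile3 - Quantile1

  # TRUE for values more than 1.5 * IQR beyond the quartiles
  x > Quantile3 + (iqr * 1.5) | x < Quantile1 - (iqr * 1.5)
}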

remove_outlier <- function(dataframe,
                            columns=names(dataframe)) {
  
  for (col in columns) {
    dataframe <- dataframe[!detect_outlier(dataframe[[col]]), ]
  }
  
  return(dataframe)
}
remove_outlier(sample_data, c('x', 'y', 'z'))

Output:

[Figure: Sample Data containing outliers]
[Figure: Outliers Detection (boxplot of sample_data)]
[Figure: Output after outlier removal]
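Note that remove_outlier() filters one column at a time, so the quartiles for y and z are computed on the rows that survived the earlier steps. An alternative, sketched below, flags outliers in every column against the full data first and then drops any row with at least one flag. The names flag_outlier, flags and clean_data are hypothetical, introduced only for this example.

```r
sample_data <- data.frame(x = c(1, 2, 3, 4, 3, 12, 3, 4, 4, 15, 0),
                          y = c(4, 3, 25, 7, 8, 5, 9, 77, 6, 5, 0),
                          z = c(1, 3, 2, 90, 8, 7, 0, 48, 7, 2, 3))

# TRUE where a value lies more than 1.5 * IQR beyond the quartiles
flag_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Logical matrix with one column of flags per variable
flags <- sapply(sample_data, flag_outlier)

# Keep only the rows with no flagged value in any column
clean_data <- sample_data[rowSums(flags) == 0, ]
clean_data
```

For this sample data both approaches keep the same five rows, but they can differ on other data because removing rows changes the quartiles of the remaining columns.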

In this article, we learned how to detect and remove outliers in R using visualizations and statistical methods such as the IQR method, which gives cleaner data for analysis and modeling.

