What is a Box Plot and the Condition of Outliers?
Last Updated: 14 Jul, 2025
A Box Plot is a data visualization that summarizes a dataset’s distribution. It shows key features such as the range, median and spread of the data, which makes the overall pattern easier to understand. It is also an efficient way to identify outliers, i.e., data points that stand out from the rest because of their extreme values. In this article, we’ll see how box plots work, how to identify outliers and other core concepts.
Key Components of a Box Plot
A box plot visually displays five important numbers that summarize the distribution of a dataset:
- Minimum: Smallest value in the dataset excluding outliers.
- First Quartile (Q1): Middle value of the lower half of the data (25th percentile).
- Median (Q2): Middle value that splits the dataset into two equal parts (50th percentile).
- Third Quartile (Q3): Middle value of the upper half of the data (75th percentile).
- Maximum: Largest value in the dataset excluding outliers.
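The five numbers above can be computed directly. The following is a minimal sketch, assuming NumPy is available and using the 1.5 × IQR outlier rule that this article describes later. The function name `five_number_summary` and the sample values are made up for illustration, and `np.percentile` interpolates linearly between data points, so its quartiles can differ slightly from a hand calculation.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) as drawn by a box plot.

    Minimum and maximum are the smallest and largest values that the
    1.5 * IQR rule does not flag as outliers.
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # linear interpolation
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Keep only the points that lie within the outlier bounds
    inside = x[(x >= lower_bound) & (x <= upper_bound)]
    return tuple(float(v) for v in (inside.min(), q1, median, q3, inside.max()))

# Hypothetical sample: 40 lies far above the rest of the values
sample = [1, 3, 4, 6, 7, 8, 9, 12, 40]
print(five_number_summary(sample))  # (1.0, 4.0, 7.0, 9.0, 12.0)
```

Here the reported maximum is 12 rather than 40, because 40 lies above the upper bound of 16.5 and is treated as an outlier.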
Understanding the Box Plot
A Box Plot consists of three main parts:
- Box: The box spans from the first quartile (Q1) to the third quartile (Q3), representing the interquartile range (IQR). It contains the middle 50% of the data, and a line inside it marks the median (Q2).
- Whiskers: The lines extending from the box, called whiskers, show the range of the data. They extend from Q1 down to the minimum value and from Q3 up to the maximum value, excluding outliers.
- Outliers: Data points that fall outside the whiskers are considered outliers. These are marked separately and represent values that are significantly higher or lower than the rest of the data.
How to Calculate Quartiles and Identify Outliers?
To calculate quartiles and identify outliers follow these steps:
1. Order the Data: Arrange the data from smallest to largest.
2. Find the Median (Q2): The median is the middle value of the ordered data and divides it into two halves:
- If the dataset has an odd number of data points, the median is the middle value and it is excluded from both halves.
- If the dataset has an even number of data points, the median is the average of the two middle values and each half contains exactly half of the data points.
3. Calculate the Quartiles:
- First Quartile (Q1): Median of the lower half of the data.
- Third Quartile (Q3): Median of the upper half of the data.
4. Identify Outliers:
- Interquartile Range (IQR) is calculated as: IQR = Q3 - Q1
- An outlier is any data point that lies outside the bounds defined by:
Lower Bound: Q1 - 1.5 * IQR
Upper Bound: Q3 + 1.5 * IQR
Any data point beyond these bounds is considered an outlier.
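The steps above can be sketched in plain Python. This is a minimal illustration using only the standard library; the function names `quartiles_by_halves` and `iqr_outliers` and the sample values are assumptions made for this example, not part of any library.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps described above.

    Assumes at least two data points, so that both halves are non-empty.
    """
    data = sorted(values)                 # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                   # lower half
    # Odd n: skip the middle value; even n: split evenly
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = quartiles_by_halves(values)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_bound or v > upper_bound]
    return lower_bound, upper_bound, outliers

# Hypothetical sample with one unusually large value
sample = [2, 4, 5, 7, 8, 9, 30]
print(quartiles_by_halves(sample))  # (4, 7, 9)
print(iqr_outliers(sample))         # (-3.5, 16.5, [30])
```

Note that plotting libraries may compute quartiles with a different convention (for example, linear interpolation), so their Q1 and Q3 can differ slightly from this hand method.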
Creating a Box Plot in Python
Let’s consider an evenly spread dataset [0, 1, 2, 3, 4, 5, 6] to understand how a box plot is created. Here we will be using the Pandas and Matplotlib libraries.
Python
import pandas as pd
import matplotlib.pyplot as plt
data = [0, 1, 2, 3, 4, 5, 6]
df = pd.DataFrame(data, columns=['Num'])
print(df)  # display the DataFrame (a bare `df` only renders in a notebook)
Output:

Now plotting the dataframe using box plot:
Python
plt.figure(figsize=(10, 7))
df.boxplot()
plt.show()  # render the figure when running as a script
Output:

- Minimum: 0
- First Quartile (Q1): 1.5
- Median (Q2): 3
- Third Quartile (Q3): 4.5
- Maximum: 6
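These are the values that Matplotlib draws. It computes quartiles by linear interpolation between data points, so Q1 and Q3 (1.5 and 4.5) differ from the median-of-halves method described earlier, which would give 1 and 5 for this dataset. The dataset is evenly spread and has no outliers. Next, let us look at a dataset that contains an outlier.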
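The quartiles listed above can be checked numerically. The sketch below assumes pandas is installed and relies on the fact that `Series.quantile` interpolates linearly by default, which matches the values shown in the plot. For comparison, it also applies the median-of-halves hand method to the same data.

```python
import pandas as pd
from statistics import median

data = [0, 1, 2, 3, 4, 5, 6]
df = pd.DataFrame(data, columns=['Num'])

# Linear interpolation between neighbouring values (pandas default)
q1, q2, q3 = df['Num'].quantile([0.25, 0.5, 0.75]).tolist()
print(q1, q2, q3)  # 1.5 3.0 4.5

# Median-of-halves: drop the middle value (3) and take each half's median
lower_half, upper_half = data[:3], data[4:]
print(median(lower_half), median(upper_half))  # 1 5
```

Both conventions are in common use, so it is worth knowing which one a tool applies before comparing its results with a hand calculation.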
Dataset with an Outlier
Now, let’s consider a dataset [0, 1, 2, 3, 4, 5, 10].
Python
import pandas as pd
import matplotlib.pyplot as plt
data = [0, 1, 2, 3, 4, 5, 10]
df = pd.DataFrame(data, columns=['Num'])
plt.figure(figsize=(10, 7))
df.boxplot()
plt.show()
Output:
Here the upper whisker ends at 5, not 10. The third quartile is 4.5 and the interquartile range is 4.5 - 1.5 = 3, so 1.5 * 3 = 4.5 and the upper bound is 4.5 + 4.5 = 9. Since 10 is larger than the bound of 9, it is plotted as an outlier, and the maximum shown by the whisker is 5, the largest value within the bound.
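The same numbers can be read without drawing the plot. The sketch below assumes Matplotlib is installed and uses its `cbook.boxplot_stats` helper, which returns the statistics a box plot is drawn from. The variable names are chosen for this example only.

```python
import numpy as np
from matplotlib import cbook

data = np.asarray([0, 1, 2, 3, 4, 5, 10], dtype=float)

# One dictionary of statistics per dataset; whis=1.5 is the usual IQR multiplier
stats = cbook.boxplot_stats(data, whis=1.5)[0]

upper_bound = stats['q3'] + 1.5 * stats['iqr']

print("Q1:", stats['q1'])
print("Q3:", stats['q3'])
print("IQR:", stats['iqr'])
print("Upper bound:", upper_bound)
print("Upper whisker ends at:", stats['whishi'])
print("Outliers:", list(stats['fliers']))
```

The `whishi` entry is the largest data point that does not exceed the upper bound, and `fliers` holds the points plotted as outliers.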
Why Are Outliers Important?
Outliers can have a major impact on our analysis because:
- Skewing Statistics: They can significantly change the mean and other statistical measures.
- Model Performance: Some machine learning models may struggle with outliers, affecting predictions.
- Highlighting Key Insights: In some cases, outliers represent significant events or anomalies that are worth identifying.
How to Handle Outliers?
- Investigate the Cause: Check if the outliers are due to data errors or if they represent meaningful variations.
- Use Statistical Methods: Techniques like the Z-score or IQR filtering can help us flag outliers and decide whether they should be removed.
- Use Robust Models: Some machine learning algorithms such as Random Forests are more resistant to outliers.
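The two statistical methods mentioned above can be sketched with pandas. This is an illustration under stated assumptions rather than a recommended recipe: the Z-score cutoff of 3 is a common convention chosen for this example, and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Num': [0, 1, 2, 3, 4, 5, 10]})

# IQR filtering: keep rows that lie within the 1.5 * IQR bounds
q1 = df['Num'].quantile(0.25)
q3 = df['Num'].quantile(0.75)
iqr = q3 - q1
within_bounds = df['Num'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
iqr_filtered = df[within_bounds]
print(iqr_filtered['Num'].tolist())  # [0, 1, 2, 3, 4, 5]

# Z-score: distance from the mean in units of the sample standard deviation
z_scores = (df['Num'] - df['Num'].mean()) / df['Num'].std()
z_flagged = df[z_scores.abs() > 3]   # assumed cutoff of 3
print(z_flagged['Num'].tolist())     # []
```

On this small dataset the IQR rule removes 10, while the Z-score with a cutoff of 3 does not flag it (its Z-score is a little under 2). The two methods can disagree, which is one reason to investigate a point before removing it.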
Applications of Box Plots
Box plots are useful for various purposes other than detecting outliers:
- Comparing Data: They allow for easy comparison of distributions across multiple datasets or groups.
- Detecting Skewness: The position of the median within the box shows whether the data is skewed.
- Understanding Data Spread: The length of the box and whiskers shows the spread of the data which helps to visualize variability.
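The skewness check can be expressed numerically by comparing the distance from the median to each edge of the box. The following is a rough heuristic sketch, assuming NumPy is available. The function name `box_skew_hint` and both sample groups are made up for illustration, and the result is only a hint, not a formal skewness test.

```python
import numpy as np

def box_skew_hint(values):
    """Describe where the median sits inside the box (Q1 to Q3)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    lower_part = med - q1   # box length below the median
    upper_part = q3 - med   # box length above the median
    if np.isclose(lower_part, upper_part):
        return "roughly symmetric"
    return "right-skewed" if upper_part > lower_part else "left-skewed"

# Two hypothetical groups to compare
group_a = [1, 2, 3, 4, 5, 6, 7]
group_b = [1, 1, 2, 2, 3, 6, 12]

print(box_skew_hint(group_a))  # roughly symmetric
print(box_skew_hint(group_b))  # right-skewed
```

In group_b the median (2) sits close to Q1 (1.5) and far from Q3 (4.5), which is how a right-skewed distribution appears in a box plot.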
With a clear understanding of box plots and outlier detection, we can apply these ideas to our own data analysis, gain useful insights and obtain more reliable results.