Interpretations of Histogram

Last Updated : 05 Aug, 2024

Histograms help in visualizing and comprehending the distribution of data. This article aims to provide a comprehensive overview of the histogram and its interpretation.

What is a Histogram?

Histograms are graphical representations of data distributions. They consist of bars, each representing the frequency or count of observations falling within specific intervals, known as bins. We can also say a histogram is a variation of a bar chart in which data values are grouped together and put into different classes. This grouping enables you to see how frequently data in each class occur in the dataset.

The histogram graphically shows the following:

Frequency of values in different ranges of the dataset.
Location of the center of the data.
The spread of the dataset.
Skewness of the dataset.
Presence of outliers in the dataset.

These features give a strong indication of the appropriate distributional model for the data. A probability plot or a goodness-of-fit test can then be used to verify that model.

The histogram contains the following axes:

Vertical Axis: Frequency/count of each bin.
Horizontal Axis: The bins (intervals of data values).

How does a histogram work?

The histogram works by organizing and visualizing the distribution of data into intervals or bins along a continuous scale.

The range of data values is divided into intervals called "bins." The number of bins and their widths can be predefined or determined algorithmically based on the range and distribution of the data.
Each data point in the dataset is assigned to a corresponding bin based on its value. As data points are assigned to bins, the frequency or count of data points falling within each bin is calculated.
The histogram is constructed by plotting the bins along the x-axis and the frequencies (or densities) along the y-axis. Each bin is represented by a bar, and the height of the bar corresponds to the frequency of data points in that bin.
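The three steps above can be sketched with NumPy. This is only a minimal illustration: the sample values and the bin edges are made up for the example, and Sturges' rule is just one of several ways to choose bins algorithmically.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([2, 3, 5, 7, 8, 8, 9, 12, 13, 15])

# Step 1: divide the range of values into bins using predefined edges
edges = np.array([0, 5, 10, 15])

# Step 2: assign each value to a bin and count the values per bin.
# Each bin is half-open [left, right), except the last one,
# which also includes its right edge.
counts, _ = np.histogram(data, bins=edges)

# Step 3: the bar heights are the counts; print a simple text histogram
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left} to {right}: " + "#" * int(count))

# Alternative to step 1: let NumPy choose the edges with Sturges' rule
auto_edges = np.histogram_bin_edges(data, bins="sturges")
print("Edges chosen by Sturges' rule:", auto_edges)
```

Passing `density=True` to `np.histogram` would return densities instead of counts, which matches the alternative y-axis mentioned in the last step.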

By examining the histogram, you can gain insights into the distribution of the data. You can identify patterns, trends, central tendencies, variability, outliers, and other characteristics of the dataset. For example, a symmetric bell-shaped histogram suggests a normal distribution, while skewed histograms indicate asymmetry in the data.

Suppose you're analyzing the distribution of scores on a standardized test. You have data for 2000 students, and you want to visualize how many students scored within different score ranges. For this you can create a histogram using the following data.

Score Range	Frequency
0-25	150
26-50	300
51-75	600
76-100	750
101-125	150
126-150	50

The histogram shows a single peak: the most common score range is 76-100, with 750 of the 2000 students. The distribution is only roughly bell-shaped, since the counts fall off more sharply above the peak than below it. Each bar represents a score range, and the height of the bar represents the number of students in that range. By adjusting the x-axis intervals and labels, you can control how the distribution of test scores is displayed. You can also change the y-axis to show percentages or density instead of counts.
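As a rough sketch of the percentage view, the frequencies from the table can be converted to relative frequencies with NumPy. The variable names here are illustrative; only the score ranges and counts come from the table.

```python
import numpy as np

# Score ranges and frequencies copied from the table above
labels = ["0-25", "26-50", "51-75", "76-100", "101-125", "126-150"]
freq = np.array([150, 300, 600, 750, 150, 50])

total = freq.sum()                           # total number of students
percent = 100 * freq / total                 # share of students per range, in percent
modal_range = labels[int(np.argmax(freq))]   # range with the tallest bar

for label, f, p in zip(labels, freq, percent):
    print(f"{label}: {f} students ({p:.1f}%)")
print("Most common range:", modal_range)
```

Plotting `percent` instead of `freq` on the y-axis changes only the scale of the bars, not the shape of the histogram.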

Histogram and its Interpretation

Normal Histogram

A normal histogram is the classical bell-shaped histogram: most of the frequency counts are concentrated in the middle, the tails diminish on both sides, and the shape is symmetric about the median. Many real-world measurements are approximately normally distributed, so you are likely to come across this shape often. In a normally distributed histogram, the mean is almost equal to the median.

Note: In the implementation, we will be using the NumPy, Matplotlib and Seaborn libraries. These libraries are pre-installed in Colab; in a local environment, you can install them with the pip install command.

Python

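import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 500 samples from a normal distribution (mean 10, standard deviation 3)
data = np.random.normal(10.0, 3, 500)

# Plot the histogram with 10 bins and overlay a kernel density estimate (KDE)
sns.displot(data, kde=True, bins=10, color='black')
plt.show()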

Output:

Figure: Normal Distribution Graph

We have plotted a histogram of normally distributed data.

The peak of the curve lies close to the mean of the dataset.
The graph is approximately symmetric about that peak.
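The claim that the mean and median nearly coincide for normal data can be checked numerically. The sketch below uses the same mean, standard deviation and sample size as the plotting example; the fixed seed is an arbitrary choice so that the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary fixed seed, for repeatability
sample = rng.normal(10.0, 3, 500)     # mean 10, standard deviation 3, 500 values

mean = sample.mean()
median = np.median(sample)

# For a symmetric, bell-shaped sample the two should be close
print(f"mean = {mean:.2f}, median = {median:.2f}, difference = {mean - median:.2f}")
```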

Non-normal Short-tailed/Long-tailed Histogram

In a short-tailed distribution, the tails approach 0 very quickly as we move away from the median of the data. In a long-tailed distribution, the tails approach 0 slowly as we move away from the median. Here, the tails are the extreme regions on both sides of the peak, where little of the data is concentrated.
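One rough way to compare tail behaviour is to measure how much of the data lies far from the median. In the sketch below, the uniform and Laplace distributions are stand-ins chosen for illustration (the article does not name specific distributions), and the cut-off of 3 standard deviations and the helper `tail_fraction` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary fixed seed
n = 100_000

def tail_fraction(x, k=3):
    """Fraction of values more than k standard deviations from the median."""
    return float(np.mean(np.abs(x - np.median(x)) > k * x.std()))

short_tailed = rng.uniform(-1, 1, n)   # tails stop abruptly
normal = rng.normal(0, 1, n)           # reference shape
long_tailed = rng.laplace(0, 1, n)     # tails decay slowly

frac_short = tail_fraction(short_tailed)
frac_normal = tail_fraction(normal)
frac_long = tail_fraction(long_tailed)

print(f"short-tailed (uniform): {frac_short:.4f}")
print(f"normal:                 {frac_normal:.4f}")
print(f"long-tailed (Laplace):  {frac_long:.4f}")
```

A larger fraction means more of the data sits in the extreme regions, which is what a long-tailed histogram shows visually.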

Bimodal Histogram

The mode of a dataset is its most common value, which appears as a peak in the histogram. A bimodal histogram has two peaks. The histogram can therefore be used to check whether the data is unimodal. Bimodality (or, more generally, non-unimodality) often indicates that the data comes from two different processes or groups mixed together, which may point to a problem in the process being measured. A bimodal histogram may have one or both of two characteristics: it may be a mixture of two normal distributions, and it may be symmetric.
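A simple check for unimodality is to count the peaks in the binned counts. The sketch below mixes two normal samples with means 80 and 20 and standard deviation 10, the same parameters as this section's plotting example; the bin width of 20 and the "taller than both neighbours" rule are illustrative choices, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary fixed seed
x = np.concatenate([rng.normal(80, 10, 400), rng.normal(20, 10, 400)])

# Bins of width 20 covering the data (an illustrative choice)
edges = np.arange(-30, 131, 20)
counts, _ = np.histogram(x, bins=edges)

# A bin is a peak if it is taller than both neighbours;
# the counts are zero-padded so the end bins can be compared too
padded = np.concatenate([[0], counts, [0]])
is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] > padded[2:])
peak_centers = (edges[:-1] + edges[1:])[is_peak] / 2

print("Counts per bin:", counts)
print("Peak bin centers:", peak_centers)
```

With narrower bins, random noise can create spurious small peaks, so the result of such a check depends on the bin width.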

Python

# Bimodal histogram
N = 400
mu_1, sigma_1 = 80, 10
mu_2, sigma_2 = 20, 10

# Generate two normal samples with the given means and standard deviations, then concatenate them
X_1 = np.random.normal(mu_1, sigma_1, N)
X_2 = np.random.normal(mu_2, sigma_2, N)
X = np.concatenate([X_1, X_2])
sns.displot(X, bins=10, kde=True, color='green')

Output:

Skewed Left/Right Histogram

A skewed histogram is one in which the tail on one side is clearly longer than the tail on the other. In a right-skewed histogram, the tail to the right of the peak is more stretched out than the tail to the left, and vice versa for a left-skewed histogram. In a left-skewed histogram, the mean is typically less than the median, while in a right-skewed histogram the mean is typically greater than the median.
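The relationship between mean and median can be verified on small skewed datasets. The sketch below uses the same value counts as the right-skewed and left-skewed plotting examples in this section; the dictionary layout and variable names are only for illustration.

```python
import numpy as np

# value -> how many times it occurs (right-skewed: long tail towards 8)
value_counts = {0: 19, 1: 49, 2: 60, 3: 47, 4: 32, 5: 18, 6: 3, 7: 3, 8: 1}
right = np.repeat(list(value_counts.keys()), list(value_counts.values()))
left = -right   # mirror image of the same data: left-skewed

for name, x in [("right-skewed", right), ("left-skewed", left)]:
    print(f"{name}: mean = {x.mean():.3f}, median = {np.median(x):.1f}")
```

The long tail pulls the mean towards it, while the median stays near the bulk of the data.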

Right-skewed Histogram

Python

# Right-skewed histogram: most values are small, with a long tail to the right
rdata = ([0] * 19 + [1] * 49 + [2] * 60 + [3] * 47 + [4] * 32
         + [5] * 18 + [6] * 3 + [7] * 3 + [8])
sns.displot(rdata, bins=8, kde=True, alpha=0.6, color='blue')

Output:

Left-skewed Histogram

Python

# Left-skewed histogram: mirror image of the right-skewed data
ldata = ([0] * 19 + [-1] * 49 + [-2] * 60 + [-3] * 47 + [-4] * 32
         + [-5] * 18 + [-6] * 3 + [-7] * 3 + [-8])
sns.displot(ldata, kde=True, bins=8, alpha=0.6, color='red')

Output:

Uniform Histogram

In a uniform histogram, each bin contains approximately the same number of counts (frequency). For example, if a fair die is rolled n times (with n much larger than 30) and the frequency of each outcome is recorded, the resulting histogram is approximately uniform.
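The die example can be simulated in a few lines. This is a sketch assuming a fair six-sided die and 6000 rolls; both the number of rolls and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary fixed seed
rolls = rng.integers(1, 7, size=6000)     # faces 1 to 6 (upper bound is exclusive)

# Count how often each face occurs; index 0 is unused, so drop it
counts = np.bincount(rolls, minlength=7)[1:]

for face, count in enumerate(counts, start=1):
    print(f"face {face}: {count}")
```

Each face is expected to appear about 1000 times, so the six bars have nearly equal heights.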

Python

# Generate random data following a uniform distribution on [0, 1)
data = np.random.uniform(low=0, high=1, size=600)
sns.histplot(data, kde=True, bins=10)
plt.show()

Output:

Normal Distribution with an Outlier

This histogram is similar to a normal histogram, except that it contains an outlier: a bar far from the rest of the data whose count is substantial. This is often caused by a system error in the process, for example one that led to faulty products being generated.
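Outliers that stand out in a histogram can also be flagged numerically. The sketch below uses the 1.5 × IQR rule, a common rule of thumb that the article itself does not prescribe, on data built like this section's example: 400 normal values with mean 80 and standard deviation 10, plus 30 values at 200.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary fixed seed
x = np.concatenate([rng.normal(80, 10, 400), np.full(30, 200.0)])

# Interquartile range and the usual 1.5 * IQR fences
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = x[(x < lower) | (x > upper)]
print(f"fences: [{lower:.1f}, {upper:.1f}]")
print("flagged values:", len(outliers))
```

The rule may also flag a few extreme values from the normal part of the data, so the flagged count can be slightly above 30.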

Python

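# Normal distribution with an outlier
N = 400  # sample size, as in the bimodal example
mu, sigma = 80, 10
X_1 = np.random.normal(mu, sigma, N)
# Append 30 identical extreme values at 200 to act as outliers
X_1 = np.concatenate([X_1, [200] * 30])
sns.displot(X_1, kde=True, bins=13)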

Output:

Figure: Normal Distribution with an Outlier


pawangfg
