Univariate, Bivariate and Multivariate data and its analysis

Last Updated : 11 Nov, 2025

Data analysis is the process of uncovering hidden patterns, relationships and trends in raw information. It forms the foundation of decision-making in almost every domain, from business forecasting to scientific research. Depending on the number of variables under consideration, data analysis can be categorized into three main types: Univariate, Bivariate and Multivariate.

1. Univariate Data

Univariate data involves observations consisting of only one variable. The goal is to describe and summarize this single variable’s properties such as its average, spread and shape of distribution. Since there is no relationship or dependency to explore, it is the simplest and most straightforward form of statistical analysis.

Measures central tendency (mean, median, mode) to find the typical value.
Measures dispersion (range, variance, standard deviation) to see how data spreads.
Detects patterns like skewness or outliers that affect data interpretation.
Common visuals include histograms, box plots and density plots to show frequency and spread.

Example: Heights (in cm) of seven students in a class:

[164, 167.3, 170, 174.2, 178, 180, 186]

Here the only variable is height and no relationship or interaction with other variables is being considered.
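
As a rough sketch of how the summary measures listed above are obtained, the snippet below computes them for these seven heights using only Python's standard library. The 1.5 × IQR rule for flagging outliers is a common convention assumed here for illustration, and the variable names are illustrative.

```python
import statistics

# Heights (in cm) of the seven students from the example above
heights = [164, 167.3, 170, 174.2, 178, 180, 186]

# Central tendency (every value occurs once, so the mode is not informative here)
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)

# Dispersion: range, sample variance and sample standard deviation
height_range = max(heights) - min(heights)
sample_variance = statistics.variance(heights)
sample_std = statistics.stdev(heights)

# Quartiles with linear interpolation between data points
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [h for h in heights if h < lower_fence or h > upper_fence]

print(f"mean={mean_height:.2f}, median={median_height}, range={height_range}")
print(f"variance={sample_variance:.2f}, std={sample_std:.2f}")
print(f"Q1={q1:.2f}, Q3={q3:.2f}, outliers={outliers}")
```

With this small sample none of the heights falls outside the fences, so the outlier list comes back empty.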

Implementation:

Python

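# Libraries for numerics, tabular data and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Heights (in cm) of the seven students
heights = np.array([164, 167.3, 170, 174.2, 178, 180, 186])
df = pd.DataFrame({'Height (cm)': heights})

# Summary statistics: count, mean, std, min, quartiles and max
print(df.describe())

# Histogram with a kernel density estimate (KDE) curve overlaid
sns.histplot(df['Height (cm)'], bins=5, kde=True, color='skyblue')
plt.title("Univariate Analysis of Height")
plt.show()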

Applications:

Quality Control in Manufacturing: Used to analyze one feature at a time, like checking the diameter of bolts or weight of packages to ensure uniformity.
Healthcare Statistics: Used to study one health metric at a time, such as patient blood pressure or cholesterol levels, to identify trends.

Advantages:

Simple and fast to compute.
Provides clear summaries and visual representation.

Limitations:

Doesn’t reveal relationships between variables, so it cannot suggest cause and effect.

2. Bivariate Data

Bivariate data refers to a dataset where each observation is associated with two different variables. The goal of analyzing bivariate data is to understand the relationship or association between these two variables. It helps to identify how one variable might affect or be related to the other.

Detects whether the relationship is positive, negative or non-existent.
The Pearson correlation coefficient measures the strength and direction of a linear relationship (range: -1 to +1).
Visualization tools like scatter plots and regression lines show the pattern of change clearly.
Often used as a foundation before moving to multivariate models.

Example: Consider the relationship between temperature and ice cream sales during the summer season:

Temperature (°C)	Ice Cream Sales
20	2000
25	2500
30	4000
35	5000

In this case, the two variables are temperature and ice cream sales. The data suggest a positive relationship: sales increase as the temperature rises. In other words, a change in one variable (temperature) is accompanied by a fairly predictable change in the other (ice cream sales), although an association on its own does not prove that one causes the other.
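
To make the idea of a positive relationship concrete, here is a minimal sketch that works out the Pearson correlation coefficient and the least-squares line for the four observations in the table, using plain Python. The variable names are illustrative, and the straight-line model is an assumption made for the sake of the example.

```python
import math

# Observations from the table above
temperature = [20, 25, 30, 35]
sales = [2000, 2500, 4000, 5000]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(sales) / n

# Sums of squared deviations and cross-products
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, sales))
s_xx = sum((x - mean_x) ** 2 for x in temperature)
s_yy = sum((y - mean_y) ** 2 for y in sales)

# Pearson correlation: covariance scaled by both standard deviations
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line: sales = intercept + slope * temperature
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"Pearson r = {r:.4f}")
print(f"sales = {intercept:.0f} + {slope:.0f} * temperature")
```

For this data the fitted slope is 210, meaning each additional degree is associated with roughly 210 more sales, and the correlation is close to +1. With only four points, these numbers are illustrative rather than reliable estimates.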

Implementation:

Python

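import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Temperature and ice cream sales from the table above
data = {'Temperature (°C)': [20, 25, 30, 35],
        'Ice Cream Sales': [2000, 2500, 4000, 5000]}
df2 = pd.DataFrame(data)

# Pearson correlation matrix of the two variables
corr = df2.corr(numeric_only=True)
print("Correlation Matrix:\n", corr)

# Scatter plot of the observations with a fitted regression line drawn on top
sns.scatterplot(data=df2, x='Temperature (°C)',
                y='Ice Cream Sales', color='orange', s=80)
sns.regplot(data=df2, x='Temperature (°C)',
            y='Ice Cream Sales', scatter=False, color='blue')
plt.title("Bivariate Analysis: Temperature vs Ice Cream Sales")
plt.xlabel("Temperature (°C)")
plt.ylabel("Ice Cream Sales")
plt.show()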

Applications:

Sales and Advertising Relationship: Helps examine how changes in advertising budget affect sales numbers.
Education Performance Analysis: Studies how students’ study hours relate to their exam scores.

Advantages:

Reveals relationships and dependencies.
Useful for preliminary hypothesis testing.

Limitations:

Limited to only two variables at a time.
Cannot explain multi-factor influences.

3. Multivariate Data

Multivariate data contains three or more variables for each observation. The objective is to uncover how multiple variables interact or jointly affect outcomes. It’s crucial in fields like predictive analytics, econometrics and data science, where relationships are seldom limited to two variables.

Useful when an outcome depends on several influencing factors.
Techniques include multiple regression, PCA, MANOVA and clustering.
Reduces data complexity using dimensionality reduction (like PCA).
Commonly visualized using heatmaps, pair plots and 3D scatter plots for high-dimensional insight.
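
As an illustration of dimensionality reduction, the sketch below performs PCA with NumPy's singular value decomposition. The data is synthetic and purely hypothetical (three variables, two of them nearly collinear), generated only so the example is self-contained. The columns are centred but not standardized, which is a simplifying assumption.

```python
import numpy as np

# Hypothetical data: 50 observations of 3 variables, where x2 closely tracks x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

# Centre each column so the principal axes pass through the data mean
X_centered = X - X.mean(axis=0)

# SVD of the centred data: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured by each component, as a share of the total
explained_variance = S ** 2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project the observations onto the first two principal components
scores = X_centered @ Vt[:2].T

print("Explained variance ratio:", np.round(explained_ratio, 3))
print("Reduced data shape:", scores.shape)
```

Because x1 and x2 carry almost the same information, the first component absorbs most of the variance, and two components describe the three original variables with little loss.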

Example: Consider a scenario where an advertiser wants to analyze the click rates for different advertisements on a website. The data includes multiple variables such as advertisement type, gender and click rate.

Advertisement	Gender	Click rate
Ad1	Male	80
Ad3	Female	55
Ad2	Female	123
Ad1	Male	66
Ad3	Male	35

Here there are three variables: advertisement type, gender and click rate. Multivariate analysis allows us to see how these variables interact and how one variable might affect another in the context of the others.
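
Before fitting any model, it can help to look at how click rate varies across the categorical variables. The sketch below is one possible first pass with pandas, using the five rows from the table: it computes group means and then one-hot encodes the categories. Dropping the first level of each category is an assumption made here to avoid redundant columns.

```python
import pandas as pd

# The five observations from the table above
ads = pd.DataFrame({
    "Advertisement": ["Ad1", "Ad3", "Ad2", "Ad1", "Ad3"],
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Click rate": [80, 55, 123, 66, 35],
})

# Average click rate for each advertisement and for each gender
mean_by_ad = ads.groupby("Advertisement")["Click rate"].mean()
mean_by_gender = ads.groupby("Gender")["Click rate"].mean()

# Average click rate for each advertisement/gender combination
# (combinations that never occur in the data show up as NaN)
by_both = ads.pivot_table(index="Advertisement", columns="Gender",
                          values="Click rate", aggfunc="mean")

# One-hot encode the categories; Ad1 and Female become the baseline levels
encoded = pd.get_dummies(ads[["Advertisement", "Gender"]], drop_first=True)

print(mean_by_ad)
print(mean_by_gender)
print(by_both)
print(encoded)
```

With only five rows, several advertisement/gender combinations are empty, which is a reminder that multivariate conclusions need far more data than this toy table provides.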

Implementation:

Python

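import pandas as pd
import statsmodels.api as sm

# Rows follow the table above.
# Ad_Type: 1 = Ad1, 2 = Ad2, 3 = Ad3; Gender: 0 = Male, 1 = Female
data = {
    'Ad_Type': [1, 3, 2, 1, 3],
    'Gender': [0, 1, 1, 0, 0],
    'Click_Rate': [80, 55, 123, 66, 35]
}
df3 = pd.DataFrame(data)

# Predictors and response. Coding Ad_Type as 1, 2, 3 treats it as an ordered
# numeric variable, which is a simplification; dummy variables suit nominal
# categories better.
X = df3[['Ad_Type', 'Gender']]
y = df3['Click_Rate']
X = sm.add_constant(X)  # add the intercept term

# Fit an ordinary least squares (OLS) multiple regression and print the summary
model = sm.OLS(y, X).fit()
print(model.summary())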

Applications:

Customer Segmentation in Marketing: Uses multiple factors like age, income and buying habits to group customers effectively.
Economic Forecasting: Analyzes variables such as GDP, inflation, employment rate and exports together to predict future trends.

Advantages:

Captures real-world complexity.
Enables predictive modeling and deep insights.

Limitations:

Can be computationally intensive and typically needs more observations as the number of variables grows.
Interpretation can be complex and may suffer from multicollinearity.
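
One common way to check for multicollinearity is the variance inflation factor (VIF), which regresses each predictor on the others and reports 1 / (1 - R²). The sketch below is a minimal NumPy version run on synthetic, hypothetical data. The helper name `vif` and the data are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)

# Hypothetical predictors: x2 is almost a multiple of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)

factors = vif(np.column_stack([x1, x2, x3]))
print(np.round(factors, 1))
```

Large values for x1 and x2 signal that they are nearly redundant, while a value near 1 for x3 indicates it is largely independent of the others. Common rules of thumb treat values above roughly 5 or 10 as a warning sign, though the threshold is a matter of convention.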

Difference between Univariate, Bivariate and Multivariate data

The table below summarizes the differences between the three types.

Feature	Univariate	Bivariate	Multivariate
Variables	One	Two	Three or more
Objective	Describe a single variable	Examine relationship between two variables	Understand relationships among multiple variables
Dependency	No dependent variable	Often one dependent and one independent variable	One or more dependent variables with several predictors
Techniques	Descriptive statistics, histogram	Correlation, scatter plot, regression	Regression, PCA, MANOVA
Visualization	Histogram, Box Plot	Scatter Plot, Regression Line	Pair Plot, Heatmap, 3D Scatter Plot
Example	Height of students	Temperature vs Ice Cream Sales	Ad Type, Gender & Click Rate
Complexity	Low	Moderate	High

aaradhanathapliyal
