computer project

The project presents a data analysis report on an e-commerce customer dataset, focusing on purchasing behavior and business insights. It includes system study, data cleaning, exploratory analysis, and highlights weak correlations among variables, with some positive trends for age and membership years on spending. The findings aim to inform marketing, customer relationship management, and sales strategies.

Introduction To Computer Application

Fall 2025

Term Project

Data Analysis Report

Submitted to:
Dr. Syed Irfan Nabi

Submitted by:

Syed Raaid Rizvi (32324)

Murtaqa Abbas (32993)
Sarah Asif (33076)

Table of contents
Chapter 1: System Study and Domain Analysis
1.1- Business Process and Analytics Applications
1.2- Data Composition

Chapter 2: Data Cleaning
2.1- Data Type Analysis and Conversion
2.2- Handling Missing Values

Chapter 3: Exploratory Data Analysis
3.1- Univariate Analysis
3.1.1- Numeric Values
3.1.2- Categorical Values
3.2- Bivariate Analysis
3.2.1- Heatmap of numerical data
3.2.2- Count plots of non-numerical groups against spending score
3.2.3- Regression plots of numerical data against spending score

Chapter 4: Summary

Chapter 1: System Study and Domain Analysis
This chapter sets the groundwork for our analysis by examining the domain of online customer
management, the business context in which our dataset exists. The dataset represents a
small but meaningful slice of an e-commerce operation, capturing key details such as customer
IDs, age, gender, income, spending score, membership years, purchase frequency, preferred
product category, and last purchase amount. We will explore the types of analytics that can be
performed on this dataset, such as summarizing customer demographics, analyzing customer
purchasing patterns, segmenting customers, or forecasting high-value purchases. Finally, this
chapter provides a detailed overview of the dataset's structure, highlighting the main
attributes, their data types, and their relevance to business analysis.

1.1- Business Process:

This dataset appears to represent customer behavior and purchase patterns for a retail or
e-commerce business. The main business processes involved are customer relationship management
(CRM) and sales optimization, which include customer profiling, purchase behavior tracking,
customer segmentation, marketing and promotion, and sales forecasting and inventory planning.

Types Of Analytics:
The types of analytics that can be performed on this dataset include:
1. Descriptive Analytics: It helps in summarizing the customer demographics, spending
habits, and purchasing patterns, in the form of a graph or diagram, to help the business
understand its current customer base. It provides clear insights into who the customers
are, what they buy, and how they behave.
2. Segmentation Analysis: It groups customers into distinct clusters based on their
spending levels, purchase frequency, and product category preferences. This helps the
business target each segment with more personalized marketing, offers, and services.
3. Predictive Analysis: Predictive analytics uses past customer behavior, such as spending
patterns, purchase frequency, and loyalty, to forecast who is likely to make high-value
purchases or stop buying.
4. Diagnostic Analysis: This type of analysis investigates the underlying reasons behind
customer behavior by examining patterns and anomalies in the data. It helps the
business understand why certain customers spend less, reduce their purchase
frequency, or shift their category preferences.
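
As a rough illustration of the descriptive and segmentation analytics listed above, the sketch below bins customers by spending score and summarizes each group. The column names, the sample values, and the three score bands are assumptions made for this example; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
customers = pd.DataFrame({
    "spending_score": [10, 33, 50, 66, 90],
    "income": [40000, 52000, 61000, 75000, 98000],
})

# Bin the spending score into three assumed bands (upper edges are inclusive).
customers["segment"] = pd.cut(
    customers["spending_score"],
    bins=[0, 33, 66, 100],
    labels=["Low", "Medium", "High"],
)

# Descriptive summary per segment: customer count and average income.
summary = customers.groupby("segment", observed=True)["income"].agg(["count", "mean"])
print(summary)
```

Fixed bands are only one possible choice; a clustering method could be used instead if the segments should be learned from the data.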

Analytics Applications and How It Is Useful for a Business:
Analyzing this dataset can give important insights for several key business functions:
1. Sales and Marketing Optimization: By understanding customer spending scores,
preferred product categories, and purchase frequency, businesses can design targeted
marketing campaigns and promote products that match customer interests.
2. Customer Segmentation: The dataset helps identify distinct customer groups based on
income, spending behavior, loyalty, and demographics. This allows the business to tailor its
products, marketing, and customer service to specific groups, which leads to increased
customer satisfaction, better resource allocation, and a higher return on investment.
3. Customer Relationship Management (CRM): Insights from membership years, last
purchase amount, and spending patterns help identify loyal, high-value customers. This
allows businesses to offer personalized and efficient support, which improves customer
relations, boosts sales, and aids decision-making.
4. Product and Category Insights: Analyzing the preferred product categories across
different demographic groups helps the business understand which categories are most
profitable and where to focus future promotions.
1.2- Data Composition

Attribute               Type                   Missing values   Feature importance
Id                      Integer (Numeric)      0                High
Age                     Float (Numeric)        6                Average
Gender                  Object (String)        6                Low
Income                  Float (Numeric)        6                High
Spending score          Float (Numeric)        11               High
Membership years        Float (Numeric)        5                Average
Purchase frequency      Float (Numeric)        7                Average
Preferred category      Object (Categorical)   9                Low
Last purchase amount    Float (Numeric)        7                High

This table helps us understand the importance of each attribute to the business. Income,
spending score, and last purchase amount are the most important, as they directly influence
revenue, customer value, and business strategy. Age, membership years, and purchase
frequency are useful for segmentation, loyalty analysis, and sales forecasting, but they do not
directly affect crucial business decisions. Gender and preferred category are mainly used for
personalization and targeted marketing; however, they are not reliable for accurately
predicting a customer's behavior.
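
The type and missing-value columns of a table like this can be produced directly with pandas. The sketch below is a minimal example on a made-up four-row frame; the column names and values are placeholders, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the customer dataset; the column
# names and values are assumptions made for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [25.0, np.nan, 41.0, 33.0],
    "gender": ["Male", "Female", None, "Female"],
    "income": [72000.0, 81000.0, np.nan, 90500.0],
})

# One row per attribute: its pandas dtype and how many values are missing.
composition = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
})
print(composition)
```

The feature-importance column is a judgment call and has to be added by hand.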

Chapter 2: Data Cleaning

Data cleaning is an essential first step in preparing any dataset for analysis. Before meaningful
insights can be extracted, the data must be checked for incorrect data types, missing values, and
inconsistencies that can interfere with statistical procedures. In this section, we carefully inspect
the dataset, correct data types where necessary, and apply appropriate strategies to handle
missing values. This ensures that the dataset is accurate, consistent, and ready for reliable
analysis in later stages.

2.1- Data Type Analysis and Conversion

The dataset included columns for ID, age, gender, income, spending score, membership years,
purchase frequency, preferred category, and last purchase amount. The originally assigned data
types were int64, float64, object, float64, float64, float64, float64, object, and float64,
respectively. The data types assigned to ID, gender, preferred category, and last purchase
amount are appropriate. Changes are required for others: age, income, and membership years
should be whole numbers, so they are changed to int64. Every spending score observed in the
data is a whole number. On that basis we assume that the score is given on an integer scale
rather than a continuous one. Hence, the spending score data type is also changed from float64
to int64.
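
A minimal sketch of this conversion is shown below, on made-up values with assumed column names. One caveat: a plain int64 column cannot hold missing values, so this example uses pandas' nullable `Int64` type as a stand-in until the nulls are filled; after filling, `astype("int64")` would work as described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 41.0],
    "spending_score": [40.0, 73.0, np.nan],
})

# Check the assumption that every non-missing value is already a whole number.
whole = df.apply(lambda col: (col.dropna() % 1 == 0).all())
print(whole)

# Convert to the nullable integer type so missing entries survive as <NA>.
for col in ["age", "spending_score"]:
    df[col] = df[col].round().astype("Int64")

print(df.dtypes)
```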

2.2- Handling Missing Values

Our strategy for handling missing and invalid data was based on the discrepancies observed in
the dataset, combined with general fixes that keep the data usable for analyzing customer
trends.
1. 'Age' and 'Income' can be used to identify group trends in customers. We ensured both
are within a plausible range to remove data entry faults such as negative ages or
incomes. All invalid and null values were filled with the mean, as this would least
tamper with inferential analysis of the data.
2. For numeric data like 'spending score' and 'purchase frequency', which are required to
understand spending trends, we used the median because outliers would distort the
mean.
3. For null values in demographic data like 'gender' we filled all empty elements with
'Other' to avoid creating incorrect trends in the Male and Female categories. For 'preferred
category' we ensured there were no spelling discrepancies by checking the output of the
value_counts() function and manually replacing all errors. We also replaced null values with
'Unknown', again to avoid inferential errors.
4. For customer-relevant information like 'membership years' and 'last purchase amount'
we applied a minimum limit of 1 to ensure all values entered were positive. All invalid
and null values were replaced with the mean, which was rounded to 2 decimal places for last
purchase amount and converted to an integer for membership years to maintain
consistency.
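
The four rules above can be sketched in pandas as follows. This is only an illustration on invented rows: the column names, the 0-120 age range, and the sample values are assumptions, and only one column per rule is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the kinds of faults described above.
df = pd.DataFrame({
    "age": [25.0, -3.0, 41.0, np.nan],
    "spending_score": [40.0, np.nan, 55.0, 99.0],
    "gender": ["Male", None, "Female", "Female"],
    "last_purchase_amount": [120.5, 0.0, np.nan, 480.2],
})

# 1. Age: values outside an assumed 0-120 range become missing, then mean-filled.
valid_age = df["age"].where(df["age"].between(0, 120))
df["age"] = valid_age.fillna(valid_age.mean())

# 2. Spending score: the median is less affected by outliers than the mean.
df["spending_score"] = df["spending_score"].fillna(df["spending_score"].median())

# 3. Gender: a separate label avoids inflating the Male or Female counts.
df["gender"] = df["gender"].fillna("Other")

# 4. Last purchase amount: enforce a minimum of 1, then fill with the rounded mean.
valid_amount = df["last_purchase_amount"].where(df["last_purchase_amount"] >= 1)
df["last_purchase_amount"] = valid_amount.fillna(round(valid_amount.mean(), 2))

print(df)
```

Computing the mean only from the valid values matters here: including the invalid entries (such as the negative age) would pull the fill value away from a realistic figure.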

Chapter 3: Exploratory Data Analysis
After data cleaning, we now move on to the next part of the report: Exploratory Data
Analysis, an important step in understanding the underlying structure and
characteristics of a dataset. It involves summarizing the main features using both numerical
measures and visualizations, allowing us to identify patterns, trends, and potential outliers.

3.1- Univariate Analysis:

This part focuses on univariate analysis, which helps us identify the shape of the data: whether
it is normal, skewed, or contains outliers. It highlights important characteristics such as mean,
median, mode, spread, and overall patterns. By analyzing one variable at a time using boxplots,
histograms, pie charts, and count plots, we can better understand the behavior of each
variable and arrive at a more effective analysis of the data.
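
The shape checks described here can also be done numerically. The sketch below, on invented income values, computes the skewness and applies the usual 1.5 x IQR boxplot rule for outliers; the numbers are placeholders, not the report's data.

```python
import pandas as pd

# Hypothetical income values with one very high earner.
income = pd.Series([70000, 72000, 75000, 78000, 80000, 82000, 150000], name="income")

# Positive skewness indicates a longer right tail; the mean is pulled above the median.
skewness = income.skew()
print("mean:", income.mean(), "median:", income.median(), "skew:", skewness)

# Boxplot rule: points beyond 1.5 * IQR from the quartiles count as outliers.
q1, q3 = income.quantile(0.25), income.quantile(0.75)
iqr = q3 - q1
outliers = income[(income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)]
print("outliers:", outliers.tolist())
```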

3.1.1- Numeric Values:

a) Income:
The income distribution is not normal; it is right-skewed because some customers have
very high incomes. The histogram shows that most customers have an income
between 70000 and 82000. The box-plot further shows that the median lies around
82000. There are no outliers in this variable.

b) Spending score:
The spending score does not follow a normal distribution and is more evenly distributed;
however, it is slightly left-skewed, with a mean of around 50.68.

c) Membership Years:
The histogram and box-plot show that the distribution is not normal and is slightly
right-skewed because many users have a higher membership duration. The mean is 5.46, with
the range lying from 3 to 8.

d) Purchase Frequency:
There is a moderate spread, but most customers fall between 15 and 40 purchases. The
distribution is not normal, as there is a mild left-skew showing that customers return
slightly more frequently.

e) Last Purchase Amount:

Purchases go up to almost 1000 dollars, with the median lying slightly short of 500.
The interquartile range is from 190 to almost 800, and the graph is very slightly right-skewed.
The histogram shows that the most common last purchase amounts fall within the 0-100
region, suggesting that the store has many customers coming in for daily groceries and
similar items.

f) Age:
Age is fairly evenly spread; however, there is slight right-skewness as the upper age values
(60-69) stretch the tail. The interquartile range is from 30 to 55 and the median is around 42.

3.1.2- Categorical values:
a) Gender:
The data is relatively evenly distributed among the three groups, as demonstrated by the bar
graph and pie chart, with slightly more individuals falling into the "Other" category.

b) Preferred Category
The bar graph and pie chart show that sports is the most demanded category; however, the
categories are fairly evenly distributed, with clothing and home & garden having exactly the
same count.

3.2- Bivariate Analysis:
This part of the project explores the relationship between two variables at a time, allowing us to
move beyond simple descriptions and start understanding how different aspects of a dataset
interact. It helps reveal whether changes in one variable are associated with changes in another,
whether that relationship is positive, negative, strong, weak, or nonexistent. By using tools such
as heatmaps, regression plots and count plots we will be able to uncover trends, patterns, and
group differences that are not visible through univariate analysis alone.

3.2.1 Heatmap of numerical data:

Every box in the heatmap shows the correlation between two variables. Both axes are
identical, hence we can observe a diagonal dark red line. The dark red color shows a perfect
correlation, which occurs because each variable is compared with itself, for example age
against age. The scale on the side shows that, ranging from dark red to dark blue, the
correlations go from perfectly positive at dark red to moderately positive at white, with no
correlation at light blue, before continuing to negative correlation at dark blue.
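
The numbers behind such a heatmap come from a correlation matrix. The sketch below computes one with pandas on invented values and picks out the strongest off-diagonal pair; the column names and data are assumptions, and the plotting step itself is left out.

```python
from itertools import combinations

import pandas as pd

# Hypothetical numeric columns; values are illustrative only.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "income": [50, 80, 60, 90, 70],
    "last_purchase_amount": [100, 220, 290, 410, 500],
})

# Pearson correlation between every pair of columns; the diagonal is always 1.
corr = df.corr()
print(corr.round(2))

# Find the pair of different variables with the largest absolute correlation.
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
strongest = max(pairs, key=lambda p: abs(pairs[p]))
print(strongest, round(pairs[strongest], 2))
```

Identifier columns such as id are best dropped before this step, since any correlation involving them carries no meaning.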

In our map almost all the boxes are light blue, showing that most relationships are very weak.
In linear terms, these variables appear to be largely independent of each other. So, we can say
that knowing a customer's income does not help us predict their spending score.
The highest positive relationship is 0.19, between id and membership years. Because id is only
an identifier, this relationship is arbitrary and cannot be used for inferential predictions or
analysis. No meaningful insight can be derived from it.
Some relationships, like age against last purchase amount and membership years against
spending score, are slightly positive at 0.16 and 0.15 respectively. This shows that, though the
link is very weak, there is a slight tendency for older customers to spend more, as well as
customers who have been members for a longer period of time.
The most prominent negative correlation is -0.14, between income and purchase
frequency. This shows a slight tendency for customers with a higher income to
purchase less frequently from the store. The dataset has minimal predictive power, as there are
no strong linear correlations.

3.2.2 Count plots of non-numerical groups against spending score
These plots show the distribution of spending score in different groups. Each bar shows a
distinct spending score value, and the height of the bar shows how many customers share that
spending score. Light colors show low spending scores while darker colors show higher ones.
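
The bar heights in a count plot of this kind are simply a cross-tabulation of group against score. A minimal sketch with invented rows and assumed column names:

```python
import pandas as pd

# Hypothetical rows; gender labels and scores are illustrative only.
df = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male", "Other", "Female"],
    "spending_score": [50, 50, 80, 85, 20, 45],
})

# Rows are groups, columns are distinct spending scores, cells are customer counts.
counts = pd.crosstab(df["gender"], df["spending_score"])
print(counts)
```

Reading the table row by row gives the same information as reading one panel of the count plot bar by bar.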

1. Gender:

Female: There is a notable spike in the mid-range scores. This suggests females are likely to
spend in or near that middle range.
Male: There are several spikes in the higher scores, indicating male customers are likely to
spend within the mid-high to high spending range.
Other: This category shows high variability, with no real trend. It also includes the
records whose missing gender was filled in during cleaning, so accurate inference is unlikely.

2. Preferred category:

Electronics: The most distinct feature is in the Electronics category. There is a very tall, thin
bar in the lighter color range. This shows that a lot of customers who prefer electronics share
the same, relatively low spending score.
Groceries & Clothing: These categories show a messy distribution with many short bars. This
means customers who buy these items have widely varying spending scores and there is very
little similarity among them.
Sports: This category has a few notable bars in the low to mid-range, suggesting slight
consistency in the spending habits of such customers.
The count plots for these groups show a high variability of scores, reinforcing the conclusion
from the heatmap that correlations between all variables are very weak.

3.2.3 Regression plots of numerical data against spending score

Overall, the plots depict nearly the same thing. The plots for income, purchase frequency, and
last purchase amount are very similar. They all show a nearly horizontal blue line, indicating
that these variables have no linear influence on spending score. The dots are scattered
randomly, showing no clear pattern.

The only signs of a pattern emerging are apparent in age and membership years. Membership
years has the most noticeable upward-sloping line, which shows that long-term members are
likely to spend more. The same applies to age, where there is also a slight upward slope
indicating that older customers are slightly more likely to spend more.
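
The slope of each regression line can be checked numerically with a least-squares fit. The sketch below uses `numpy.polyfit` on invented values chosen so that one predictor has a clear upward slope and the other is nearly flat; none of the numbers come from the report's dataset.

```python
import numpy as np

# Hypothetical values: spending score against two candidate predictors.
spending_score = np.array([40, 44, 47, 52, 55])
membership_years = np.array([1, 2, 3, 4, 5])
income_thousands = np.array([70, 50, 90, 60, 80])

# A degree-1 fit returns [slope, intercept] of the least-squares line.
slope_membership, _ = np.polyfit(membership_years, spending_score, 1)
slope_income, _ = np.polyfit(income_thousands, spending_score, 1)

# A slope near zero corresponds to a nearly horizontal regression line.
print("membership years slope:", round(slope_membership, 2))
print("income slope:", round(slope_income, 2))
```

Slopes depend on the units of each predictor, so they are best compared alongside the correlation coefficients rather than against each other directly.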

Chapter 4: Summary
The project analyzes an e-commerce customer dataset to understand purchasing behavior and
inform business decisions. Data cleaning ensured accuracy, corrected data types, and handled
missing values. Univariate analysis revealed distributions and trends in numeric variables like
income, spending score, and membership years, while categorical variables showed preferences
and demographics. Bivariate analysis explored relationships between variables using heatmaps,
count plots, and regression plots. Overall, most variables showed weak correlations, with slight
positive trends for age and membership years on spending, indicating that older and long-term
customers tend to spend more. Insights can guide marketing, CRM, and sales strategies.
