MRA-Part-A-
Problem Statement
Data Overview
Overview of Analysis
Insights and Recommendations
Executive Summary
• Data: transactions from the past 3 years.
• Objective: identify the underlying buying patterns of the
customers and recommend customized marketing
strategies for different segments of customers.
• Dataset: 20 columns and 2747 rows.
• Missing values and duplicate values: none.
• Outliers: some columns have a few outliers.
• The exploratory analysis and insights provide a clear
understanding of the data and highlight the key trends
and patterns in sales.
• RFM analysis has been performed to segment the
customers into four categories based on their buying
behavior, and customized marketing strategies have been
recommended for each segment.
• The presentation concludes with recommendations for
the company to enhance its customer relationships and
drive business growth.
Problem Statement
An automobile parts manufacturing company has collected data on
transactions for 3 years. The company has no in-house data science team,
so it has hired you as a consultant. Your job is to use your data
science skills to find the underlying buying patterns of the customers, provide
the company with suitable insights about its customers, and recommend
customized marketing strategies for different segments of customers.
• Dataset:
• Auto Sales Data: Sales_Data.xlsx
Data Dictionary
Column Name: Description
ORDERNUMBER: The unique identification number assigned to each order.
QUANTITYORDERED: The number of items ordered in each order.
PRICEEACH: The price of each item in the order.
ORDERLINENUMBER: The line number of each item within an order.
SALES: The total sales amount for each order, calculated by multiplying the quantity ordered by the price of each item.
ORDERDATE: The date on which the order was placed.
DAYS_SINCE_LASTORDER: The number of days that have passed since the last order for each customer. It can be used to analyze customer purchasing patterns.
STATUS: The status of the order, such as "Shipped," "In Process," "Cancelled," "Disputed," "On Hold," or "Resolved."
PRODUCTLINE: The product line category to which each item belongs.
MSRP: Manufacturer's Suggested Retail Price, the suggested selling price for each item.
PRODUCTCODE: The unique code assigned to each product.
CUSTOMERNAME: The name of the customer who placed the order.
PHONE: The contact phone number for the customer.
ADDRESSLINE1: The first line of the customer's address.
CITY: The city where the customer is located.
POSTALCODE: The postal code or ZIP code associated with the customer's address.
COUNTRY: The country where the customer is located.
CONTACTLASTNAME: The last name of the contact person associated with the customer.
CONTACTFIRSTNAME: The first name of the contact person associated with the customer.
DEALSIZE: The size of the deal or order, categorized as "Small," "Medium," or "Large."
Data Overview
Dataset: Sales_Data.xlsx
Shape:
• Total Rows: 2747
• Total Columns: 20
Data Types:
• float64
• datetime64[ns]
• int64
• object
Data Type of each column
ORDERNUMBER: int64
QUANTITYORDERED: int64
PRICEEACH: float64
ORDERLINENUMBER: int64
SALES: float64
ORDERDATE: datetime64[ns]
DAYS_SINCE_LASTORDER: int64
STATUS: object
PRODUCTLINE: object
MSRP: int64
PRODUCTCODE: object
CUSTOMERNAME: object
PHONE: object
ADDRESSLINE1: object
CITY: object
POSTALCODE: object
COUNTRY: object
CONTACTLASTNAME: object
CONTACTFIRSTNAME: object
DEALSIZE: object
Summary
• Numeric columns: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES, DAYS_SINCE_LASTORDER, MSRP
• count: 2747 for each of ORDERNUMBER, QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES, ORDERDATE, DAYS_SINCE_LASTORDER, and MSRP
Inference:
• The average number of items ordered per sales order is 35, with a standard deviation of 9.76.
• The average price of each item is 101.09, with a standard deviation of 42.04.
• The average sales amount per order is 3553.05, with a standard deviation of 1838.95.
• The average time since the last order is 1757.09 days, with a standard deviation of 819.28.
• The summary statistics show no anomalies that would suggest data quality issues.
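The figures above are the kind of output a summary-statistics step produces. As a minimal sketch in pandas (the deck itself does not show how the statistics were computed, and the sample rows below are hypothetical stand-ins for Sales_Data.xlsx), the count, mean, and standard deviation, together with the missing-value and duplicate checks, could be obtained like this:

```python
import pandas as pd

# Hypothetical sample rows; the real data would be loaded from Sales_Data.xlsx,
# e.g. with pd.read_excel("Sales_Data.xlsx").
df = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 41, 45],
    "PRICEEACH": [95.70, 81.35, 94.74, 83.26],
    "SALES": [2871.00, 2765.90, 3884.34, 3746.70],
})

# Count, mean, and standard deviation for every numeric column.
summary = df.describe().loc[["count", "mean", "std"]]

# Basic data quality checks: total missing cells and fully duplicated rows.
n_missing = int(df.isna().sum().sum())
n_duplicates = int(df.duplicated().sum())

print(summary.round(2))
print("missing:", n_missing, "duplicates:", n_duplicates)
```

On the full dataset, the `count` row should read 2747 for every column if there are no missing values.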
Exploratory Analysis & Insights
Univariate, bivariate, and multivariate analysis using data visualization:
• Weekly, Monthly, Quarterly, Yearly Trends in Sales
• Sales across different categories of different features in the given data
Summarize the inferences from the above analysis.
[Figure: Product Line Analysis]
[Figure: Deal Size Analysis]
[Figure: Multivariate Analysis]
[Figure: Heat Map]
➢ Sales & MSRP (correlation 0.63):
Items with a higher MSRP tend to generate higher sales amounts. A higher MSRP
may also enhance perceived product value, positively impacting sales.
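A heat map of this kind is usually drawn from a pairwise correlation matrix. The sketch below assumes Pearson correlation and uses hypothetical sample rows (the deck does not state which method or tool produced the 0.63 figure):

```python
import pandas as pd

# Hypothetical sample rows with the numeric columns of interest.
df = pd.DataFrame({
    "QUANTITYORDERED": [20, 25, 30, 35, 40],
    "PRICEEACH": [60.0, 80.0, 100.0, 120.0, 90.0],
    "MSRP": [70, 95, 110, 140, 100],
})
# In this sketch SALES is derived as quantity times price, as in the data dictionary.
df["SALES"] = df["QUANTITYORDERED"] * df["PRICEEACH"]

# Pairwise Pearson correlation matrix (the values a heat map would display).
corr = df.corr(method="pearson")

# Correlation between SALES and MSRP for the sample rows.
sales_msrp = corr.loc["SALES", "MSRP"]
print(corr.round(2))
```

The matrix can then be passed to a plotting library to render the heat map. Note that a correlation of this size describes association only and does not by itself show that a higher MSRP causes higher sales.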
• Inference:
• Product Line Analysis: Classic cars and vintage cars are the most popular product lines, indicating strong demand for these
vehicle models. Conversely, ships, trucks, buses, and trains have relatively lower counts, suggesting a niche market or specialized
customer base for these product lines.
• Deal Size Analysis: Most deals fall into the small and medium categories, with counts of 1,246 and 1,349,
respectively, out of 2,747 rows. The company therefore primarily engages in transactions of small to moderate deal sizes.
• Order Status Analysis: 2,541 of the 2,747 rows (about 92.5%) have the status "Shipped," indicating that the large majority
of orders are fulfilled.
• Quantity Ordered Analysis: The bins at quantities 21, 28, 35, and 42 have the highest counts, ranging from 576 to 631,
so these order quantities cover a significant portion of customer demand.
• Sales Analysis: Sales grew significantly from 2018 (3,353,014) to 2019 (4,669,925), an increase of about 39%, suggesting that
the business performed well and experienced an increase in customer demand. However, sales data for 2020 is available only until May
(1,737,283), so no conclusion can be drawn about the full year.
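The yearly totals quoted in the sales inference come from aggregating SALES by order year. A minimal pandas sketch follows: the order rows are hypothetical, while the 2018 and 2019 totals are the ones reported above and are used only to compute the year-over-year growth.

```python
import pandas as pd

# Hypothetical order rows standing in for Sales_Data.xlsx.
orders = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(
        ["2018-02-24", "2018-11-05", "2019-03-15", "2019-10-20", "2020-05-31"]
    ),
    "SALES": [2871.00, 3884.34, 2765.90, 5205.27, 3746.70],
})

# Yearly trend: total sales per calendar year.
yearly = orders.groupby(orders["ORDERDATE"].dt.year)["SALES"].sum()

# Growth between the two complete years, using the totals reported in the slide.
reported = pd.Series({2018: 3_353_014, 2019: 4_669_925})
growth_pct = reported.pct_change().loc[2019] * 100

print(yearly)
print(f"2018 to 2019 growth: {growth_pct:.1f}%")
```

Monthly or quarterly trends follow the same pattern, grouping by `orders["ORDERDATE"].dt.to_period("M")` or `dt.to_period("Q")` instead of the year. Because 2020 covers only part of the year, its total should not be compared directly with the full-year figures.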
• Customer Segmentation using RFM analysis
• What is RFM?
• Which parameters were used, and what assumptions were made?
• Showcase the KNIME workflow image.
• What results appear in the head of the output table?
RFM
• RFM stands for Recency, Frequency, and Monetary value.
• Recency: This is the number of days between a customer's most recent order date and a fixed reference date (e.g., 01-06-2020), indicating
how recently the customer made a purchase.
• Frequency: This is calculated as the count of purchases made by a customer, reflecting how often a customer makes purchases.
• Monetary: This is the total sales amount generated by a customer, reflecting how much the customer spends.
• Auto-binning: Customers are segmented into four categories based on their RFM scores: High, Moderate, Low, and Very Low. This
segmentation aids in categorizing customers according to their value and behavior.
• These parameters are used in RFM analysis to evaluate customer behavior, identify customer segments, and inform data-driven marketing
and sales decisions.
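The analysis in this deck was built as a KNIME workflow. As an illustration only, the same idea can be sketched in pandas. The customer names and order rows below are hypothetical, and quartile binning on ranks is an assumption standing in for KNIME's auto-binning, whose exact settings are not given here.

```python
import pandas as pd

# Hypothetical orders for four customers (A to D).
orders = pd.DataFrame({
    "CUSTOMERNAME": ["A", "A", "B", "B", "B", "C", "D", "D"],
    "ORDERNUMBER": [1, 2, 3, 4, 5, 6, 7, 8],
    "ORDERDATE": pd.to_datetime([
        "2020-05-20", "2020-04-01", "2019-12-01", "2020-01-15",
        "2020-03-01", "2018-06-01", "2019-06-01", "2019-08-01",
    ]),
    "SALES": [5000.0, 4000.0, 1000.0, 1500.0, 2000.0, 800.0, 1200.0, 1300.0],
})

# Fixed reference date, read here as 1 June 2020 (an assumption about 01-06-2020).
reference_date = pd.Timestamp("2020-06-01")

# One row per customer: last order date, number of orders, total sales.
rfm = orders.groupby("CUSTOMERNAME").agg(
    last_order=("ORDERDATE", "max"),
    frequency=("ORDERNUMBER", "nunique"),
    monetary=("SALES", "sum"),
)
# Recency in days: reference date minus the most recent order date.
rfm["recency"] = (reference_date - rfm["last_order"]).dt.days

# Four bins per metric. Ranking first avoids duplicate bin edges on ties;
# tied customers are then split in index order, which is arbitrary.
labels = ["Very Low", "Low", "Moderate", "High"]
rfm["F_bin"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=labels)
rfm["M_bin"] = pd.qcut(rfm["monetary"].rank(method="first"), 4, labels=labels)
# For recency, fewer days is better, so the labels are reversed.
rfm["R_bin"] = pd.qcut(rfm["recency"].rank(method="first"), 4, labels=labels[::-1])

print(rfm[["recency", "frequency", "monetary", "R_bin", "F_bin", "M_bin"]])
```

In this sample, customer A ordered most recently and spent the most, so A lands in the High bin for recency and monetary value, while customer C, with a single old order, lands in Very Low for all three metrics.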
ASSUMPTIONS
• Higher Monetary Value Indicates Higher Spending: This assumption suggests that customers who generate higher monetary values through
their purchases are likely to be more valuable and potentially more profitable for the business.
• Recent Purchases Reflect Customer Engagement: It is assumed that customers who have made purchases more recently are likely to be more
engaged with the company and its offerings, presenting higher potential for repeat purchases or upselling/cross-selling opportunities.
• Higher Frequency of Purchases Reflects Customer Loyalty: This assumption posits that customers who make purchases more frequently
demonstrate greater loyalty to the company. They may have a stronger connection to the brand, higher customer satisfaction, and an
increased likelihood of recommending the company to others.
KNIME WORKFLOW
Output table head
Inferences from RFM Analysis and identified segments:
Double Decker Gift Stores, Ltd Very Low Very Low Very Low
West Coast Collectables Co. Very Low Very Low Very Low
Signal Collectibles Ltd. Very Low Very Low Very Low
Daedalus Designs Imports Very Low Very Low Very Low
CAF Imports Very Low Very Low Very Low