FreshMart Sales Analysis Report
FreshMart is a fast-growing grocery retail chain based in the United States, serving thousands of
customers across various cities and countries. Known for its wide product range and affordable pricing,
FreshMart has built a strong presence in both urban and suburban markets.
As the company prepares for its next phase of growth, leadership wants to focus not just on adding
new stores, but on increasing Total Sales Revenue from its existing network. This requires a better
understanding of what drives revenue, from which products perform well to how different regions,
customer segments, and sales staff contribute to the bottom line.
OBJECTIVE:
The main goal of this project is to analyse FreshMart’s sales data over a four-month period to better
understand the factors influencing revenue. Rather than just expanding store count, the company is
looking to improve the performance of its existing operations. This involves exploring which products are
driving the most sales, how discounts affect purchase behaviour, and how different regions, customer
types, and sales employees contribute to overall revenue.
By diving deep into the data, the aim is to break down the Total Sales Revenue into meaningful parts—
such as product pricing, quantities sold, and discounts—to identify key patterns and areas with potential
for growth. This analysis will help provide a clearer picture of where the business is doing well and where
there is room for improvement, ultimately guiding FreshMart in making more informed, data-backed
decisions to boost revenue.
BUSINESS IMPACT:
This project offers FreshMart a clear, data-driven understanding of what truly drives sales across its
current operations. By identifying which product categories and customer segments bring in the most
revenue, the company can tailor its marketing and inventory strategies to meet demand more effectively.
The insights into regional and city-wise performance help pinpoint high-potential areas and flag
underperforming locations, allowing for targeted improvements and better resource allocation. Evaluating
employee contributions gives management the ability to recognise top performers and identify training
needs for others, improving overall team productivity.
Additionally, the analysis of discount effectiveness and customer buying patterns ensures that future
promotional strategies are not only attractive to customers but also profitable for the business.
DATASET OVERVIEW:
Dataset Name: FreshMart Dataset
Number of Rows: 6758125
Number of Columns: 28
Description: The dataset used in this project contains sales and related information from
FreshMart, covering a period of four months. It is divided into seven CSV files, including monthly
sales data, product categories, regional details, and employee information.
COLUMN DEFINITIONS:
1. SalesID: Unique identifier for each sale transaction.
2. SalesPersonID: Unique identifier for the salesperson involved in the sale.
3. CustomerID: Unique identifier for the customer making the purchase.
4. ProductID: Unique identifier for the product sold.
5. Quantity: Number of units of the product sold.
6. Discount: Discount applied to the product (in decimal format, e.g., 0.10 for 10%).
7. TotalPrice: Final amount after applying quantity and discount.
8. SalesDate: Date when the sale was made.
9. TransactionNumber: Identifier for the overall transaction (could group multiple SalesIDs).
[Link]: Name of the product.
[Link]: Original price per unit of the product.
[Link]: Identifier for the product category.
[Link]: Class or type of the product (e.g., A, B, Premium, etc.).
[Link]: Last modification date of the sales record.
[Link]: Indicates if the product is resistant (e.g., to conditions or treatments).
[Link]: Indicates if the product has potential allergic reactions.
[Link]: Number of days the product remains effective or usable.
[Link]: Name of the product category.
[Link]: First name of the customer.
[Link]: Middle initial of the customer.
[Link]: Last name of the customer.
[Link]: Identifier for the customer's city.
[Link]: Full address of the customer.
[Link]: Name of the customer's city.
[Link]: Postal code of the customer’s address.
[Link]: Identifier for the customer's country.
[Link]: Name of the customer’s country.
[Link]: Standardized code for the country (e.g., IN, US).
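The TotalPrice definition above can be illustrated with a small worked example. This is a minimal sketch, assuming TotalPrice = unit price x Quantity x (1 - Discount); the report does not state the exact formula, and the `Price` column name and the sample values are hypothetical.

```python
import pandas as pd

# Made-up sample rows; "Price" is an assumed name for the unit-price column.
sales = pd.DataFrame({
    "Quantity": [2, 5, 10],
    "Price": [50.0, 20.0, 8.0],
    "Discount": [0.0, 0.10, 0.20],  # decimal format, 0.10 means 10%
})

# Assumed formula: unit price times quantity, reduced by the fractional discount.
sales["TotalPrice"] = (
    sales["Price"] * sales["Quantity"] * (1 - sales["Discount"])
).round(2)
```

Under this assumption, revenue can fall when a discount is raised unless quantity rises enough to compensate, which is the trade-off the hypotheses later in the report examine.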
DATA CLEANING AND PREPARATION:
Before diving into the analysis, the raw data required careful cleaning and preparation to ensure accurate
results. This step was essential to remove any inconsistencies and make the dataset suitable for
meaningful analysis.
The following actions were taken during the cleaning process:
Importing data and required Python libraries: Datasets from the provided Google Drive link were
imported into a Jupyter notebook for analysis. The necessary libraries (NumPy, Pandas,
Matplotlib, and Seaborn) were installed and imported to clean and prepare the data for analysis
and to create visualizations.
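The import step might look roughly like the sketch below: monthly sales files are stacked and lookup tables are joined on their ID columns. The file contents, the `CategoryID` column name, and the choice of a left join are assumptions for illustration; in-memory buffers stand in for the real CSV files so the example runs on its own.

```python
import io
import pandas as pd

# Stand-ins for two monthly sales files and one product lookup file (made-up contents).
sales_jan = io.StringIO("SalesID,ProductID,Quantity\n1,10,3\n2,11,1\n")
sales_feb = io.StringIO("SalesID,ProductID,Quantity\n3,10,2\n")
products = io.StringIO("ProductID,CategoryID\n10,1\n11,2\n")

# Stack the monthly files into one table with a fresh index.
sales = pd.concat(
    [pd.read_csv(f) for f in (sales_jan, sales_feb)], ignore_index=True
)

# Left join keeps every sale even if its product is missing from the lookup.
merged = sales.merge(pd.read_csv(products), on="ProductID", how="left")
```

A left join is used here so that unmatched IDs surface as missing values rather than silently dropping sales rows.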
Dataset Overview & Exploration:
Shape: The dataset contains 6758125 rows and 28 columns.
Data Types: The dataset contains integer, float, and object (text/string) columns. Each
column was examined individually and its data type was corrected where
required.
Duplicates: No duplicate rows were found.
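These overview checks correspond to a few standard pandas calls. The sketch below uses a tiny made-up frame in place of the merged dataset.

```python
import pandas as pd

# Made-up rows standing in for the merged dataset.
df = pd.DataFrame({
    "SalesID": [1, 2, 3],
    "Quantity": [3.0, 1.0, 2.0],
    "SalesDate": ["2018-01-05", "2018-01-06", None],
})

n_rows, n_cols = df.shape                  # dataset shape
dtypes = df.dtypes                         # one dtype per column
n_duplicates = int(df.duplicated().sum())  # fully identical rows
```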
Handling Missing Values:
The MiddleInitial column was dropped due to its high number of missing values and lack of analytical
importance. However, the SalesDate column, despite having missing entries, was retained as it is crucial
for time-based analysis. The missing values in SalesDate were handled separately to preserve the ability
to explore sales trends over time.
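The report does not say exactly how the missing SalesDate values were handled. One possible approach, sketched below with made-up rows, is to keep every row for revenue totals and exclude undated rows only from time-based views.

```python
import pandas as pd

# Made-up rows; one sale has no date.
df = pd.DataFrame({
    "SalesID": [1, 2, 3],
    "MiddleInitial": [None, "K", None],
    "SalesDate": ["2018-01-05", None, "2018-02-11"],
    "TotalPrice": [100.0, 40.0, 60.0],
})

# Drop the sparse, analytically unimportant column.
df = df.drop(columns=["MiddleInitial"])

# Parse dates; missing entries become NaT instead of raising an error.
df["SalesDate"] = pd.to_datetime(df["SalesDate"])

total_revenue = df["TotalPrice"].sum()   # uses all rows, dated or not
dated = df.dropna(subset=["SalesDate"])  # subset for time-based analysis only
```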
Dealing with Outliers:
Outliers in the TotalPrice column were identified using the IQR method. Since these high values likely
represent valid large transactions, they were retained in the dataset to ensure an accurate representation
of total revenue.
TotalPrice: Outliers found (49,129) and retained, as they represent large but valid transactions.
Price and Quantity: No outliers found; values are within acceptable business ranges.
Discount: Although IQR flagged over 1.3 million outliers, these represent real discount percentages
and were retained for analysis.
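The IQR rule referred to above flags values outside [Q1 - 1.5 x IQR, Q3 + 1.5 x IQR]. The following is a minimal sketch on made-up numbers; the helper name `iqr_outlier_mask` and the 1.5 multiplier are assumptions, since the report does not give its exact settings.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up totals with one very large transaction.
total_price = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 500.0])
n_price_outliers = int(iqr_outlier_mask(total_price).sum())

# Made-up discounts where most sales carry no discount at all.
discount = pd.Series([0.0] * 8 + [0.1, 0.2])
n_discount_outliers = int(iqr_outlier_mask(discount).sum())
```

In the second series the quartiles are both zero, so the IQR collapses to zero and every non-zero discount is flagged. That may be why so many Discount values were flagged in the real data, although the report does not confirm the distribution.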
Datatype Correction:
Converted the Quantity and VitalityDays columns from float to int because they represent whole numbers.
This ensures the data is more accurate and meaningful.
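A cautious version of this conversion checks that the values really are whole numbers before casting. The sketch below uses made-up values; the guard is an addition for illustration, not a step the report describes.

```python
import pandas as pd

# Made-up values stored as floats.
df = pd.DataFrame({
    "Quantity": [3.0, 1.0, 12.0],
    "VitalityDays": [30.0, 0.0, 365.0],
})

for col in ["Quantity", "VitalityDays"]:
    # Guard: refuse to cast if any value is fractional or missing (NaN fails this check).
    if not (df[col] % 1 == 0).all():
        raise ValueError(f"{col} contains fractional or missing values")
    df[col] = df[col].astype("int64")
```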
METRIC TREE:
[Figure: metric tree diagram. Recoverable node labels: Unit Price, Product Level Metrics, Product Class, Quantity Sold, Customer Segmentation, Customer Demographics, Order Value Segment, Gross Margin, Total Sales Made, Employee Performance, Average Sales per Transaction, Time Based Trends, Monthly.]
Exploratory Data Analysis (EDA):
Observations from the key summary statistics and the correlation matrix heatmap:
The Confections, Meat, and Poultry categories showed strong sales and could be prioritized in promotions
and inventory planning.
Monthly analysis showed that sales peak in January, March, and April and dip in May, which may indicate
seasonal influences.
Quantity & Price: Strong positive impact on revenue.
Discount: Slight negative effect.
Others: Low linear impact, possible non-linear influence.
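A correlation matrix like the one summarised above can be computed with `DataFrame.corr`, which uses Pearson correlation by default. The sketch below uses made-up rows, the assumed `Price` column name, and an assumed TotalPrice formula.

```python
import pandas as pd

# Made-up rows; "Price" is an assumed column name.
df = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5, 6],
    "Price": [10.0, 20.0, 10.0, 20.0, 10.0, 20.0],
    "Discount": [0.0, 0.1, 0.0, 0.2, 0.0, 0.1],
})
# Assumed formula for the final amount.
df["TotalPrice"] = df["Price"] * df["Quantity"] * (1 - df["Discount"])

# Pairwise Pearson correlations between the numeric columns.
corr = df[["Quantity", "Price", "Discount", "TotalPrice"]].corr()

# Rank the drivers by their linear association with revenue.
revenue_corr = corr["TotalPrice"].drop("TotalPrice").sort_values(ascending=False)
```

Pearson correlation only captures linear association, which is why the "Others" line above leaves room for non-linear influence. Passing `corr` to `seaborn.heatmap(corr, annot=True)` is one way to draw the heatmap.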
HYPOTHESIS FORMULATION AND TESTING:
1. Higher discounts lead to higher revenue
Conclusion:
The data does not support this hypothesis, which contradicts the common assumption that discounts
drive sales volume. Instead, the findings suggest that:
FreshMart’s customers are willing to buy without discounts.
Applying higher discounts may actually reduce profitability without significantly increasing sales.
Recommendations:
1. Limit unnecessary discounts: Avoid applying broad discounts, especially beyond 10%, unless tied
to specific promotions or surplus inventory.
2. Focus on product value: Since full-price sales perform better, emphasize quality and health benefits.
3. Target discounting: If discounts must be used, apply them strategically on low-moving products or
to boost new customer acquisition, not across the board.
1.1 Are discounts applied to low-priced products?
Result:
Avg Price (Discounted): ₹50.80
Avg Price (Non-Discounted): ₹50.83
Conclusion:
Discounted products are not significantly cheaper than non-discounted ones.
Therefore, price alone does not explain why discounts fail to generate more revenue. Other factors such
as quantity purchased or category demand might be more influential.
1.2 Are discounts used on low-quantity purchases?
Result:
Avg Quantity (Discounted): 13.01
Avg Quantity (Non-Discounted): 13.00
Conclusion:
There is no meaningful difference in quantity purchased between discounted and non-discounted orders.
This suggests that discounts do not influence customers to buy more or less. Therefore, discounts are not
effective at increasing order size — and may simply reduce revenue margins without changing customer
behavior.
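The comparisons in 1.1 and 1.2 amount to splitting sales by whether a discount was applied and averaging price and quantity within each group. The sketch below uses made-up rows; treating any Discount above zero as "discounted" and the `Price` column name are assumptions.

```python
import pandas as pd

# Made-up rows; "Price" is an assumed column name.
df = pd.DataFrame({
    "Discount": [0.0, 0.1, 0.0, 0.2, 0.0, 0.1],
    "Quantity": [10, 12, 14, 16, 12, 8],
    "Price": [50.0, 51.0, 52.0, 49.0, 48.0, 50.0],
})

# Label each sale by whether any discount was applied.
df["Group"] = (df["Discount"] > 0).map(
    {True: "Discounted", False: "Non-Discounted"}
)

# Average unit price and quantity per group.
summary = df.groupby("Group")[["Quantity", "Price"]].mean()
```

With millions of rows, even a tiny gap between group means can be statistically significant, so the practical size of the difference matters more than a p-value here.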
1.3 Are discounts used on low-demand product categories?
Conclusion:
Discounts are being applied on high-performing categories, not just underperforming ones.
This indicates that:
Even popular products are being discounted, but their discounted sales make up a small portion of
their total revenue.
Thus, discounting does not significantly boost performance, even in already successful categories.
Final Recommendation:
FreshMart should reconsider blanket discounting, even for high-performing categories. Instead, use
discounts:
Only for specific goals like clearing seasonal inventory or launching new products.
Paired with customer segmentation (e.g., reward-based discounts).
2. Do different product categories generate significantly different revenue?
Conclusion:
There is a significant difference in revenue generation across product categories. Some product lines
consistently outperform others.
Recommendations:
1. Prioritize top-performing categories like Confections, Meat, and Poultry in promotions, shelf
placement, and inventory planning.
2. Consider bundling or discounting strategies for underperforming categories to improve their
contribution.
3. Align marketing efforts with categories that match seasonal demand or customer preferences.
2.1 Do high-revenue categories have higher average product prices?
Key Insights:
The highest-priced category is Grain (₹61.43), yet it ranks lower in total revenue.
Confections, the top revenue generator, has a mid-range average price (₹51.81).
This suggests that high prices are not the main reason some categories earn more.
Conclusion:
Revenue differences are not primarily price-driven.
Other factors — such as sales quantity or number of transactions — likely explain why certain categories
outperform.
2.2 Do high-revenue categories sell higher quantities?
Key Insights:
Confections sold the highest number of units (11M+) and also earned the highest revenue. Other
top revenue categories like Meat, Poultry, and Cereals also had high quantities sold. In contrast,
Grain, despite having the highest average price, ranked low in quantity sold and total revenue.
Conclusion:
Revenue differences are strongly influenced by quantity sold.
Popular categories with high demand and volume outperform others, even if their product prices are
moderate.
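The category comparisons in 2.1 and 2.2 can be produced with a single grouped aggregation. The sketch below uses made-up categories and values; `CategoryName` and `Price` are assumed column names.

```python
import pandas as pd

# Made-up line items; category names and numbers are purely illustrative.
df = pd.DataFrame({
    "CategoryName": ["Cat A", "Cat A", "Cat B", "Cat C", "Cat C"],
    "Price": [50.0, 54.0, 60.0, 48.0, 52.0],
    "Quantity": [20, 30, 5, 15, 25],
    "TotalPrice": [1000.0, 1620.0, 300.0, 720.0, 1300.0],
})

# Revenue, average unit price, units sold, and line-item count per category.
by_category = (
    df.groupby("CategoryName")
    .agg(
        revenue=("TotalPrice", "sum"),
        avg_price=("Price", "mean"),
        units=("Quantity", "sum"),
        line_items=("TotalPrice", "size"),
    )
    .sort_values("revenue", ascending=False)
)
```

In this toy data the category with the highest average price has the lowest revenue, the same pattern the report describes for Grain.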
2.3 Do high-revenue categories have more transactions?
Final Conclusion:
Revenue differences across product categories at FreshMart are driven primarily by sales volume and
customer demand, not by higher pricing.
Recommendations:
Focus on increasing transaction frequency and volume sold in lower-performing categories (e.g.,
Grain, Shell Fish).
For top performers (like Confections, Meat), ensure consistent stock, consider cross-selling, and
avoid unnecessary discounting.
Consider promotions that drive repeat purchases or larger basket sizes in mid-tier categories.
[Link] cities consistently generate more revenue than others.
Conclusion:
There is a clear difference in revenue across cities.
Top cities contribute significantly more, which highlights potential regional patterns in purchasing behavior.
Recommendation:
Investigate what drives success in top-performing cities (product mix, customer profiles, marketing).
Consider regional strategies — replicating top city approaches in underperforming ones.
3.1 High-revenue cities see larger orders on average.
Result:
Jackson ranked #1 with an average of 13.47 items per order.
Other high-revenue cities like San Antonio also ranked in the top 5.
However, top cities like Tucson and Sacramento were not in the top 10 by order size.
Conclusion:
While larger order quantities do help explain revenue in some cities, it's not a universal driver.
Other factors like number of transactions or product categories likely also play a key role.
3.2 Customers in these cities are buying premium or expensive products.
Result:
Jackson again ranked #1 with ₹51.13 average price.
Other high-price cities include Colorado and San Diego.
But most prices cluster tightly around ₹51, with minimal variation.
Conclusion:
There’s no significant variation in average price across cities.
So, product pricing does not explain revenue differences. Revenue is likely driven by transaction
volume and product mix.
3.3 These cities may be buying more of high-revenue categories (like Confections, Meat, etc.)
Conclusion:
The category mix is a key factor in city-level revenue performance.
Top-performing cities are strong not just in volume, but in the right categories — those that consistently
generate the most revenue.
Recommendation:
FreshMart should:
Expand distribution and visibility of top categories in underperforming cities.
Use the category performance of top cities as a template for strategic planning elsewhere.
4. Sales are higher on specific days, weeks, or months
Conclusion:
There is clear monthly variation in revenue.
Revenue peaks in Jan and Mar, dips in Feb and Apr, and drops sharply in May.
Recommendation:
Investigate the cause of May's sharp decline: check data completeness or seasonal trends.
Consider boosting promotions in lower months to stabilize revenue flow.
Align inventory and staffing with high-demand periods (Jan & Mar).
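Monthly revenue can be computed by grouping dated sales by calendar month. Counting the distinct days with sales in each month is one simple completeness check for a month such as May. The sketch below uses made-up dates and amounts.

```python
import pandas as pd

# Made-up rows; the last sale has no date.
df = pd.DataFrame({
    "SalesDate": pd.to_datetime(
        ["2018-01-03", "2018-01-20", "2018-02-10", "2018-03-05", None]
    ),
    "TotalPrice": [100.0, 50.0, 80.0, 120.0, 40.0],
})

# Time-based views can only use rows that have a date.
dated = df.dropna(subset=["SalesDate"])
month = dated["SalesDate"].dt.to_period("M").rename("Month")

monthly = dated.groupby(month).agg(
    revenue=("TotalPrice", "sum"),
    # Distinct calendar days with at least one sale in the month.
    days_with_sales=("SalesDate", lambda s: s.dt.normalize().nunique()),
)
```

A month with far fewer `days_with_sales` than the others is more likely to be a partial extract than a real seasonal dip.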
4.1 Did people buy more items in some months?
Insights:
Revenue peaks in January and March correlate with the highest quantity sold.
The significant drop in May’s revenue is clearly due to the much lower sales volume.
This suggests that monthly sales quantity is a major driver of total sales revenue.
Recommendation:
Investigate why quantity sold in May dropped sharply.
Consider running marketing campaigns or promotions in low-volume months to stabilize sales.
4.2 Do certain product categories dominate in certain months?
Insights:
Top-performing categories (Confections, Meat, Poultry) consistently drive most of the revenue.
In May, revenue across all major categories dropped sharply, contributing to the overall sales dip.
Recommendation:
Investigate potential seasonality, stock issues, or demand decline across all categories in May.
Consider balancing product mix or running category-specific promotions in low-performing months.
5. Revenue contribution is different across product Class (e.g., Premium vs Regular).
Conclusion:
All classes contribute substantially to total revenue.
High Class products contribute slightly more than others.
The differences suggest subtle variation in either pricing, volume, or customer preference.
Recommendation:
Consider analyzing profit margins per class to understand true profitability.
If High Class yields better margins, promoting premium lines may be beneficial.
If Low Class sells in higher volume, strategies like bundling or up-selling could be explored.
6. There is a positive correlation between quantity and total price per transaction.
Result:
Correlation Coefficient (r): 0.65
This shows a moderate to strong positive relationship between Quantity and TotalPrice.
Conclusion:
As customers buy more items, total revenue per transaction increases. However, the correlation is not
perfect, suggesting that discounts or item price variability play a secondary role in influencing total price.
Recommendation:
To boost revenue, FreshMart can encourage higher basket sizes through:
o Bundle offers (Buy 2 Get 1)
o Minimum order value discounts (₹ off on ₹X+ orders)
o Product recommendations during checkout
7. Customers who appear more frequently in the data contribute more to total sales.
Correlation Result:
Correlation Coefficient: 0.0024
Conclusion:
The relationship between CustomerOrderCount and TotalPrice is negligible.
Frequent buyers do not significantly impact revenue more than infrequent buyers.
Some customers may place high-value single orders, while others may order often but in small
amounts.
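The wording above suggests the correlation was computed per sales row, between each customer's order count and that row's TotalPrice. The sketch below shows that calculation next to a per-customer alternative, using made-up rows. Which one the report used is an assumption.

```python
import pandas as pd

# Made-up sales rows.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "SalesID": [1, 2, 3, 4, 5, 6],
    "TotalPrice": [20.0, 30.0, 25.0, 300.0, 60.0, 40.0],
})

# Row level: attach each customer's order count to every one of their rows.
df["CustomerOrderCount"] = df.groupby("CustomerID")["SalesID"].transform("count")
r_row_level = df["CustomerOrderCount"].corr(df["TotalPrice"])

# Customer level: one row per customer, order count against total spend.
per_customer = df.groupby("CustomerID").agg(
    orders=("SalesID", "count"),
    total_spend=("TotalPrice", "sum"),
)
r_customer_level = per_customer["orders"].corr(per_customer["total_spend"])
```

The row-level figure asks whether frequent customers place larger individual orders, while the customer-level figure asks whether they spend more in total. The two can differ considerably, so it helps to state which one a result refers to.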
Recommendation:
Segment customers by average order value, not just frequency.
Focus on identifying and retaining high-value customers rather than just frequent ones.
8. Customers who receive discounts tend to purchase in higher quantities
Insights:
The difference is extremely small (approx. 0.009 units).
This indicates that discounts do not significantly influence how much customers purchase per
transaction.
Customer purchasing behavior seems consistent, regardless of whether a discount is applied.
There is no strong evidence that offering discounts leads to higher quantities purchased by individual
customers.
Recommendations:
Rethink discounting strategies: Since discounts don’t significantly increase quantity sold, offering
them broadly may not be cost-effective.
Consider:
o Personalized discounts based on buying history.
o Offering discounts on specific categories or during seasonal events.
9. Sales are influenced by the day of the week or show a weekend effect.
Interpretation:
Weekday sales are noticeably higher than weekend sales.
Highest sales occur around Wednesday, suggesting a possible mid-week peak.
Sales during weekends (Saturday & Sunday) are slightly lower but still consistent.
No strong "weekend boost" in sales is observed.
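Day-of-week effects can be examined by deriving the weekday from SalesDate. Because a week has five weekdays and only two weekend days, comparing average revenue per calendar day is usually fairer than comparing raw totals. The sketch below uses made-up dates and amounts; the report does not state which comparison it used.

```python
import pandas as pd

# Made-up rows: a Monday, two Wednesday sales, a Saturday, and a Sunday.
df = pd.DataFrame({
    "SalesDate": pd.to_datetime(
        ["2018-01-01", "2018-01-03", "2018-01-03", "2018-01-06", "2018-01-07"]
    ),
    "TotalPrice": [100.0, 150.0, 50.0, 80.0, 70.0],
})

# Total revenue by day name.
by_day = df.groupby(df["SalesDate"].dt.day_name())["TotalPrice"].sum()

# Revenue per calendar day, then averaged within weekday and weekend groups.
daily = df.groupby(df["SalesDate"].dt.normalize())["TotalPrice"].sum()
is_weekend = daily.index.dayofweek >= 5  # Monday=0, so 5 and 6 are Sat/Sun
avg_weekday = daily[~is_weekend].mean()
avg_weekend = daily[is_weekend].mean()
```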
Recommendations:
Focus marketing and promotional efforts during weekdays, especially mid-week.
If weekend sales are a priority, consider introducing special offers or discounts to stimulate
demand.
9.1 Customers tend to spend more per transaction on weekdays than on weekends
Insight:
Contrary to our assumption, weekend transactions show a slightly higher average revenue per transaction
than weekdays.
However, the difference is very small, suggesting that customer spending behavior is fairly consistent
across the week.
Recommendations:
[Link] weekday performance:
Introduce mid-week campaigns (like "Wednesday Wow Deals") to encourage weekday transactions and
drive up weekday revenue.
[Link] with time-based offers:
Offer different incentives for weekday mornings vs. evenings to explore time-based shopping patterns
further.
9.2 Certain product categories perform better on weekends than weekdays.
Insight:
The data clearly shows that most product categories generate significantly higher revenue on weekends
than on weekdays. This indicates a weekend shopping surge.
Recommendations:
Launch special weekend deals for high-revenue categories like Confections, Meat, and Poultry.
Ensure better stock availability and staffing during weekends for top-selling categories.
Consider bundling offers (e.g., Meat + Confections combo) to increase basket size during
weekends.
10. Products marked as "IsAllergic" are sold less frequently than non-allergic products.
Insights:
Non-Allergic products had 2,466,289 sales records.
Allergic products had 2,346,657 sales records.
This shows a slight preference for non-allergic products, supporting the hypothesis.
Products marked as "Unknown" for allergy status were excluded from this comparison to maintain
clarity.
Recommendation:
Promote non-allergic products more heavily, as they already show a slight edge in customer
preference.
Consider adding clear allergy information to products marked as "Unknown" to improve transparency
and customer trust.
10.1 Allergic products have lower total revenue than non-allergic products
Insights:
Total revenue from Non-Allergic products: ₹ 1.53 billion
Total revenue from Allergic products: ₹ 1.64 billion
This result contradicts the hypothesis. Despite being sold slightly less frequently, allergic products
generate more revenue than non-allergic ones.
Recommendation:
Do not discontinue or deprioritize allergic products just because of slightly lower sales frequency.
Investigate which allergic products contribute the most to revenue and analyze their pricing,
categories, and customer segments.
10.2 The product class (Class) differs significantly between allergic and non-allergic products.
Observations from Data:
High class products are more likely to be non-allergic.
Medium class products are more likely to be allergic.
Low class products are fairly balanced between allergic and non-allergic.
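The class-by-allergy comparison can be tabulated with a normalised cross-tabulation. The sketch below uses made-up products and assumes IsAllergic is stored as text, with "Unknown" rows excluded as described in section 10.

```python
import pandas as pd

# Made-up products; the string encoding of IsAllergic is an assumption.
products = pd.DataFrame({
    "Class": ["High", "High", "High", "Medium", "Medium", "Medium",
              "Low", "Low", "Low"],
    "IsAllergic": ["False", "False", "True", "True", "True", "False",
                   "True", "False", "Unknown"],
})

# Exclude products whose allergy status is not known.
known = products[products["IsAllergic"] != "Unknown"]

# Share of allergic vs non-allergic products within each class (rows sum to 1).
share = pd.crosstab(known["Class"], known["IsAllergic"], normalize="index")
```

To check whether the difference is statistically significant, a chi-square test of independence on the raw counts (for example with `scipy.stats.chi2_contingency`) is one option. The report does not say which test, if any, was applied.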
Recommendation:
Re-examine Medium-class products, as they are more likely to be allergic — consider
reformulation or improved labeling.
Use this insight to enhance product segmentation and targeted communication.
Promote High-class, non-allergic products for customers concerned about allergens.
10.3 Revenue contribution (TotalPrice) differs across product Class based on Allergy Status.
Insights:
In the Low and Medium classes, Allergic products generate more revenue.
In the High class, Non-Allergic products contribute more revenue.
Medium-class allergic products appear to be the top revenue drivers among all categories.
Recommendation:
Focus marketing and promotion on Medium-class allergic products — they generate high revenue
and may represent a strong product-market fit.
In High-class products, emphasize non-allergic options, as they dominate revenue in that segment.
[Link] sales per employee vary significantly between cities.
Insights:
There's clear variation in average employee performance across cities.
Top-performing cities like Tucson, Jackson, and Sacramento are significantly outperforming others,
indicating higher employee efficiency or possibly higher demand/productivity in those regions.
Recommendations:
Benchmark employee practices in high-performing cities and apply those learnings to
underperforming regions.
Conduct further analysis to understand factors influencing performance — such as product mix,
customer footfall, or store-level operations.
Align incentive structures with performance metrics in each city to motivate improved efficiency.
11.1 The product category mix in a city influences its average sales per employee
Insights:
Cities with the highest average sales per employee also show significant sales in the "Confections"
category.
Confections consistently contributes over 6 million in revenue across top-performing cities like
Tucson, Columbus, Jackson, and Fort Wayne.
Recommendation:
Promote or expand the availability of premium categories like Confections in cities with lower
employee performance to potentially boost sales per employee.
Evaluate city-level product mix regularly and adjust inventory or marketing strategies to prioritize
high-contribution categories.
Consider providing incentives to employees in cities with underperforming categories to improve
sales conversion rates.
11.2 Employee productivity is higher in cities with higher average transaction counts per employee
Insight:
Based on the analysis of average transactions and sales per employee:
Cities with the highest average transactions per employee also show high average sales per
employee.
This indicates a positive relationship between the number of transactions handled by each
employee and their total sales contribution.
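Average sales and transactions per employee by city can be derived in two steps: aggregate per employee within each city, then average across employees. The sketch below uses made-up rows and assumes each sale is attributed to the city recorded on the sales row, with `CityName` as an assumed column name. The report does not describe how employees were mapped to cities.

```python
import pandas as pd

# Made-up rows; city names and amounts are purely illustrative.
df = pd.DataFrame({
    "CityName": ["City A"] * 5 + ["City B"] * 2,
    "SalesPersonID": [1, 1, 1, 2, 2, 3, 3],
    "TransactionNumber": ["T1", "T2", "T3", "T4", "T5", "T6", "T6"],
    "TotalPrice": [100.0, 200.0, 300.0, 150.0, 250.0, 50.0, 50.0],
})

# Step 1: total sales and distinct transactions per employee within each city.
per_employee = (
    df.groupby(["CityName", "SalesPersonID"])
    .agg(sales=("TotalPrice", "sum"), transactions=("TransactionNumber", "nunique"))
    .reset_index()
)

# Step 2: average those per-employee figures across employees in each city.
per_city = per_employee.groupby("CityName")[["sales", "transactions"]].mean()
```

Counting distinct TransactionNumber values rather than rows avoids inflating the count when one transaction contains several SalesIDs.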
Recommendation:
Recognize and replicate the practices of top-performing cities like Tucson, Fort Wayne, and Columbus in
lower-performing areas.
Consider incentive programs for employees in cities with lower transaction averages to boost
engagement and productivity.
Optimize staffing levels based on expected transaction volumes to maintain high performance per
employee.