Chapter 3: Methodology
This chapter outlines the design of the method followed in carrying out the Experiential Learning
Project (ELP) with Dipitt (ITT Foods). The project involved diagnosing operational
performance, forecasting sales, and identifying cost-optimization opportunities through data
processing, modeling, and visualization. The sections below describe the data handling strategy, the analysis
and forecasting methods, ethical considerations, and how secondary research enabled industry
benchmarking.
3.1 Data Collection and Source Integration
The analysis was based on three datasets provided by the company supervisor, Mr. Uzair
Naeem: sales transaction records spanning December 2022 to June 2024 (approximately 61,351
rows), logistics cost data lacking SKU-level detail, and a customer master file. Initial
validation revealed inconsistencies in key fields such as city, region, and channel; large
customers such as “Burger O’ Clock” had incorrect or missing location details, which impaired
segmentation precision.
These inconsistencies were addressed by requesting a validated customer mapping file
containing standardized fields: customer name, city, region, country, and channel. Python
merge functions and Excel's VLOOKUP were used to correct and standardize the raw sales data.
This clean integration reduced error propagation and enabled consistent performance
evaluation at the regional and channel levels. The mapping practice was also recommended as an
ERP enhancement, to be automated through dropdown-based entry.
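The merge step can be sketched as below. The column names and sample rows are illustrative assumptions, not the project's actual files; the idea is simply to discard the unreliable raw location fields and left-join the validated attributes from the mapping file.

```python
import pandas as pd

# Raw sales rows with inconsistent location fields (illustrative data).
sales = pd.DataFrame({
    "customer_name": ["Burger O' Clock", "Burger O' Clock", "Cafe One"],
    "amount": [1200, 800, 500],
    "city": ["KHI", None, "Lahore"],   # inconsistent / missing raw entries
})

# Validated customer mapping file with standardized fields (illustrative).
mapping = pd.DataFrame({
    "customer_name": ["Burger O' Clock", "Cafe One"],
    "city": ["Karachi", "Lahore"],
    "region": ["South", "Central"],
    "channel": ["Food Solutions", "E-Commerce"],
})

# Drop the unreliable raw field, then left-join the validated attributes
# (the pandas analogue of Excel's VLOOKUP).
clean = sales.drop(columns=["city"]).merge(mapping, on="customer_name", how="left")
print(clean[["customer_name", "city", "region", "channel"]])
```

A left join preserves every sales row while attaching one standardized city, region, and channel per customer, which is what makes later regional and channel-level evaluation consistent.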
3.2 Data Cleaning and Transformation
Data preprocessing was executed in Python, using Google Colab and Jupyter Notebook with the
Pandas and NumPy libraries. The unified dataset allowed missing values to be filled
uniformly, establishing a strong base for subsequent analysis. The 'Note' column in the sales
dataset presented a particular challenge: it contained free-text descriptions of adjustments
such as ‘discount’, ‘commission’, or ‘FOC’, with no standardization and inconsistent labeling
across entries. To resolve this, a custom Python script was developed to classify entries into
structured categories: Discounts & Promotions, Rent Claims, Write-offs, and Miscellaneous. The
resulting visibility into the sources of revenue erosion underpins the recommendation to
replace free-text ERP input fields with dropdown menus.

Because SKU-level identifiers were missing from the logistics data, DC Numbers were used to
map logistics cost entries to their corresponding sales transactions, and a quantity-based
allocation method was then applied. For example, if a dispatch contained 10 units of Ketchup
and 30 units of Mayo, 25% of its total cost was allocated to Ketchup and 75% to Mayo.
SKU-level profitability was estimated on this basis. The methodology was approved by both
Dipitt’s supervisor and the faculty advisor, who confirmed its validity under the prevailing
data constraints.
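The two cleaning steps described above can be sketched as follows. The category keywords and field names are illustrative assumptions, not the project's exact script; the allocation example reuses the 10-unit Ketchup / 30-unit Mayo split from the text.

```python
# Sketch of rule-based note classification and quantity-based cost allocation.
# Keyword lists and figures are illustrative assumptions.

def classify_note(note: str) -> str:
    """Map a free-text ERP adjustment note to a structured category."""
    text = (note or "").lower()
    if any(k in text for k in ("discount", "promo", "commission", "foc")):
        return "Discounts & Promotions"
    if "rent" in text:
        return "Rent Claims"
    if any(k in text for k in ("write-off", "writeoff", "damaged", "expired")):
        return "Write-offs"
    return "Miscellaneous"

def allocate_dispatch_cost(total_cost: float, units_by_sku: dict) -> dict:
    """Split one dispatch's logistics cost across SKUs in proportion to units."""
    total_units = sum(units_by_sku.values())
    return {sku: total_cost * qty / total_units
            for sku, qty in units_by_sku.items()}

# 10 units of Ketchup and 30 units of Mayo share a dispatch cost 25% / 75%.
shares = allocate_dispatch_cost(1000.0, {"Ketchup": 10, "Mayo": 30})
print(shares)  # {'Ketchup': 250.0, 'Mayo': 750.0}
print(classify_note("FOC replacement for retailer"))  # Discounts & Promotions
```

Because the rules are deterministic, the same note always maps to the same category, which keeps the adjustment classification transparent and auditable.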
3.3 Analytical Tools and Forecasting Platforms
The analysis combined spreadsheet software, coding platforms, forecasting models, and data
visualization tools. Microsoft Excel was used primarily for file inspection and data
reconciliation through VLOOKUP operations, while Python served as the primary platform for
data preprocessing and model development.
Forecasting was performed with Meta’s Prophet model, chosen because it handles missing data
points effectively, detects seasonality patterns, and produces straightforward trend-based
predictions with minimal data preparation (Taylor & Letham, 2018; Peters, 2025). The model was
trained on monthly sales data and used to project revenue through 2027. Ratio-based
extrapolation was applied where historical data was incomplete, for example for logistics
costs and product returns.
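Ratio-based extrapolation of this kind reduces to a simple calculation: take the average cost-to-sales ratio from the months where both series are observed, then apply it to forecast revenue for the months lacking cost data. The figures below are illustrative, not Dipitt's actual data.

```python
# Illustrative ratio-based extrapolation: estimate a missing variable
# (logistics cost) from forecast revenue via the historical ratio.
# All figures are made up for demonstration.

observed_revenue = [100_000, 120_000, 110_000]   # months with full data
observed_logistics = [2_000, 2_640, 2_090]       # logistics cost, same months

# Average historical cost-to-sales ratio.
ratio = sum(observed_logistics) / sum(observed_revenue)

# Apply the ratio to forecast revenue for months lacking cost data.
forecast_revenue = [130_000, 125_000]
estimated_logistics = [round(r * ratio, 2) for r in forecast_revenue]

print(f"cost-to-sales ratio: {ratio:.2%}")
print(estimated_logistics)
```

The approach assumes the cost-to-sales relationship is stable over time, which is why it was used only as a fallback where historical data was incomplete.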
Interactive Power BI dashboards were used to visualize essential metrics across regions,
products, and channels. These dashboards allowed stakeholders to track business performance
and to test different cost and volume input scenarios interactively. This visualization
approach reflects current industry trends emphasizing data-driven decision-making across
Pakistan’s FMCG sector (Maersk, 2023).
3.4 Dashboard Development and KPI Alignment
Two main dashboards were developed: a Sales Performance Dashboard and a Logistics Cost
Dashboard. The Sales Dashboard tracked revenue patterns, adjustment breakdowns, return rates,
and SKU performance. It revealed clear seasonal sales patterns, with a noticeable decline
during Ramadan in April and a sales spike in March, and identified Ketchup, Chili Garlic, and
Classic Mayo as consistently high-performing products.
The Logistics Dashboard showed that the cost-to-sales ratio rose from 1.58% in 2023 to 2.76%
in 2024, driven in part by rising reverse-logistics expenses for products such as BBQ Ranch
and Teriyaki Sauce. The analysis also found that 12-unit and 24-unit packaging options
delivered the best cost efficiency and unit profit margins. Company supervisor feedback
validated these dashboards and confirmed their alignment with Dipitt’s internal performance
KPIs; they will continue to support monitoring processes and strategic decisions after project
completion.
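The cost-to-sales KPI tracked on the Logistics Dashboard is a simple ratio of logistics cost to sales revenue per period. A minimal sketch of the year-level calculation is below; the sales and cost totals are illustrative placeholders chosen only to reproduce the reported 1.58% and 2.76% ratios, not Dipitt's actual figures.

```python
# Year-level cost-to-sales KPI. Totals below are illustrative placeholders.
totals = {
    2023: {"sales": 50_000_000, "logistics_cost": 790_000},
    2024: {"sales": 46_000_000, "logistics_cost": 1_269_600},
}

# Cost-to-sales ratio per year, as shown on the dashboard.
ratios = {year: t["logistics_cost"] / t["sales"] for year, t in totals.items()}

for year in sorted(ratios):
    print(f"{year}: cost-to-sales = {ratios[year]:.2%}")
```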
3.5 Role of Secondary Research
Industry sources provided benchmarking and contextual alignment to support the primary data
analysis. Reports from Euromonitor International (2024), StrategyHelix (2024), and Verdict
Foodservice (2024) corroborated Dipitt’s SKU strategy and pricing against broader market
patterns.
The Aurora (2024) report, alongside National Foods’ 2023 investor briefing, provided insight
into changing customer preferences and competitor actions. The launch of National Foods’
“Drizz’l” product prompted the recommendation to expand Gen Z-targeted SKUs, including
Buffalo and BBQ sauces, through online sales channels.
Prophet’s suitability for forecasting FMCG time-series data was supported by technical
sources, including Peters (2025) and ResearchGate (2020). Dipitt’s official website (2025)
provided details of the company’s export footprint, which helped the sales analysis segment
markets regionally and globally. These secondary resources improved the depth, credibility,
and strategic relevance of the analysis.
3.6 Ethical Considerations
Strict ethical standards were maintained throughout the project. All datasets were provided by
Dipitt’s authorized representative and used solely for academic research purposes. No personal
or respondent-level data was used, which minimized privacy risks.

Subjective data transformations, such as adjustment classification, maintained transparency
through rule-based procedures. Models and tools were selected for explainability and
analytical validity. The overall methodology preserved corporate confidentiality while
ensuring academic integrity and responsible data use.
3.7 Alignment with ELP Objectives
The methodology directly supported all three primary objectives set in the ELP Terms of
Reference. Performance diagnosis was achieved through comprehensive data cleaning followed by
segmentation and adjustment analysis. Forecasting objectives were met by applying Prophet to
sales trend modeling and revenue projection. Cost-optimization insights were delivered through
SKU logistics allocation, return-cost mapping, and packaging evaluations.

Together, these approaches gave Dipitt practical, data-based insights aligned with its
existing operational and strategic objectives.
3.8 Methodological Reflection: Opportunities with Enhanced Data Access
The employed methodology proved effective under the given conditions, but several limitations
prevented fully detailed profitability and optimization studies. The most significant
restriction was the absence of true production cost data (COGS), covering ingredients, labor,
and packaging. A gross margin model for each SKU would have made it possible to assess
contribution margins by region, channel, and format with greater precision, improving the cost
forecast.
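Had COGS data been available, the contribution-margin calculation itself would have been straightforward. The hypothetical sketch below shows the form it would take; every field name and figure here is an assumption for illustration.

```python
# Hypothetical SKU contribution-margin model that COGS data would have
# enabled. All figures are assumptions for illustration only.

def contribution_margin(revenue: float, cogs: float, logistics_cost: float):
    """Return the contribution margin and margin percentage for one SKU segment."""
    margin = revenue - cogs - logistics_cost
    return margin, margin / revenue

# Example: one SKU-region segment with assumed revenue and costs.
margin, pct = contribution_margin(revenue=500_000, cogs=310_000, logistics_cost=14_000)
print(f"contribution: {margin}, margin: {pct:.1%}")
```

Run per SKU, region, and channel, such a model would have turned the quantity-based cost estimates into true profitability rankings.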
The absence of SKU-level logistics costs also constrained the ability to implement
activity-based costing (ABC). Had shipment weight data or other cost drivers, such as
packaging size and delivery routes, been provided, detailed logistics inefficiency models and
targeted solutions at the regional or SKU level could have been built.
The unstructured adjustment notes in the ERP exports obstructed real-time classification and
reduced return-prediction accuracy. Structured tags or dropdown fields would have enabled
automated classification models to flag irregularities and produce real-time reports, and such
models could underpin return-risk frameworks by channel, product, or region.
Better data granularity would also have supported more sophisticated forecasting. Pairing
Prophet with SARIMA or XGBoost models trained on product returns, discounts, and channel
effects would have yielded SKU-specific forecasts under different commercial scenarios.
Cross-channel elasticity could then have been modeled, for example the effect of E-Commerce
discounts on Food Solutions volume, or the impact of IMT returns on regional profitability.
Scenario-simulation functions in Power BI would have let Dipitt’s leadership explore
alternative cost structures, discount levels, and packaging options before implementation,
moving the company from basic performance tracking toward advanced strategic forecasting.
These potential applications underline the value of ERP data that is cleaner, more granular,
and fully integrated, as a foundation for advanced business intelligence.