3. Methodology
3.1 Data
3.1.1 Origin, Timeframe, and Relevance
The dataset was obtained from Kaggle (2025) and represents a cross-sectional
snapshot of the supply chain for a cosmetic start-up in India. It contains 100 records
(rows 2–101) and 24 variables (columns A–X), covering operations in Mumbai,
Kolkata, Delhi, Bangalore, and Chennai.
Since no explicit date fields are provided, the dataset is treated as a single
cross-section observed on March 1, 2025. Lead time (in days) is the only temporal
dimension, capturing order duration without introducing seasonal bias (Wooldridge,
2016).
The dataset enables the quantification of operational inefficiencies (extended lead
times, elevated shipping and manufacturing costs, and higher defect rates) and their
impact on pricing and profitability across the haircare, skincare, and cosmetics portfolios.
3.1.2 Scope and Objectives
This study aims to:
● Quantify the relationship between internal supply-chain inefficiencies and
selling prices.
● Identify which inefficiency factors exert the strongest influence on pricing
structures.
● Provide data-driven recommendations to optimize costs and refine pricing
strategies.
3.1.3 Structure and Definition of Variables
The dataset comprises 24 variables describing operational, logistical, and
commercial aspects of the supply chain. These variables are categorized as
categorical or numerical and are presented in Table 1 with their corresponding
descriptions and units of measurement.
Table 1. Definition of variables in the dataset
| Field | Type | Description | Unit / Values |
|---|---|---|---|
| Product type | Categorical | Product category | Haircare, Skincare, Cosmetics |
| SKU | Categorical | Unique alphanumeric identifier | n/a |
| Price | Numeric | Sales price | USD |
| Availability | Numeric | Units available in inventory | units |
| Number of products sold | Numeric | Units sold | units |
| Revenue generated | Numeric | Total revenue from sales | USD |
| Customer demographics | Categorical | Customer gender segment | Male, Female, Non-binary, Unknown |
| Stock levels | Numeric | Current inventory level | units |
| Lead times | Numeric | Time from order placement to dispatch | days |
| Order quantities | Numeric | Quantity ordered per purchase | units |
| Shipping times | Numeric | Duration of transportation | days |
| Shipping carriers | Categorical | Shipping company used | Carrier A, Carrier B, Carrier C |
| Shipping costs | Numeric | Cost to ship | USD |
| Supplier name | Categorical | Name of the supplier | n/a |
| Location | Categorical | Supplier city | Mumbai, Kolkata, Delhi, Bangalore, Chennai |
| Lead time | Numeric | Fulfillment lead time | days |
| Production volumes | Numeric | Volume produced during the period | units |
| Manufacturing lead time | Numeric | Duration of the manufacturing process | days |
| Manufacturing costs | Numeric | Cost of manufacturing | USD |
| Inspection results | Categorical | Outcome of quality inspection | Pass, Fail, Pending |
| Defect rates | Numeric | Proportion of defective units | fraction (0–1) |
| Transportation modes | Categorical | Mode of transport | Road, Rail, Air, Sea |
| Routes | Categorical | Transportation route used | Route A, Route B, Route C |
| Costs | Numeric | Total cost associated with the route | USD |
The variables encompass five main dimensions:
● Product and Sales – Product type, SKU, price, availability, number of products
sold, revenue generated, and customer demographics.
● Inventory and Suppliers – Stock levels, lead times, and order quantities.
● Shipping – Shipping times, carriers, and costs.
● Production and Quality – Supplier name, location, production volumes,
manufacturing lead time, manufacturing costs, inspection results, and defect
rates.
● Transportation – Transportation modes and routes, including associated
costs.
3.1.4 Data Cleaning and Preparation
To ensure data quality and consistency, the following steps were applied (Little &
Rubin, 2002; Tukey, 1977):
● Duplicates – Verified SKU uniqueness; no records were removed.
● Missing Values – Imputed using mode for categorical variables and median by
product type for numerical variables.
● Outliers – Detected via boxplots for Price, Shipping Costs, and Lead Times;
records above the 99th percentile were excluded when they distorted
aggregated measures.
● Data Types – Converted cost and price fields to numeric and durations to
integer values. Standardized labels for suppliers, carriers, and routes.
● Derived Fields –
○ Total Cost per Unit = Shipping Costs + Manufacturing Costs
○ Defect Rate (%) = Defect Rate × 100
○ Gross Margin (%) = (Price – Total Cost per Unit) / Price × 100
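The imputation rule and the derived fields listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the workbook used in the study: the record keys (`product_type`, `price`, `shipping_costs`, `manufacturing_costs`, `defect_rate`) and the three sample rows are hypothetical.

```python
from statistics import median

def impute_median_by_group(records, field, group_field="product_type"):
    """Fill missing (None) numeric values with the median of the same group."""
    groups = {}
    for r in records:
        if r[field] is not None:
            groups.setdefault(r[group_field], []).append(r[field])
    medians = {g: median(vals) for g, vals in groups.items()}
    for r in records:
        if r[field] is None:
            # Raises KeyError if a group has no observed value at all
            r[field] = medians[r[group_field]]
    return records

def add_derived_fields(r):
    """Add the three derived fields defined in Section 3.1.4."""
    r["total_cost_per_unit"] = r["shipping_costs"] + r["manufacturing_costs"]
    r["defect_rate_pct"] = r["defect_rate"] * 100
    r["gross_margin_pct"] = (r["price"] - r["total_cost_per_unit"]) / r["price"] * 100
    return r

# Hypothetical sample rows (not taken from the dataset)
rows = [
    {"product_type": "skincare", "price": 50.0, "shipping_costs": 5.0,
     "manufacturing_costs": 20.0, "defect_rate": 0.02},
    {"product_type": "skincare", "price": 80.0, "shipping_costs": None,
     "manufacturing_costs": 30.0, "defect_rate": 0.01},
    {"product_type": "skincare", "price": 60.0, "shipping_costs": 7.0,
     "manufacturing_costs": 25.0, "defect_rate": 0.03},
]
rows = [add_derived_fields(r) for r in impute_median_by_group(rows, "shipping_costs")]
```

In this sketch the missing shipping cost is filled with the skincare median before the derived fields are computed, so the order of the two steps matters.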
3.2 Analytical Methodology
3.2.1 Tools
Tableau Public was used for interactive visualizations and spatial analysis (Tableau
Software, 2024). Excel (Office 365) supported initial data profiling and calculation
verification.
3.2.2 Data Connection and Modeling
The dataset was imported into Tableau using the “Text File” connector. The Data
Interpreter tool was applied to clean headers and remove nested tables. Field names
were standardized, data types verified, and units annotated within the metadata.
3.2.3 Calculations and Parameters
Key calculated fields include:
● Total Cost = [Shipping Costs] + [Manufacturing Costs]
● Defect Rate (%) = [Defect Rate] × 100
● Gross Margin (%) = ([Price] – [Total Cost]) / [Price] × 100
A parameter named Metric was created to toggle views across Total Cost, Lead
Time, and Defect Rate, allowing flexible analysis of cost and process drivers.
3.2.4 Analytical Techniques and Visualizations
Visualization techniques were chosen to reveal cost-price relationships, assess
supplier performance, and identify operational inefficiencies (Kirk, 2016; Few, 2009).
● Cost–Price Relationship – Scatter plots with trend lines to explore price
elasticity and cost impact (Chambers et al., 1983).
● Lead Time and Defect Impact – Bar charts and heat maps comparing lead
times by supplier and location, and scatter plots linking defect rates with
manufacturing costs (Kraak & Ormeling, 2010).
● Route and Carrier Optimization – Boxplots of shipping costs and shipping
times by transportation mode and route to identify cost-saving opportunities.
● Customer Insights and Pricing – Comparative bar charts and heat maps to
analyze revenue, sales volume, and product preferences by customer
demographics.
Where appropriate, exploratory data analysis (Tukey, 1977) informed visualization
design and identification of key trends.
Dashboards & Storytelling
● Interactive dashboards with global filters (Product Type, Customer
Demographics, Location).
● A Tableau Story that guides the reader from high-level KPIs to specific
insights and operational-improvement recommendations (Dean, 2021).
Design and Interaction Notes
All worksheets include global filters (Product Type, Location, Customer
Demographics) to enable dynamic, segmented comparisons (Tableau Software,
2024).
Visual perception principles were applied: diverging color scales for defect rates,
sequential palettes for costs and lead-time metrics, and brand-consistent hues (Few,
2009).
Content is organized into a five-step storytelling dashboard that mirrors the analytical
blocks and culminates in strategic recommendations (Dean, 2021).
3.2.5 Statistical Analysis (Regression Model)
Objective Alignment
In Chapter 1, we stated our purpose “to quantify the relationship between supply
chain inefficiencies and product pricing” and “to identify which inefficiency factors
most significantly affect pricing” (Chapter 1.2). To deliver on that, we estimate
multivariate OLS regressions that link key operational drivers to Price.
Model Specifications:
● Linear Levels
○ Dependent variable: Price (USD)
○ Independent variables: Lead time (days), Manufacturing lead time (days),
Manufacturing costs (USD), Order quantities (units)
○ Estimation: OLS with HC3 robust standard errors (Wooldridge, 2016)
● Log–Log
○ Same variables in natural logs, so coefficients are elasticities (Gould et al., 2013)
○ Also estimated with HC3 robust standard errors
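Both specifications can be illustrated with a short NumPy sketch of OLS with HC3 standard errors. It is not the estimation code used for the study: the function name `ols_hc3` is hypothetical, the data are synthetic with a true elasticity of 0.5, and a single predictor stands in for the four used in the models.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS coefficients with HC3 heteroskedasticity-robust standard errors.

    X: (n, p) predictor matrix without the intercept column; y: (n,) response.
    Returns (beta, se) with the intercept first.
    """
    X = np.column_stack([np.ones(len(y)), X])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i' (X'X)^{-1} x_i
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 weights: e_i^2 / (1 - h_ii)^2
    omega = (resid / (1.0 - leverage)) ** 2
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (assumption): price = 3 * cost^0.5 with multiplicative noise
rng = np.random.default_rng(0)
cost = rng.uniform(1.0, 100.0, size=200)
price = 3.0 * cost ** 0.5 * np.exp(rng.normal(0.0, 0.1, size=200))

beta_lin, se_lin = ols_hc3(cost[:, None], price)                  # linear levels
beta_log, se_log = ols_hc3(np.log(cost)[:, None], np.log(price))  # log-log
```

In the log-log fit, `beta_log[1]` estimates the elasticity of price with respect to the predictor, which is why that specification is read in percentage terms.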
Key Results
Table 3.2.5 summarizes the main coefficients, significance, and fit statistics.
Table 3.2.5 Regression Results (N = 100, HC3 SEs)
| Model | Predictor | Coef. | p-value | 95% CI | Adj. R² |
|---|---|---|---|---|---|
| Linear | Constant | 61.9174 | <.001 | [39.14, 84.70] | 0.1492 |
| | Lead time | 0.5191 | .1224 | [–0.14, 1.18] | |
| | Manufacturing lead time | –1.2444 | <.001 | [–1.88, –0.61] | |
| | Manufacturing costs | –0.2351 | .0209 | [–0.43, –0.04] | |
| | Order quantities | 0.1660 | .1704 | [–0.07, 0.40] | |
| Log–Log | Constant | 3.1844 | <.001 | [1.62, 4.75] | 0.1036 |
| | ln Lead time | 0.2143 | .0536 | [–0.00, 0.43] | |
| | ln Manufacturing lead time | –0.3096 | .0027 | [–0.51, –0.11] | |
| | ln Manufacturing costs | –0.1107 | .4045 | [–0.37, 0.15] | |
| | ln Order quantities | 0.2605 | .0713 | [–0.02, 0.54] | |
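Model fit comparison: the linear OLS model has a slightly higher adjusted R² than the
Log–Log specification (0.1492 vs. 0.1036), so we adopt the level model for final inference.
The AIC values (960.38 for the linear model, 283.51 for the Log–Log model) are computed on
different dependent-variable scales (Price vs. ln Price) and are not directly comparable.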
Execution in Excel
1. Enabled the Data Analysis ToolPak (File → Options → Add-Ins → Analysis
ToolPak).
2. Launched Data Analysis → Regression.
3. Defined the Y Input Range as Price (or LN(Price)) and the X Input Range as
the set of inefficiency variables.
4. Checked “Labels,” “Residuals,” and “Residual Plots.”
5. Generated output—including coefficients, standard errors, t-stats, p-values,
R², and ANOVA—and exported it to a new sheet.
6. Calculated AIC and BIC manually:
$$\mathrm{AIC} = n \ln(\mathrm{SSE}/n) + 2k, \qquad \mathrm{BIC} = n \ln(\mathrm{SSE}/n) + k \ln(n),$$
where $n$ is the number of observations and $k$ is the number of parameters.
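The manual AIC and BIC calculation in step 6 translates directly into code. The sketch below applies the stated formulas; the function name `aic_bic` and the input values are hypothetical and are not the study's figures.

```python
import math

def aic_bic(sse, n, k):
    """AIC and BIC from the residual sum of squares, as defined in step 6."""
    base = n * math.log(sse / n)
    return base + 2 * k, base + k * math.log(n)

# Hypothetical inputs: SSE = 250 from a model with n = 100 rows and k = 5 parameters
aic, bic = aic_bic(sse=250.0, n=100, k=5)
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes each additional parameter more heavily than AIC at this sample size.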
Diagnostic Tests
To ensure valid inference, the following checks were performed in Excel:
Multicollinearity
● Ran auxiliary regressions of each predictor on the remaining X’s.
● Computed $\mathrm{VIF}_j = 1 / (1 - R_j^2)$.
● Criterion: $\mathrm{VIF} < 5$ indicates acceptable collinearity (Wooldridge, 2016).
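The auxiliary-regression approach to the VIF can be sketched in NumPy as follows. The helper names `r_squared` and `vif` are hypothetical, and the data are synthetic: the third column is built as a near copy of the first to show what a collinear pair looks like.

```python
import numpy as np

def r_squared(X, y):
    """R² from an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        out.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(out)

# Synthetic predictors (assumption): x3 is almost a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Under the VIF < 5 criterion, the first and third columns would be flagged here while the independent second column would not.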
Heteroskedasticity (Breusch–Pagan)
● Saved residuals $\hat{\varepsilon}_i$, squared them, and regressed $\hat{\varepsilon}_i^2$ on the
original X’s.
● A significant p-value (< .05) indicates heteroskedasticity and supports the use of
robust standard errors.
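The auxiliary regression behind the Breusch–Pagan check can be sketched as below. This version uses the n·R² (Koenker) form of the statistic, which may differ from the exact variant computed in Excel. The function name `breusch_pagan_lm` is hypothetical and the data are synthetic, with error variance that grows with the predictor.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """LM statistic n * R² from regressing squared OLS residuals on X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e2 = (y - A @ beta) ** 2                      # squared residuals
    gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
    u = e2 - A @ gamma
    r2 = 1.0 - u @ u / np.sum((e2 - e2.mean()) ** 2)
    return len(y) * r2

# Synthetic heteroskedastic data (assumption): noise scale proportional to x
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 500)
y = 2.0 + 3.0 * x + x * rng.normal(size=500)
lm = breusch_pagan_lm(x[:, None], y)
```

Under the null of homoskedasticity the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors; the 5% critical values are 3.841 for one predictor and 9.488 for four.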
Normality of Residuals
● Created Q-Q plots via Excel chart tools.
● Performed the Jarque–Bera test using skewness and kurtosis formulas.
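The Jarque–Bera statistic can be computed from the residual moments as sketched below. The sketch uses population (biased) moments, whereas Excel's SKEW and KURT functions apply small-sample adjustments, so values may differ slightly. The function name `jarque_bera` and the input values are hypothetical.

```python
import math

def jarque_bera(resid):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # raw kurtosis; equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square with 2 df
    return jb, p_value

# Hypothetical residuals for illustration
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

The chi-square approximation is asymptotic, so with a sample of 100 observations the p-value is best read as indicative rather than exact.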
Model Specification (RESET)
● Added powers of the fitted values to an auxiliary regression.
● Inspected joint significance of additional terms.
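The RESET check can be sketched as an F-test comparing the original model with one augmented by powers of the fitted values. The source does not state which powers were added, so the squared and cubed terms used here are an assumption. The function name `reset_f_stat` is hypothetical and the data are synthetic, with a deliberately omitted quadratic term.

```python
import numpy as np

def reset_f_stat(X, y, max_power=3):
    """F statistic for the joint significance of fitted-value powers 2..max_power."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    sse_r = np.sum((y - fitted) ** 2)             # restricted (original) model
    powers = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    A_u = np.column_stack([A, powers])
    beta_u, *_ = np.linalg.lstsq(A_u, y, rcond=None)
    sse_u = np.sum((y - A_u @ beta_u) ** 2)       # unrestricted (augmented) model
    q = powers.shape[1]                           # number of added terms
    df = len(y) - A_u.shape[1]                    # residual degrees of freedom
    return ((sse_r - sse_u) / q) / (sse_u / df)

# Synthetic misspecified case (assumption): true relation is quadratic in x
x = np.arange(1.0, 21.0)
y = x ** 2 + 0.5 * (-1.0) ** np.arange(20)
f_stat = reset_f_stat(x[:, None], y)
```

A large F statistic relative to the F(q, df) critical value suggests that the linear functional form omits relevant nonlinearity.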
These results directly address our Chapter 1 objectives by quantifying which internal
inefficiencies most affect pricing and therefore should be prioritized in cost‐control
and pricing‐strategy initiatives.
REFERENCES
Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical
methods for data analysis. Wadsworth.
Dean, J. (2021). Storytelling with data: A data visualization guide for business
professionals. Wiley.
Gould, W. W., Pitblado, J. R., & Poi, B. P. (2013). Maximum Likelihood Estimation
with Stata (4th ed.). Stata Press.
Kaggle. (2025). Supply Chain Data for Cosmetic Startup in India. Retrieved from
[Link]
Kraak, M.-J., & Ormeling, F. (2010). Cartography: Visualization of spatial data (3rd
ed.). Guilford Press.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.).
Wiley.
Tableau Software. (2024). Tableau Public user guide [Software manual]. Tableau
Software.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Van der Aalst, W. (2016). Process mining: Data science in action (2nd ed.). Springer.
Wooldridge, J. M. (2016). Introductory econometrics: A modern approach (6th ed.).
Cengage Learning.