
Supply Chain Analysis for Cosmetics

The document outlines a study utilizing a dataset from Kaggle to analyze supply chain inefficiencies in a cosmetic start-up in India, focusing on their impact on pricing. It details the dataset's structure, cleaning processes, analytical methodologies, and statistical analyses employed to quantify relationships between operational inefficiencies and product pricing. The study aims to provide data-driven recommendations for optimizing costs and refining pricing strategies.

3. Methodology

3.1 Data

3.1.1 Origin, Timeframe, and Relevance

The dataset was obtained from Kaggle (2025) and represents a cross-sectional
snapshot of the supply chain for a cosmetic start-up in India. It contains 100 records
(rows 2–101) and 24 variables (columns A–X), covering operations in Mumbai,
Kolkata, Delhi, Bangalore, and Chennai.

Since no explicit date fields are provided, the dataset is treated as a single
cross-section observed on March 1, 2025. Lead time (in days) is the only temporal
dimension; it captures order duration without introducing seasonal bias (Wooldridge,
2016).

The dataset enables the quantification of operational inefficiencies (extended lead
times, elevated shipping and manufacturing costs, and higher defect rates) and their
impact on pricing and profitability across haircare, skincare, and cosmetic portfolios.

3.1.2 Scope and Objectives

This study aims to:

● Quantify the relationship between internal supply-chain inefficiencies and
selling prices.

● Identify which inefficiency factors exert the strongest influence on pricing
structures.

● Provide data-driven recommendations to optimize costs and refine pricing
strategies.

3.1.3 Structure and Definition of Variables

The dataset comprises 24 variables describing operational, logistical, and
commercial aspects of the supply chain. These variables are categorized as
categorical or numerical and are presented in Table 1 with their corresponding
descriptions and units of measurement.

Table 1. Definition of variables in the dataset

| Field | Type | Description | Unit / Values |
|---|---|---|---|
| Product type | Categorical | Product category | Haircare, Skincare, Cosmetics |
| SKU | Categorical | Unique alphanumeric identifier | n/a |
| Price | Numeric | Sales price | USD |
| Availability | Numeric | Units available in inventory | units |
| Number of products sold | Numeric | Units sold | units |
| Revenue generated | Numeric | Total revenue from sales | USD |
| Customer demographics | Categorical | Customer gender segment | Male, Female, Non-binary, Unknown |
| Stock levels | Numeric | Current inventory level | units |
| Lead times | Numeric | Time from order placement to dispatch | days |
| Order quantities | Numeric | Quantity ordered per purchase | units |
| Shipping times | Numeric | Duration of transportation | days |
| Shipping carriers | Categorical | Shipping company used | Carrier A, Carrier B, Carrier C |
| Shipping costs | Numeric | Cost to ship | USD |
| Supplier name | Categorical | Name of the supplier | n/a |
| Location | Categorical | Supplier city | Mumbai, Kolkata, Delhi, Bangalore, Chennai |
| Lead time | Numeric | Fulfillment lead time | days |
| Production volumes | Numeric | Volume produced during the period | units |
| Manufacturing lead time | Numeric | Duration of the manufacturing process | days |
| Manufacturing costs | Numeric | Cost of manufacturing | USD |
| Inspection results | Categorical | Outcome of quality inspection | Pass, Fail, Pending |
| Defect rates | Numeric | Proportion of defective units | fraction (0–1) |
| Transportation modes | Categorical | Mode of transport | Road, Rail, Air, Sea |
| Routes | Categorical | Transportation route used | Route A, Route B, Route C |
| Costs | Numeric | Total cost associated with the route | USD |

The variables encompass five main dimensions:

●​ Product and Sales – Product type, SKU, price, availability, number of products
sold, revenue generated, and customer demographics.​

●​ Inventory and Suppliers – Stock levels, lead times, and order quantities.​

●​ Shipping – Shipping times, carriers, and costs.​

● Production and Quality – Supplier name, location, production volumes,
manufacturing lead time, manufacturing costs, inspection results, and defect
rates.

● Transportation – Transportation modes and routes, including associated
costs.

3.1.4 Data Cleaning and Preparation

To ensure data quality and consistency, the following steps were applied (Little &
Rubin, 2002; Tukey, 1977):

●​ Duplicates – Verified SKU uniqueness; no records were removed.​

●​ Missing Values – Imputed using mode for categorical variables and median by
product type for numerical variables.​

●​ Outliers – Detected via boxplots for Price, Shipping Costs, and Lead Times;
records above the 99th percentile were excluded when they distorted
aggregated measures.​

●​ Data Types – Converted cost and price fields to numeric and durations to
integer values. Standardized labels for suppliers, carriers, and routes.​

●​ Derived Fields –​

○​ Total Cost per Unit = Shipping Costs + Manufacturing Costs​

○​ Defect Rate (%) = Defect Rate × 100​

○​ Gross Margin (%) = (Price – Total Cost per Unit) / Price × 100​
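
The imputation rule and the derived fields above can be illustrated with a minimal Python sketch. It is not the workflow used in the study (which relied on Excel and Tableau); the sample rows are invented, the helper names `impute_median_by_group` and `add_derived_fields` are hypothetical, and only the field names and formulas come from this section.

```python
from statistics import median

# Hypothetical sample rows: field names follow Table 1, values are made up.
rows = [
    {"Product type": "haircare", "Price": 50.0, "Shipping costs": 5.0,
     "Manufacturing costs": 20.0, "Defect rates": 0.02},
    {"Product type": "haircare", "Price": None, "Shipping costs": 4.0,
     "Manufacturing costs": 10.0, "Defect rates": 0.01},
    {"Product type": "haircare", "Price": 70.0, "Shipping costs": 6.0,
     "Manufacturing costs": 30.0, "Defect rates": 0.04},
    {"Product type": "skincare", "Price": 80.0, "Shipping costs": 8.0,
     "Manufacturing costs": 32.0, "Defect rates": 0.03},
]


def impute_median_by_group(rows, field, group="Product type"):
    """Fill missing numeric values with the median of the same product type."""
    observed = {}
    for r in rows:
        if r[field] is not None:
            observed.setdefault(r[group], []).append(r[field])
    for r in rows:
        if r[field] is None:
            # Assumes every group has at least one observed value.
            r[field] = median(observed[r[group]])
    return rows


def add_derived_fields(rows):
    """Add the three derived fields defined in Section 3.1.4."""
    for r in rows:
        total = r["Shipping costs"] + r["Manufacturing costs"]
        r["Total cost per unit"] = total
        r["Defect rate (%)"] = r["Defect rates"] * 100
        r["Gross margin (%)"] = (r["Price"] - total) / r["Price"] * 100
    return rows


cleaned = add_derived_fields(impute_median_by_group(rows, "Price"))
for r in cleaned:
    print(r["Product type"], r["Price"], round(r["Gross margin (%)"], 2))
```

In this toy data the missing haircare price is filled with the median of the two observed haircare prices, and a product priced at 50 USD with a total cost of 25 USD has a gross margin of 50%.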
3.2 Analytical Methodology

3.2.1 Tools

Tableau Public was used for interactive visualizations and spatial analysis (Tableau
Software, 2024). Excel (Office 365) supported initial data profiling and calculation
verification.

3.2.2 Data Connection and Modeling

The dataset was imported into Tableau using the “Text File” connector. The Data
Interpreter tool was applied to clean headers and remove nested tables. Field names
were standardized, data types verified, and units annotated within the metadata.

3.2.3 Calculations and Parameters

Key calculated fields include:

●​ Total Cost = [Shipping Costs] + [Manufacturing Costs]​

●​ Defect Rate (%) = [Defect Rate] × 100​

●​ Gross Margin (%) = ([Price] – [Total Cost]) / [Price] × 100

A parameter named Metric was created to toggle views across Total Cost, Lead
Time, and Defect Rate, allowing flexible analysis of cost and process drivers.

3.2.4 Analytical Techniques and Visualizations

Visualization techniques were chosen to reveal cost-price relationships, assess
supplier performance, and identify operational inefficiencies (Kirk, 2016; Few, 2009).

● Cost–Price Relationship – Scatter plots with trend lines to explore price
elasticity and cost impact (Chambers et al., 1983).

● Lead Time and Defect Impact – Bar charts and heat maps comparing lead
times by supplier and location, and scatter plots linking defect rates with
manufacturing costs (Kraak & Ormeling, 2010).

● Route and Carrier Optimization – Boxplots of shipping costs and shipping
times by transportation mode and route to identify cost-saving opportunities.

● Customer Insights and Pricing – Comparative bar charts and heat maps to
analyze revenue, sales volume, and product preferences by customer
demographics.

Where appropriate, exploratory data analysis (Tukey, 1977) informed visualization
design and identification of key trends.

Dashboards & Storytelling

● Interactive dashboards with global filters (Product Type, Customer
Demographics, Location).

● A Tableau Story that guides the reader from high-level KPIs to specific
insights and operational-improvement recommendations (Dean, 2021).

Design and Interaction Notes

All worksheets include global filters (Product Type, Location, Customer
Demographics) to enable dynamic, segmented comparisons (Tableau Software,
2024).

Visual perception principles were applied: diverging color scales for defect rates,
sequential palettes for costs and lead-time metrics, and brand-consistent hues (Few,
2009).

Content is organized into a five-step storytelling dashboard that mirrors the analytical
blocks and culminates in strategic recommendations (Dean, 2021).

3.2.5 Statistical Analysis (Regression Model)

Objective Alignment

In Chapter 1, we stated our purpose “to quantify the relationship between supply
chain inefficiencies and product pricing” and “to identify which inefficiency factors
most significantly affect pricing” (Chapter 1.2). To deliver on that, we estimate
multivariate OLS regressions that link key operational drivers to Price.

Model Specifications:

● Linear Levels
○ Dependent variable: Price (USD)
○ Independent variables: Lead time (days), Manufacturing lead time (days),
Manufacturing costs (USD), Order quantities (units)
○ Estimation: OLS with HC3 robust standard errors (Wooldridge, 2016)

● Log–Log
○ Same variables in natural logs, so coefficients are elasticities (Gould et al., 2013)
○ Also estimated with HC3 robust standard errors
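
To make the two specifications concrete, the following is a minimal pure-Python sketch of OLS with HC3 standard errors, where the covariance is $(X'X)^{-1} X' \,\mathrm{diag}\!\left(e_i^2/(1-h_{ii})^2\right) X (X'X)^{-1}$ and $h_{ii}$ is the leverage of observation $i$. It is an illustration under stated assumptions, not the estimation code used in the study: the data are invented, only one regressor is used for brevity, and the helper names (`matmul`, `inverse`, `ols_hc3`) are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols_hc3(X, y):
    """Return (coefficients, HC3 standard errors); X rows include the constant."""
    k = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    XtX_inv = inverse(matmul(Xt, X))
    beta = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
    # Leverage h_ii = x_i' (X'X)^-1 x_i
    lev = [sum(xi[a] * XtX_inv[a][b] * xi[b] for a in range(k) for b in range(k))
           for xi in X]
    # HC3 weights: squared residuals inflated by 1 / (1 - h_ii)^2
    w = [(e / (1.0 - h)) ** 2 for e, h in zip(resid, lev)]
    meat = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(XtX_inv, meat), XtX_inv)
    se = [math.sqrt(max(cov[j][j], 0.0)) for j in range(k)]
    return beta, se


# Levels model on made-up data (one regressor only, for brevity).
lead_time = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, se = ols_hc3([[1.0, x] for x in lead_time], price)

# Log-log model on an exact power law price = 3 * x**2: the slope is the elasticity.
price_pl = [3.0 * x ** 2 for x in lead_time]
beta_log, se_log = ols_hc3([[1.0, math.log(x)] for x in lead_time],
                           [math.log(p) for p in price_pl])
print(beta, se)
print(beta_log)
```

The second fit shows why log–log coefficients read as elasticities: when price is exactly proportional to the square of the regressor, the estimated slope is 2, meaning a 1% increase in the regressor goes with a 2% increase in price.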

Key Results​
Table 3.2.5 summarizes the main coefficients, significance, and fit statistics.
Table 3.2.5 Regression Results (N = 100, HC3 SEs)

| Model | Predictor | Coef. | p-value | 95% CI | Adj. R² |
|---|---|---|---|---|---|
| Linear | Constant | 61.9174 | <.001 | [39.14, 84.70] | 0.1492 |
| | Lead time | 0.5191 | 0.1224 | [–0.14, 1.18] | |
| | Manufacturing lead time | –1.2444 | <.001 | [–1.88, –0.61] | |
| | Manufacturing costs | –0.2351 | 0.0209 | [–0.43, –0.04] | |
| | Order quantities | 0.1660 | 0.1704 | [–0.07, 0.40] | |
| Log–Log | Constant | 3.1844 | <.001 | [1.62, 4.75] | 0.1036 |
| | ln Lead time | 0.2143 | 0.0536 | [–0.00, 0.43] | |
| | ln Manufacturing lead time | –0.3096 | 0.0027 | [–0.51, –0.11] | |
| | ln Manufacturing costs | –0.1107 | 0.4045 | [–0.37, 0.15] | |
| | ln Order quantities | 0.2605 | 0.0713 | [–0.02, 0.54] | |

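Model fit comparison: the linear OLS model (Adj. R² = 0.1492, AIC = 960.38) has a slightly higher
adjusted R² than the log–log specification (Adj. R² = 0.1036, AIC = 283.51). The two AIC values are
not directly comparable because the dependent variables are on different scales (Price versus
ln Price), so the comparison rests on adjusted R², which is itself only indicative across different
dependent variables. On that basis we adopt the level model for final inference.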

Execution in Excel

1.​ Enabled the Data Analysis ToolPak (File → Options → Add-Ins → Analysis
ToolPak).
2.​ Launched Data Analysis → Regression.
3.​ Defined the Y Input Range as Price (or LN(Price)) and the X Input Range as
the set of inefficiency variables.
4.​ Checked “Labels,” “Residuals,” and “Residual Plots.”
5.​ Generated output—including coefficients, standard errors, t-stats, p-values,
R², and ANOVA—and exported it to a new sheet.
6. Calculated AIC and BIC manually:

$$\mathrm{AIC} = n \ln(\mathrm{SSE}/n) + 2k, \qquad \mathrm{BIC} = n \ln(\mathrm{SSE}/n) + k \ln(n)$$

where $n$ = number of observations, $k$ = number of parameters, and SSE = sum of
squared residuals.
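
As a quick check of the two formulas, a small Python helper can reproduce the manual calculation. The function name `aic_bic` and the input values are illustrative assumptions, not figures from the study.

```python
import math


def aic_bic(sse, n, k):
    """AIC and BIC from the SSE-based formulas: n*ln(SSE/n) + penalty."""
    base = n * math.log(sse / n)
    return base + 2 * k, base + k * math.log(n)


# Illustrative inputs only: SSE/n = 1 makes the log term vanish,
# so the result shows the penalty terms on their own.
aic, bic = aic_bic(sse=100.0, n=100, k=5)
print(round(aic, 4), round(bic, 4))
```

With these inputs AIC is 2k = 10 and BIC is k ln(n), about 23.03, which shows that BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7 (since ln(n) > 2).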


Diagnostic Tests

To ensure valid inference, the following checks were performed in Excel:

Multicollinearity

● Ran auxiliary regressions of each predictor on the remaining X’s.

● Computed $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the R² of the
auxiliary regression for predictor $j$.

● Criterion: VIF < 5 indicates acceptable collinearity (Wooldridge, 2016).
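
The auxiliary-regression procedure can be sketched in pure Python as follows. This is an illustration, not the Excel workbook used in the study: the predictor values are invented and the helpers `solve`, `r_squared`, and `vif` are hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def r_squared(cols, y):
    """R-squared from regressing y on a constant plus the given columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(predictors):
    """Map each predictor name to 1 / (1 - R_j^2) from its auxiliary regression."""
    out = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        out[name] = 1.0 / (1.0 - r_squared(others, col))
    return out


# Made-up values: the two lead-time columns are deliberately near-collinear.
collinear = vif({
    "Lead time": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Manufacturing lead time": [1.0, 2.0, 3.0, 4.0, 6.0],
})
print(collinear)
```

In this toy case both VIFs are far above the threshold of 5, so the pair would be flagged; predictors that are uncorrelated with each other give a VIF of 1.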

Heteroskedasticity (Breusch–Pagan)

● Saved residuals $\hat{\varepsilon}_i$, squared them, and regressed
$\hat{\varepsilon}_i^2$ on the original X’s.

● A significant p-value (< .05) triggers the use of robust standard errors.
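
A minimal sketch of this test is given below, assuming the n·R² (Koenker) form of the LM statistic, compared against a chi-square distribution with degrees of freedom equal to the number of regressors. The exact variant computed in Excel is not stated in the text, so this is one plausible reading; the data are invented and all function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit(X, y):
    """OLS via normal equations; returns (residuals, R-squared)."""
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return resid, 1.0 - sum(e * e for e in resid) / sst


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    t = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= t / i
            total += term
        return math.exp(-t) * total
    total = math.erfc(math.sqrt(t))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-t) * t ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def breusch_pagan(X, y):
    """LM = n * R^2 of squared residuals on X; returns (LM, p-value)."""
    resid, _ = fit(X, y)
    _, r2_aux = fit(X, [e * e for e in resid])
    lm = max(len(y) * r2_aux, 0.0)
    return lm, chi2_sf(lm, len(X[0]) - 1)


# Made-up data whose error size grows with x (heteroskedastic by construction).
xs = [float(i) for i in range(1, 11)]
errs = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, errs)]
lm, p_value = breusch_pagan([[1.0, x] for x in xs], ys)
print(lm, p_value)
```

A p-value below .05 would be read, as in the text, as evidence of heteroskedasticity and a reason to report robust standard errors.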

Normality of Residuals

● Created Q-Q plots via Excel chart tools.

● Performed the Jarque–Bera test using skewness and kurtosis formulas.
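
The Jarque–Bera statistic is $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is the skewness and $K$ the kurtosis of the residuals; under normality it follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is $e^{-JB/2}$. The sketch below assumes population (uncorrected) moments, which differ slightly from Excel's sample-adjusted SKEW and KURT functions; the residual values and the function name `jarque_bera` are illustrative.

```python
import math


def jarque_bera(resid):
    """Return (JB statistic, p-value, skewness, kurtosis) using population moments."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: upper tail is exp(-x / 2).
    return jb, math.exp(-jb / 2.0), skew, kurt


# Illustrative symmetric residuals (not from the study).
jb, p_value, skew, kurt = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb, 4), round(p_value, 4), skew, kurt)
```

With so few observations the test has little power; the example only demonstrates the arithmetic.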

Model Specification (RESET)

● Added powers of the fitted values to an auxiliary regression.

● Inspected joint significance of additional terms.

These results directly address our Chapter 1 objectives by quantifying which internal
inefficiencies most affect pricing and therefore should be prioritized in cost‐control
and pricing‐strategy initiatives.
REFERENCES

Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical
methods for data analysis. Wadsworth.

Dean, J. (2021). Storytelling with data: A data visualization guide for business
professionals. Wiley.

Gould, W. W., Pitblado, J. R., & Poi, B. P. (2013). Maximum Likelihood Estimation
with Stata (4th ed.). Stata Press.

Kaggle. (2025). Supply Chain Data for Cosmetic Startup in India. Retrieved from
[Link]

Kraak, M.-J., & Ormeling, F. (2010). Cartography: Visualization of spatial data (3rd
ed.). Guilford Press.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.).
Wiley.

Tableau Software. (2024). Tableau Public user guide [Software manual]. Tableau
Software.

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Wooldridge, J. M. (2016). Introductory econometrics: A modern approach (6th ed.).
Cengage Learning.

Van der Aalst, W. (2016). Process mining: Data science in action (2nd ed.). Springer.
