Target
Business Case
Topic: SQL
Duration: 2 weeks
Context
● Target is a globally renowned brand and a prominent retailer in the United States.
Target makes itself a preferred shopping destination by offering outstanding value,
inspiration, innovation and an exceptional guest experience that no other retailer can
deliver.
● This particular business case focuses on the operations of Target in Brazil and provides
insightful information about 100,000 orders placed between 2016 and 2018. The
dataset offers a comprehensive view of various dimensions including the order status,
price, payment and freight performance, customer location, product attributes, and
customer reviews.
● By analyzing this extensive dataset, it becomes possible to gain valuable insights into
Target's operations in Brazil. The information can shed light on various aspects of the
business, such as order processing, pricing strategies, payment and shipping efficiency,
customer demographics, product characteristics, and customer satisfaction levels.
Dataset:
[Link]
The data is available in 8 different csv files:
1. [Link]
2. [Link]
3. order_items.csv
4. [Link]
5. [Link]
6. [Link]
7. [Link]
8. [Link]
The column description for these csv files is given below.
The [Link] file contains the following features:
Features                  Description
customer_id               ID of the consumer who made the purchase
customer_unique_id        Unique ID of the consumer
customer_zip_code_prefix  Zip code of the consumer's location
customer_city             Name of the city from which the order was placed
customer_state            State code from which the order was placed (e.g. São Paulo - SP)
The [Link] file contains the following features:
Features                       Description
order_id                       A unique ID of the order made by the consumer
customer_id                    ID of the consumer who made the purchase
order_status                   Status of the order, i.e. delivered, shipped, etc.
order_purchase_timestamp       Timestamp of the purchase
order_delivered_carrier_date   Delivery date at which the carrier made the delivery
order_delivered_customer_date  Date on which the customer got the product
order_estimated_delivery_date  Estimated delivery date of the products
The order_items.csv file contains the following features:
Features             Description
order_id             A unique ID of the order made by the consumer
order_item_id        A unique ID given to each item within the order
product_id           A unique ID given to each product available on the site
seller_id            Unique ID of the seller registered with Target
shipping_limit_date  The date before which the ordered product must be shipped
price                Actual price of the products ordered
freight_value        Price rate at which a product is delivered from one point to another
The [Link] file contains the following features:
Features              Description
order_id              A unique ID of the order made by the consumer
payment_sequential    Sequence of the payments made in case of EMI
payment_type          Mode of payment used (e.g. credit card)
payment_installments  Number of installments in case of an EMI purchase
payment_value         Total amount paid for the purchase order
The [Link] file contains the following features:
Features                     Description
geolocation_zip_code_prefix  First 5 digits of the zip code
geolocation_lat              Latitude
geolocation_lng              Longitude
geolocation_city             City
geolocation_state            State
The [Link] file contains the following features:
Features                Description
seller_id               Unique ID of the registered seller
seller_zip_code_prefix  Zip code of the seller's location
seller_city             Name of the seller's city
seller_state            State code (e.g. São Paulo - SP)
The [Link] file contains the following features:
Features                 Description
review_id                ID of the review given on the product ordered under the order ID
order_id                 A unique ID of the order made by the consumer
review_score             Review score given by the customer for each order on a scale of 1-5
review_comment_title     Title of the review
review_comment_message   Review comments posted by the consumer for each order
review_creation_date     Timestamp at which the review was created
review_answer_timestamp  Timestamp at which the review was answered
The [Link] file contains the following features:
Features                    Description
product_id                  A unique identifier for the product
product_category_name       Name of the product category
product_name_lenght         Length of the string that specifies the name given to the products ordered
product_description_lenght  Length of the description written for each product ordered on the site
product_photos_qty          Number of photos of each product ordered available on the shopping portal
product_weight_g            Weight of the products ordered, in grams
product_length_cm           Length of the products ordered, in centimeters
product_height_cm           Height of the products ordered, in centimeters
product_width_cm            Width of the products ordered, in centimeters
Dataset schema:
Problem Statement:
Assuming you are a data analyst/scientist at Target, you have been assigned the task of
analyzing the given dataset to extract valuable insights and provide actionable
recommendations.
What does ‘good’ look like?
I. Import the dataset and do usual exploratory analysis steps like checking the
structure & characteristics of the dataset:
A. Data type of all columns in the “customers” table.
B. Get the time range between which the orders were placed.
C. Count the Cities & States of customers who ordered during the given period.
II. In-depth Exploration:
A. Is there a growing trend in the no. of orders placed over the past years?
B. Can we see some kind of monthly seasonality in terms of the no. of orders being
placed?
C. During what time of the day, do the Brazilian customers mostly place their
orders? (Dawn, Morning, Afternoon or Night)
● 0-6 hrs : Dawn
● 7-12 hrs : Morning
● 13-18 hrs : Afternoon
● 19-23 hrs : Night
III. Evolution of E-commerce orders in the Brazil region:
A. Get the month on month no. of orders placed in each state.
B. How are the customers distributed across all the states?
IV. Impact on Economy: Analyze the money movement by e-commerce by looking at
order prices, freight and others.
A. Get the % increase in the cost of orders from year 2017 to 2018 (include
months between Jan to Aug only).
You can use the “payment_value” column in the payments table to get the cost
of orders.
B. Calculate the Total & Average value of order price for each state.
C. Calculate the Total & Average value of order freight for each state.
V. Analysis based on sales, freight and delivery time.
A. Find the no. of days taken to deliver each order from the order’s purchase date
as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery
date of an order.
Do this in a single query.
You can calculate the delivery time and the difference between the estimated &
actual delivery date using the given formula:
● time_to_deliver = order_delivered_customer_date -
order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date -
order_delivered_customer_date
B. Find out the top 5 states with the highest & lowest average freight value.
C. Find out the top 5 states with the highest & lowest average delivery time.
D. Find out the top 5 states where the order delivery is really fast as compared to
the estimated date of delivery.
You can use the difference between the averages of actual & estimated delivery
date to figure out how fast the delivery was for each state.
VI. Analysis based on the payments:
A. Find the month on month no. of orders placed using different payment types.
B. Find the no. of orders placed on the basis of the payment installments that have
been paid.
Evaluation Criteria (100 points)
1. Initial exploration like checking the structure & characteristics of the data (15 points)
2. In-depth Exploration (15 points)
3. Evolution of E-commerce orders in the Brazil region (10 points)
4. Impact on Economy (20 points)
5. Analysis on sales, freight and delivery time (20 points)
6. Analysis based on the payments (10 points)
7. Actionable Insights & Recommendations (10 points)
Submission Process
Once you’re done with the case study...
● Use a Word document to paste your SQL queries along with a screenshot of the first 10
rows from the output.
● List down any valuable insights that you find during the analysis and provide some
action items from the company’s perspective in order to improve the current situation.
● Convert your solutions doc into a PDF, and upload the same on the platform.
● Please note that after submitting once, you will not be allowed to edit your submission.
Answer Sheet
1. Initial exploratory analysis:
We’ll first try to explore the data, understand it and answer some simple questions.
1.a. Data type of all columns in the “customers” table.
SELECT
column_name,
data_type
FROM `[Link].INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'customers'
* INFORMATION_SCHEMA.COLUMNS is a metadata view in BigQuery (not a function); it returns one row for each column of each table in the dataset.
1.b. Get the time range between which the orders were placed.
SELECT
MIN(order_purchase_timestamp) AS first_order,
MAX(order_purchase_timestamp) AS last_order
FROM `[Link]`
1.c. Count the Cities & States of customers who ordered during the given period.
SELECT
COUNT(DISTINCT c.customer_city) AS city_cnt,
COUNT(DISTINCT c.customer_state) AS state_cnt
FROM `[Link]` o
INNER JOIN `[Link]` c
ON o.customer_id = c.customer_id
2. In-depth Exploration:
We’ll look for trends in the data and see how order volumes have changed over
time.
2.a. Is there a growing trend in the no. of orders placed over the past years?
SELECT
EXTRACT(year FROM order_purchase_timestamp) AS year,
EXTRACT(month FROM order_purchase_timestamp) AS month,
COUNT(1) AS num_orders
FROM `[Link]`
GROUP BY year, month
ORDER BY year, month
2.b. Can we see some kind of monthly seasonality in terms of the no. of orders
being placed?
SELECT
Extract(month FROM order_purchase_timestamp) AS month,
COUNT(1) AS num_orders
FROM `[Link]`
GROUP BY 1
ORDER BY 1
In general, the order counts show that customers are buying online more than before.
2.c. During what time of the day, do the Brazilian customers mostly place their
orders? (Dawn, Morning, Afternoon or Night)
SELECT
  CASE
    WHEN EXTRACT(hour FROM order_purchase_timestamp) BETWEEN 0 AND 6 THEN 'dawn'
    WHEN EXTRACT(hour FROM order_purchase_timestamp) BETWEEN 7 AND 12 THEN 'morning'
    WHEN EXTRACT(hour FROM order_purchase_timestamp) BETWEEN 13 AND 18 THEN 'afternoon'
    WHEN EXTRACT(hour FROM order_purchase_timestamp) BETWEEN 19 AND 23 THEN 'night'
  END AS time_of_day,
  COUNT(DISTINCT order_id) AS counter
FROM `[Link]`
GROUP BY 1
ORDER BY 2 DESC
Brazilian customers tend to place most of their orders in the afternoon.
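The hour buckets used in the CASE expression can be checked outside the database. The following is a minimal Python sketch of the same mapping, not part of the original answer; the function name `time_of_day` and the sample hours are made up for illustration.

```python
from collections import Counter


def time_of_day(hour: int) -> str:
    """Map an hour of the day (0-23) to the bucket defined in question 2.c."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if hour <= 6:
        return "dawn"        # 0-6 hrs
    if hour <= 12:
        return "morning"     # 7-12 hrs
    if hour <= 18:
        return "afternoon"   # 13-18 hrs
    return "night"           # 19-23 hrs


# Hypothetical purchase hours, for illustration only (not taken from the dataset).
sample_hours = [2, 9, 14, 15, 16, 21]
counts = Counter(time_of_day(h) for h in sample_hours)
print(counts.most_common())
```

Checking the boundary hours (6, 7, 12, 13, 18, 19) this way is a quick way to confirm that no hour falls outside the four buckets.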
3. Evolution of E-commerce orders in the Brazil region:
Now we’ll look at the data at the state level to see how order volumes and
customers vary across states.
3.a. Get the month on month no. of orders placed in each state.
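SELECT
  EXTRACT(year FROM o.order_purchase_timestamp) AS year,
  EXTRACT(month FROM o.order_purchase_timestamp) AS month,
  c.customer_state,
  COUNT(1) AS num_orders
FROM `[Link]` o
INNER JOIN `[Link]` c
  ON o.customer_id = c.customer_id
-- Include the year so that the same month of different years is not merged
GROUP BY c.customer_state, year, month
ORDER BY c.customer_state, year, month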
3.b. How are the customers distributed across all the states?
SELECT
c.customer_state,
COUNT(DISTINCT(c.customer_unique_id)) AS num_customers
FROM `[Link]` c
GROUP BY c.customer_state
ORDER BY num_customers DESC
4. Impact on Economy:
Until now, we have only answered questions about the e-commerce scenario in terms of
the number of orders received. We looked at order volume by month, time of day, and
customer state.
Now we will analyze the money movement in e-commerce by looking at order
prices, freight, and other values.
4.a. Get the % increase in the cost of orders from year 2017 to 2018 (include months
between Jan to Aug only).
You can use the “payment_value” column in the payments table to get the cost of
orders.
WITH base_1 AS (
  -- Payments for orders purchased between January and August of 2017 and 2018
  SELECT
    EXTRACT(YEAR FROM o.order_purchase_timestamp) AS year,
    p.payment_value
  FROM `[Link]` o
  INNER JOIN `[Link]` p
    ON o.order_id = p.order_id
  WHERE EXTRACT(YEAR FROM o.order_purchase_timestamp) BETWEEN 2017 AND 2018
    AND EXTRACT(MONTH FROM o.order_purchase_timestamp) BETWEEN 1 AND 8
),
base_2 AS (
  -- Total cost of orders per year
  SELECT year, SUM(payment_value) AS cost
  FROM base_1
  GROUP BY year
),
base_3 AS (
  -- Place the following year's cost next to each year
  SELECT *, LEAD(cost, 1) OVER (ORDER BY year) AS next_year_cost
  FROM base_2
)
SELECT *, (next_year_cost - cost) / cost * 100 AS percent_increase
FROM base_3
ORDER BY year
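The final step of the query applies the usual percentage-change formula, (new - old) / old * 100, with the earlier year as the base. Below is a minimal Python sketch of that arithmetic; the yearly totals are hypothetical round numbers chosen for illustration and are not results from the dataset.

```python
def percent_increase(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current`, relative to `previous`."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100


# Hypothetical Jan-Aug totals per year (illustrative values only).
yearly_cost = {2017: 200.0, 2018: 250.0}

# Pair each year with the following one, as LEAD(cost, 1) does in the query.
years = sorted(yearly_cost)
for year, next_year in zip(years, years[1:]):
    change = percent_increase(yearly_cost[year], yearly_cost[next_year])
    print(f"{year} -> {next_year}: {change:.2f}%")
```

Note that the denominator is the earlier year's cost; dividing by the later year's cost would understate the increase.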
Breakdown and related queries:
● Create a CTE with new columns:
○ price_per_order = sum(price)/count(distinct order_id)
○ freight_per_order = sum(freight_value)/count(distinct order_id)
○ Group the data at the yearly and monthly level
with cte_table as (
  select
    extract(month from o.order_purchase_timestamp) as month,
    extract(year from o.order_purchase_timestamp) as year,
    -- order_items has one row per item, so count distinct orders;
    -- a plain count(o.order_id) would give a per-item average instead
    sum(price) / count(distinct o.order_id) as price_per_order,
    sum(freight_value) / count(distinct o.order_id) as freight_per_order
  from `[Link]` o
  inner join `[Link].order_items` i
    on o.order_id = i.order_id
  group by year, month
)
select price_per_order, freight_per_order, month, year
from cte_table
order by year, month
4.a. Total amount sold in 2017 from January to August (January to August only, because data is
available from 2017-01 to 2018-08, and we can only compare like periods with like periods)
with cte_table as (
select
Extract( month from order_purchase_timestamp) as month,
Extract( year from order_purchase_timestamp) as year,
sum(price) as total_price,
sum(freight_value) as total_freight
from `[Link]` o
inner join `[Link].order_items` i
on o.order_id= i.order_id
group by year, month
)
select sum(total_price) as total_transaction_amt
from cte_table
where year =2017 and month between 1 and 8
Result: approximately 3.9M
4.a. Total amount sold in 2018 from January to August
with cte_table as (
select
Extract( month from order_purchase_timestamp) as month,
Extract( year from order_purchase_timestamp) as year,
sum(price) as total_price,
sum(freight_value) as total_freight
from `[Link]` o
inner join `[Link].order_items` i
on o.order_id= i.order_id
group by year, month
)
select sum(total_price) as total_transaction_amt
from cte_table
where year =2018 and month between 1 and 8
4.a. % increase from 2017 to 2018
Another example of the same technique, using the orders and customers tables to compare
the number of orders year over year:
select *,
  -- Divide by the previous year's value; the result is NULL for the first year,
  -- which has no previous year to compare with
  safe_divide(orders - lagger_orders, lagger_orders) * 100 as difference
from (
  select *, lag(orders, 1) over (order by year asc) as lagger_orders
  from (
    select
      extract(year from a.order_purchase_timestamp) as year,
      count(distinct a.order_id) as orders,
      count(distinct b.customer_unique_id) as customers
    from `[Link]` a
    left join `[Link]` b
      on a.customer_id = b.customer_id
    group by 1
  ) base
) base_2
order by year asc
4.b. Calculate the Total & Average value of order price for each state.
WITH cte_table AS (
SELECT
c.customer_state AS state,
SUM(price) AS total_price,
COUNT(DISTINCT(o.order_id)) AS num_orders
FROM `[Link]` o
INNER JOIN `[Link].order_items` i
ON o.order_id= i.order_id
INNER JOIN `[Link]` c
ON o.customer_id=c.customer_id
GROUP BY state
)
SELECT state,
total_price,
num_orders,
(total_price/num_orders) AS avg_price
FROM cte_table
ORDER BY total_price DESC
It is interesting that some states combine a high total amount sold with a low
price per order. SP (São Paulo), for example, is the most valuable state for
e-commerce (5202955 sold), yet it is also where customers pay less per order
(125.75 per order).
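Because order_items holds one row per item, the average per order depends on counting distinct order IDs rather than rows. The Python sketch below illustrates the same total/average calculation on a few made-up rows; the function name `price_by_state` and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (state, order_id, price) item rows, for illustration only.
items = [
    ("SP", "o1", 100.0),
    ("SP", "o1", 50.0),   # second item of the same order
    ("SP", "o2", 30.0),
    ("RJ", "o3", 200.0),
]


def price_by_state(rows):
    """Return total price, distinct order count and average price per order by state."""
    totals = defaultdict(float)
    orders = defaultdict(set)
    for state, order_id, price in rows:
        totals[state] += price
        orders[state].add(order_id)   # a set keeps each order only once
    return {
        state: {
            "total_price": totals[state],
            "num_orders": len(orders[state]),
            "avg_price": totals[state] / len(orders[state]),
        }
        for state in totals
    }


summary = price_by_state(items)
print(summary)
```

In this toy data SP has the larger order count but the lower average per order, which is the same kind of pattern described for SP above.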
4.c. Calculate the Total & Average value of order freight for each state.
WITH cte_table AS (
SELECT
c.customer_state AS state,
SUM(freight_value) AS total_freight,
COUNT(distinct(o.order_id)) as num_orders
FROM `[Link]` o
INNER JOIN `[Link].order_items` i
ON o.order_id= i.order_id
INNER JOIN `[Link]` c
ON o.customer_id=c.customer_id
GROUP BY state
)
SELECT
state,
total_freight,
num_orders,
(total_freight/num_orders) AS avg_freight
FROM cte_table
ORDER BY total_freight DESC
5. Analysis based on sales, freight and delivery time.
5.a. Find the no. of days taken to deliver each order from the order’s purchase date
as delivery time.
Also, calculate the difference (in days) between the estimated & actual delivery date
of an order.
You can calculate the delivery time and the difference between the estimated & actual
delivery date using the given formula:
● time_to_deliver = order_delivered_customer_date - order_purchase_timestamp
● diff_estimated_delivery = order_estimated_delivery_date -
order_delivered_customer_date
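SELECT
  order_id,
  TIMESTAMP_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY) AS time_to_deliver,
  -- Estimated minus actual, as in the formula above:
  -- a positive value means the order arrived before the estimated date
  TIMESTAMP_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY) AS diff_estimated_delivery
FROM `[Link]`
WHERE order_status = 'delivered'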
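The two formulas can be worked through by hand on a single order. The Python sketch below does this for one made-up order; the timestamps are hypothetical, and truncating partial days toward zero is an assumption of this sketch, so the result should be compared against what the SQL engine returns.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def whole_days(later: datetime, earlier: datetime) -> int:
    """Whole days from `earlier` to `later`, truncated toward zero (an assumption)."""
    return int((later - earlier).total_seconds() / SECONDS_PER_DAY)


# Hypothetical timestamps for one order (illustrative only).
order_purchase_timestamp = datetime(2018, 1, 1, 10, 0)
order_delivered_customer_date = datetime(2018, 1, 8, 15, 30)
order_estimated_delivery_date = datetime(2018, 1, 12, 0, 0)

time_to_deliver = whole_days(order_delivered_customer_date, order_purchase_timestamp)
diff_estimated_delivery = whole_days(order_estimated_delivery_date,
                                     order_delivered_customer_date)

print(time_to_deliver, diff_estimated_delivery)
```

With estimated minus actual, a positive diff_estimated_delivery means the order arrived ahead of the estimate and a negative one means it arrived late.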
5.b. Find out the top 5 states with the highest & lowest average freight value.
SELECT
  c.customer_state AS state,
  AVG(freight_value) AS avg_freight
FROM `[Link]` o
INNER JOIN `[Link].order_items` i
  ON o.order_id = i.order_id
INNER JOIN `[Link]` c
  ON o.customer_id = c.customer_id
GROUP BY state
-- Highest average freight first; use ASC to get the 5 lowest
ORDER BY avg_freight DESC
LIMIT 5
5.c. Find out the top 5 states with the highest & lowest average delivery time.
SELECT
  c.customer_state AS state,
  AVG(TIMESTAMP_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY)) AS avg_delivery_time
FROM `[Link]` o
INNER JOIN `[Link]` c
  ON o.customer_id = c.customer_id
WHERE order_status = 'delivered'
GROUP BY state
-- Lowest average delivery time first; use DESC to get the 5 highest
ORDER BY avg_delivery_time
LIMIT 5
5.d. Find out the top 5 states where the order delivery is really fast as compared to
the estimated date of delivery.
You can use the difference between the averages of actual & estimated delivery date
to figure out how fast the delivery was for each state.
SELECT
  c.customer_state AS state,
  ROUND(AVG(TIMESTAMP_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY)), 2) AS average_time_for_del,
  ROUND(AVG(TIMESTAMP_DIFF(order_estimated_delivery_date, order_purchase_timestamp, DAY)), 2) AS average_est_dil_time,
  -- Average estimated time minus average actual time: larger means faster than estimated
  ROUND(AVG(TIMESTAMP_DIFF(order_estimated_delivery_date, order_purchase_timestamp, DAY))
    - AVG(TIMESTAMP_DIFF(order_delivered_customer_date, order_purchase_timestamp, DAY)), 2) AS avg_days_ahead
FROM `[Link]` o
INNER JOIN `[Link]` c
  ON o.customer_id = c.customer_id
WHERE order_status = 'delivered'
GROUP BY state
ORDER BY avg_days_ahead DESC
LIMIT 5
6. Analysis based on the payments:
6.a. Find the month on month no. of orders placed using different payment types.
SELECT
  EXTRACT(year FROM order_purchase_timestamp) AS year,
  EXTRACT(month FROM order_purchase_timestamp) AS month,
  payment_type,
  -- An order can have several payment rows, so count each order once per payment type
  COUNT(DISTINCT o.order_id) AS order_count
FROM `[Link]` p
JOIN `[Link].orders` o
  ON o.order_id = p.order_id
GROUP BY payment_type, year, month
ORDER BY year, month, payment_type
6.b. Find the no. of orders placed on the basis of the payment installments that have
been paid.
SELECT
  payment_installments AS installments,
  -- Count each order once even if it has several payment rows
  COUNT(DISTINCT order_id) AS num_orders
FROM `[Link]`
WHERE payment_installments >= 1
GROUP BY payment_installments
ORDER BY num_orders DESC