03. Returns Pattern Analysis
1. Overview and Scenario
Situation: Revenue is up but profits aren’t increasing. It turns out “Returns” are the culprit. The CFO brings the data with a serious look.
“The refund amount from returns last month is too high. Why are they returning products? Is it our fault or customer’s change of mind?”
Returns are a ‘necessary evil’ in e-commerce, but they can be managed and reduced through data. In this chapter, we’ll analyze return reasons, responsibility attribution, and category patterns to develop return prevention strategies.
2. Data Preparation
We’ll use the returns_reason_dummy table.
This table is connected to orders (order_id) and products (product_id).
BigQuery (SQL)
# BigQuery connection setup
from google.cloud import bigquery
client = bigquery.Client()Key Columns
reason_code: Return reason code (e.g.,size_issue,defect)responsibility: Responsibility attribution (customer,merchant,logistics)return_date: Return request date
3. Return Reason Analysis
First, let’s find out “why” customers are returning products.
❓ Problem 1: Distribution by Return Reason
Q. Find the top 5 most common return reasons and calculate the percentage (%) of each reason.
BigQuery (SQL)
Hint: Use GROUP BY reason_code and window function SUM(COUNT(*)) OVER().
View Solution
SELECT
reason_code,
COUNT(*) as return_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM `your-project-id.retail_analytics_us.returns_reason_dummy`
GROUP BY reason_code
ORDER BY return_count DESC
LIMIT 5;Key Reason Codes:
changed_mind: Simple change of minddefect: Product defectshipping_delay: Shipping delaysize_issue: Size problemdamaged: Damaged during shipping
4. Responsibility Analysis
The solution varies depending on who is responsible for the return.
- Customer: Change of mind, size mistake -> Improve product detail pages
- Merchant: Defects, wrong items shipped -> Strengthen quality inspection
- Logistics: Damage, delays -> Switch logistics providers or improve packaging
❓ Problem 2: Refund Amount by Responsibility
Q. Aggregate the total refund amount caused by returns by responsibility (responsibility).
(Note: Refund amount should use sale_price from the src_order_items table.)
BigQuery (SQL)
Hint: Use order_id and product_id as keys to JOIN with src_order_items.
View Solution
SELECT
r.responsibility,
COUNT(*) as return_count,
ROUND(SUM(oi.sale_price), 2) as total_refund_cost
FROM `your-project-id.retail_analytics_us.returns_reason_dummy` r
JOIN `your-project-id.retail_analytics_us.src_order_items` oi
ON r.order_id = oi.order_id
AND r.product_id = oi.product_id
GROUP BY r.responsibility
ORDER BY total_refund_cost DESC;5. Advanced: Return Patterns by Category
Not all categories are the same.
Intimates may have many “simple change of mind” returns, while Outerwear may have many “size issue” returns.
❓ Problem 3: Top Return Reasons by Category
Q. Find the top 5 categories with the most return counts, and find the #1 return reason for each category.
BigQuery (SQL)
Hint: Using APPROX_TOP_COUNT function makes it easy to find the mode within a group.
View Solution
SELECT
p.category,
COUNT(*) as return_count,
-- Extract the top 1 most frequent reason_code for each category
APPROX_TOP_COUNT(r.reason_code, 1)[OFFSET(0)].value as top_reason
FROM `your-project-id.retail_analytics_us.returns_reason_dummy` r
JOIN `your-project-id.retail_analytics_us.src_products` p
ON r.product_id = p.product_id
GROUP BY p.category
ORDER BY return_count DESC
LIMIT 5;💡 Summary and Insights
Analyzing returns data revealed surprising facts.
- Top Return Reasons: Simple change of mind (
changed_mind) and product defects (defect) are similarly high. - Responsibility: Return costs attributed to Logistics and Merchant are as significant as customer-attributed returns.
- Just addressing shipping delays/damage could reduce total return costs by 30%.
- Category Characteristics:
Jeans: Many size issues -> Need to improve size guideIntimates: Many simple change of mind -> Need to review hygiene-related return policies
Next Steps: We know customer satisfaction and return causes. Then, “How are all these metrics changing over time?” In the next chapter, we’ll identify trends through Time Series data analysis.