Skip to Content

03. Returns Pattern Analysis

Intermediate2 hours

1. Overview and Scenario

Situation: Revenue is up but profits aren’t increasing. It turns out “Returns” are the culprit. The CFO brings the data with a serious look.

“The refund amount from returns last month is too high. Why are they returning products? Is it our fault or customer’s change of mind?”

Returns are a ‘necessary evil’ in e-commerce, but they can be managed and reduced through data. In this chapter, we’ll analyze return reasons, responsibility attribution, and category patterns to develop return prevention strategies.


2. Data Preparation

We’ll use the returns_reason_dummy table. This table is connected to orders (order_id) and products (product_id).

# BigQuery connection setup from google.cloud import bigquery client = bigquery.Client()

Key Columns

  • reason_code: Return reason code (e.g., size_issue, defect)
  • responsibility: Responsibility attribution (customer, merchant, logistics)
  • return_date: Return request date

3. Return Reason Analysis

First, let’s find out “why” customers are returning products.

❓ Problem 1: Distribution by Return Reason

Q. Find the top 5 most common return reasons and calculate the percentage (%) of each reason.

💡

Hint: Use GROUP BY reason_code and window function SUM(COUNT(*)) OVER().

View Solution

SELECT reason_code, COUNT(*) as return_count, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage FROM `your-project-id.retail_analytics_us.returns_reason_dummy` GROUP BY reason_code ORDER BY return_count DESC LIMIT 5;

Key Reason Codes:

  • changed_mind: Simple change of mind
  • defect: Product defect
  • shipping_delay: Shipping delay
  • size_issue: Size problem
  • damaged: Damaged during shipping

4. Responsibility Analysis

The solution varies depending on who is responsible for the return.

  • Customer: Change of mind, size mistake -> Improve product detail pages
  • Merchant: Defects, wrong items shipped -> Strengthen quality inspection
  • Logistics: Damage, delays -> Switch logistics providers or improve packaging

❓ Problem 2: Refund Amount by Responsibility

Q. Aggregate the total refund amount caused by returns by responsibility (responsibility). (Note: Refund amount should use sale_price from the src_order_items table.)

💡

Hint: Use order_id and product_id as keys to JOIN with src_order_items.

View Solution

SELECT r.responsibility, COUNT(*) as return_count, ROUND(SUM(oi.sale_price), 2) as total_refund_cost FROM `your-project-id.retail_analytics_us.returns_reason_dummy` r JOIN `your-project-id.retail_analytics_us.src_order_items` oi ON r.order_id = oi.order_id AND r.product_id = oi.product_id GROUP BY r.responsibility ORDER BY total_refund_cost DESC;

5. Advanced: Return Patterns by Category

Not all categories are the same. Intimates may have many “simple change of mind” returns, while Outerwear may have many “size issue” returns.

❓ Problem 3: Top Return Reasons by Category

Q. Find the top 5 categories with the most return counts, and find the #1 return reason for each category.

💡

Hint: Using APPROX_TOP_COUNT function makes it easy to find the mode within a group.

View Solution

SELECT p.category, COUNT(*) as return_count, -- Extract the top 1 most frequent reason_code for each category APPROX_TOP_COUNT(r.reason_code, 1)[OFFSET(0)].value as top_reason FROM `your-project-id.retail_analytics_us.returns_reason_dummy` r JOIN `your-project-id.retail_analytics_us.src_products` p ON r.product_id = p.product_id GROUP BY p.category ORDER BY return_count DESC LIMIT 5;

💡 Summary and Insights

Analyzing returns data revealed surprising facts.

  1. Top Return Reasons: Simple change of mind (changed_mind) and product defects (defect) are similarly high.
  2. Responsibility: Return costs attributed to Logistics and Merchant are as significant as customer-attributed returns.
    • Just addressing shipping delays/damage could reduce total return costs by 30%.
  3. Category Characteristics:
    • Jeans: Many size issues -> Need to improve size guide
    • Intimates: Many simple change of mind -> Need to review hygiene-related return policies

Next Steps: We know customer satisfaction and return causes. Then, “How are all these metrics changing over time?” In the next chapter, we’ll identify trends through Time Series data analysis.

Last updated on

🤖AI 모의면접실전처럼 연습하기