Skip to Content

05. Customer Segmentation

Advanced3 hours

1. Overview and Scenario

Situation: Providing the same CS service to all customers is increasing costs while satisfaction stays flat. The CMO suggests:

“Shouldn’t we respond faster to VIP customers, and give coupons to customers who seem likely to churn?”

But we don’t know who’s a VIP and who’s at risk of churning. In this chapter, we’ll use RFM Analysis to segment customers and develop strategies for each group.


2. Data Preparation

We need order data (src_orders) and order details (src_order_items). We’ll connect these with CS ticket data (cs_tickets_dummy) for analysis.

# BigQuery connection setup from google.cloud import bigquery client = bigquery.Client()

3. Basic Segmentation: By Ticket Frequency

The easiest method is to look at “How often do they contact us?”

❓ Problem 1: Group by Ticket Frequency

Q. Calculate total tickets per customer and group them by the following criteria:

  • 1 time: 1 inquiry
  • 2-3 times: 2-3 inquiries
  • 4+ times: Frequent inquiries
💡

Hint: Use CASE WHEN to create groups.

View Solution

WITH user_tickets AS ( SELECT user_id, COUNT(*) as ticket_count FROM `your-project-id.retail_analytics_us.cs_tickets_dummy` GROUP BY user_id ) SELECT CASE WHEN ticket_count = 1 THEN '1 inquiry' WHEN ticket_count BETWEEN 2 AND 3 THEN '2-3 inquiries' ELSE '4+ (frequent)' END as frequency_segment, COUNT(*) as user_count FROM user_tickets GROUP BY 1 ORDER BY user_count DESC;

4. RFM Analysis (Recency, Frequency, Monetary)

This is the most commonly used marketing segmentation technique.

  • Recency: How recently did they purchase?
  • Frequency: How often did they purchase?
  • Monetary: How much did they spend?

❓ Problem 2: Calculate RFM Metrics per Customer

Q. Calculate each user’s Recency (days since last purchase), Frequency (total orders), and Monetary (total spend).

💡

Hint: Use DATE_DIFF and CURRENT_DATE() to calculate days elapsed.

View Solution

SELECT o.user_id, DATE_DIFF(CURRENT_DATE(), MAX(DATE(o.created_at)), DAY) as recency, COUNT(DISTINCT o.order_id) as frequency, ROUND(SUM(oi.sale_price), 2) as monetary FROM `your-project-id.retail_analytics_us.src_orders` o JOIN `your-project-id.retail_analytics_us.src_order_items` oi ON o.order_id = oi.order_id WHERE o.status NOT IN ('Cancelled', 'Returned') GROUP BY o.user_id LIMIT 10;

❓ Problem 3: Assign RFM Scores and Classify Segments

Q. Use NTILE(5) function to divide each metric into 1-5 scores and define segments with the following rules:

  • Champions: R, F, M all 4 points or higher
  • Loyal: R, F, M all 3 points or higher
  • At Risk: R 2 points or lower, F 3 points or higher (used to buy frequently but hasn’t come recently)
  • New: R 4 points or higher, F 2 points or lower (came recently but low frequency)
  • Others: The rest
💡

Hint: Use a CTE to calculate scores first, then classify with CASE WHEN.

View Solution

WITH rfm_base AS ( -- (Problem 2 query above) SELECT o.user_id, DATE_DIFF(CURRENT_DATE(), MAX(DATE(o.created_at)), DAY) as recency, COUNT(DISTINCT o.order_id) as frequency, SUM(oi.sale_price) as monetary FROM `your-project-id.retail_analytics_us.src_orders` o JOIN `your-project-id.retail_analytics_us.src_order_items` oi ON o.order_id = oi.order_id WHERE o.status NOT IN ('Cancelled', 'Returned') GROUP BY o.user_id ), rfm_scores AS ( SELECT *, NTILE(5) OVER (ORDER BY recency ASC) as r_score, -- The more recent (smaller), the better score should be -- Usually 5 points is best, so ORDER BY recency DESC would be correct. -- However, SQL NTILE(5) OVER (ORDER BY col) assigns 1 to smallest values. -- Recency: smaller (more recent) should be better -> Is rank 1 the best? -- For convenience, sort descending so larger values get 5 points -- or depends on problem definition. Here we use '5-point scale'. -- Modified: Configure logic so 5 is the best score NTILE(5) OVER (ORDER BY recency DESC) as r_score_raw, -- Old gets 1, recent gets 5 NTILE(5) OVER (ORDER BY frequency ASC) as f_score, -- More is 5 NTILE(5) OVER (ORDER BY monetary ASC) as m_score -- More is 5 FROM rfm_base ) SELECT user_id, CASE WHEN r_score_raw >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions' WHEN r_score_raw >= 3 AND f_score >= 3 AND m_score >= 3 THEN 'Loyal' WHEN r_score_raw <= 2 AND f_score >= 3 THEN 'At Risk' WHEN r_score_raw >= 4 AND f_score <= 2 THEN 'New' ELSE 'Others' END as segment FROM rfm_scores;

💡 Summary and Insights

Customers are all different and want to be treated differently.

  1. Top 20% create 80% of revenue (Pareto Principle): Give special treatment to Champions.
  2. Detect churn signals: Send win-back coupons to ‘At Risk’ group (used to buy frequently but became inactive).
  3. Resource optimization: Instead of urgently responding to all tickets, you can build a system that prioritizes VIP tickets.

Wrapping Up Part 1 Great work! You’ve now built a foundation in data analysis (exploration, metrics, time series, segmentation). Starting from the next chapter, we’ll dive deeper into Machine Learning (Sentiment Analysis) and AI.

Last updated on

🤖AI 모의면접실전처럼 연습하기