Skip to Content

01. Hypothesis Testing

Advanced2 hours

1. Overview and Scenario

Situation: There’s a rumor circulating in the team that “male customers spend more money than female customers.” The data team lead asks:

“Is that really true? Or is it just a coincidence because of a few big-spending male customers? Is it statistically significant?

We need to go beyond simply comparing averages and prove that the difference is not by chance. For this, we use T-test and ANOVA.


2. Data Preparation

Using the src_users, src_orders, and src_order_items tables, we calculate the total purchase amount per customer.

from google.cloud import bigquery from scipy import stats import pandas as pd client = bigquery.Client()

3. T-Test: Comparing Two Groups (Male vs Female)

This is the most common question: “Is the mean of A different from B?”

❓ Problem 1: Comparing Average Purchase Amount by Gender

Q. Perform an Independent Two-sample T-test to determine whether the difference in average total purchase amount between male and female customers is significant. (Only include customers with purchase history.)

Strategy: Aggregate data in BigQuery, bring it to Python, then use scipy.stats.

💡

Hint: Use GROUP BY user_id, gender in SQL to get total purchase amount, then fetch it.

View Solution

# 1. Extract data query = """ SELECT u.gender, SUM(oi.sale_price) as total_purchase FROM `your-project-id.retail_analytics_us.src_users` u JOIN `your-project-id.retail_analytics_us.src_orders` o ON u.user_id = o.user_id JOIN `your-project-id.retail_analytics_us.src_order_items` oi ON o.order_id = oi.order_id WHERE o.status NOT IN ('Cancelled', 'Returned') GROUP BY u.user_id, u.gender """ df = client.query(query).to_dataframe() # 2. Separate groups group_m = df[df['gender'] == 'M']['total_purchase'] group_f = df[df['gender'] == 'F']['total_purchase'] # 3. Perform T-test (not assuming equal variance -> Welch's t-test) t_stat, p_val = stats.ttest_ind(group_m, group_f, equal_var=False) print(f"Male average: ${group_m.mean():.2f}") print(f"Female average: ${group_f.mean():.2f}") print(f"T-statistic: {t_stat:.4f}") print(f"P-value: {p_val:.4f}") if p_val < 0.05: print("✅ The difference is statistically significant.") else: print("❌ The difference is not significant.")
실행 결과
Male average: $130.20
Female average: $114.75
T-statistic: 15.4704
P-value: 0.0000
✅ The difference is statistically significant.

💡 Interpretation Guide

  • P-value < 0.05: Accept the hypothesis that “the two group means are different” (significant difference).
  • P-value >= 0.05: There is not enough evidence to say there is a difference (could be by chance).
  • If the male average purchase amount is higher and the P-value is 0.0000, we can conclude that “men spend statistically more.”

4. ANOVA: Comparing Three or More Groups

What if there are three or more groups to compare? (e.g., by signup channel, by country) Instead of running multiple T-tests, we use ANOVA (Analysis of Variance).

❓ Problem 2: Purchase Amount Difference by Traffic Source

Q. Test whether there are differences in average total purchase amount among customers by traffic_source (Search, Display, Facebook, etc.).

💡

Hint: Since the src_users table doesn’t have a traffic_source column, we need traffic source data. Here, let’s assume traffic source info exists in src_users, or for practice, let’s use country. (Guide modification: Due to dataset limitations, we’ll substitute with Country comparison.)

Alternative Problem: Testing Purchase Amount Difference by Country

View Solution

# 1. Extract data query = """ SELECT u.country, SUM(oi.sale_price) as total_purchase FROM `your-project-id.retail_analytics_us.src_users` u JOIN `your-project-id.retail_analytics_us.src_orders` o ON u.user_id = o.user_id JOIN `your-project-id.retail_analytics_us.src_order_items` oi ON o.order_id = oi.order_id WHERE o.status NOT IN ('Cancelled', 'Returned') AND u.country IN ('United States', 'China', 'Japan') -- Compare 3 countries GROUP BY u.user_id, u.country """ df_anova = client.query(query).to_dataframe() # 2. Prepare data by group groups = [df_anova[df_anova['country'] == c]['total_purchase'] for c in df_anova['country'].unique()] # 3. Perform ANOVA f_stat, p_val = stats.f_oneway(*groups) print(f"F-statistic: {f_stat:.4f}") print(f"P-value: {p_val:.4f}")
실행 결과
F-statistic: 2.6464
P-value: 0.0709

5. Post-hoc Analysis

If ANOVA shows “there is a difference (P < 0.05)”, we need to know “specifically, which groups are different from each other?” For this, we perform the Tukey HSD test.

❓ Problem 3: Post-hoc Test by Country

Q. Use pairwise_tukeyhsd from statsmodels to identify which countries have clear differences.

View Solution

from statsmodels.stats.multicomp import pairwise_tukeyhsd # df_anova is the dataframe created above tukey = pairwise_tukeyhsd(endog=df_anova['total_purchase'], groups=df_anova['country'], alpha=0.05) print(tukey)
실행 결과
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===========================================================
group1     group2    meandiff p-adj   lower   upper  reject
-----------------------------------------------------------
China         Japan   6.4094 0.1268  -1.3222 14.141  False
China United States   -1.217 0.6424   -4.398 1.9641  False
Japan United States  -7.6263 0.0596 -15.4896 0.2369  False
-----------------------------------------------------------

Result Interpretation:

  • Rows where reject column is True: The difference between the two groups is significant.
  • Example: If US vs China is True, the spending behavior between the two countries is different.

💡 Summary

  • T-test: Compare two groups (A vs B)
  • ANOVA: Compare three or more groups (A vs B vs C)
  • Post-hoc: Find the culprit - who is different from whom

In the next chapter, we will verify marketing performance through A/B Testing.

Last updated on

🤖AI 모의면접실전처럼 연습하기