02. A/B Testing (Experiments)
1. Overview and Scenario
Situation: The marketing team created a new landing page (Version B). They’re excited, saying “The conversion rate increased by 2% compared to the original page (Version A)!”
But you calmly ask:
“How large was the sample size? How likely is a 2% increase like this to show up by chance alone (the p-value)?”
A/B testing is the crown jewel of business decision-making. In this chapter, we learn the proportions z-test and sample size calculation (power analysis).
2. Data Preparation
Since we don’t have A/B test log data, we’ll simulate using existing data by assuming gender as A/B groups.
- Group A: Male
- Group B: Female
- Conversion: Purchase status (1 if order history exists, 0 otherwise)
BigQuery (SQL)
from statsmodels.stats.proportion import proportions_ztest
import numpy as np
# ... BigQuery client setup
3. Proportions Z-test
Used when comparing proportions like conversion rates.
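To make the test less of a black box, here is a minimal sketch that computes the pooled two-proportion z statistic by hand and compares it with `proportions_ztest`. The counts are hypothetical and only serve as an illustration, and `pooled_z_test` is a helper name introduced here, not part of any library.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for illustration only (not taken from the tutorial's dataset)
conversions = np.array([120, 150])   # successes in groups A and B
visitors = np.array([1000, 1000])    # trials in groups A and B


def pooled_z_test(count, nobs):
    """Two-sided two-proportion z-test using a pooled variance estimate."""
    p_a, p_b = count / nobs
    # Under H0 both groups share one rate, estimated from the combined data
    p_pool = count.sum() / nobs.sum()
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / nobs[0] + 1 / nobs[1]))
    z = (p_a - p_b) / se
    p_value = 2 * norm.sf(abs(z))    # two-sided tail probability
    return z, p_value


z_manual, p_manual = pooled_z_test(conversions, visitors)
z_sm, p_sm = proportions_ztest(conversions, visitors)

print(f"manual:      z = {z_manual:.4f}, p = {p_manual:.4f}")
print(f"statsmodels: z = {z_sm:.4f}, p = {p_sm:.4f}")
```

With its default settings, `proportions_ztest` uses the same pooled standard error, so the two lines should agree.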
Problem 1: Comparing Conversion Rates Between Groups
Q. Calculate the purchase conversion rate (proportion of purchasers among all registered users) for males and females, and test whether the difference in proportions is significant.
BigQuery + Python
Hint: Use COUNT(DISTINCT user_id) to get the total population, and LEFT JOIN orders to count purchasers.
Solution
# 1. Aggregate data: one row per gender with user and purchaser counts
query = """
SELECT
    u.gender,
    COUNT(DISTINCT u.user_id) AS total_users,
    COUNT(DISTINCT o.user_id) AS purchasers
FROM `your-project-id.retail_analytics_us.src_users` u
LEFT JOIN `your-project-id.retail_analytics_us.src_orders` o
    ON u.user_id = o.user_id
GROUP BY u.gender
ORDER BY u.gender
"""
# `client` is the BigQuery client created in the setup step
df = client.query(query).to_dataframe().set_index('gender')

# 2. Extract statistics (number of successes, number of trials)
count = df['purchasers'].values   # purchasers per gender, in df.index order
nobs = df['total_users'].values   # total users per gender, in df.index order

# 3. Z-test
z_stat, p_val = proportions_ztest(count, nobs)

# Label each rate from the index instead of assuming which row comes first
for gender, c, n in zip(df.index, count, nobs):
    print(f"{gender} conversion rate: {c / n:.4f}")
print(f"P-value: {p_val:.4f}")
if p_val < 0.05:
    print("✅ The difference in conversion rates is significant.")
4. Sample Size Calculation (Sample Size & Power)
This is the first question you should ask before running a test:
“How many people do we need to experiment on to get reliable results?”
If it’s too few, you might miss an effect even if it exists (False Negative). If it’s too many, you’re wasting money.
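The risk of a false negative can be put into numbers. The sketch below assumes a true lift from 10% to 11% (the scenario used in this chapter's sample size problem) and a few arbitrary group sizes, then asks statsmodels how often a two-sided test at alpha = 0.05 would detect it.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed scenario: baseline 10% conversion, true rate in the variant 11%
effect_size = proportion_effectsize(0.10, 0.11)
analysis = NormalIndPower()

powers = {}
for n_per_group in (500, 2000, 8000, 32000):   # arbitrary illustrative sizes
    powers[n_per_group] = analysis.power(
        effect_size=effect_size,
        nobs1=n_per_group,
        alpha=0.05,
        ratio=1.0,
    )
    miss_rate = 1 - powers[n_per_group]        # probability of a false negative
    print(f"n per group = {n_per_group:>6}: "
          f"power = {powers[n_per_group]:.2f}, miss rate = {miss_rate:.2f}")
```

Small groups leave the test badly underpowered, so a real one-point lift would usually go unnoticed; power only approaches 1 once each group holds tens of thousands of users.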
Problem 2: Calculate Required Sample Size
Q. If the current conversion rate is 10%, how many people per group are needed to detect an improvement to 11% (a 1 percentage point increase)? (Based on significance level α = 0.05, power = 0.8)
Python (Common)
Hint: Use statsmodels.stats.power.NormalIndPower, with the effect size computed by proportion_effectsize.
Solution
import numpy as np
import statsmodels.stats.api as sms
from statsmodels.stats.proportion import proportion_effectsize

# 1. Calculate effect size (Cohen's h for two proportions)
p1 = 0.10  # baseline
p2 = 0.11  # target
effect_size = proportion_effectsize(p1, p2)

# 2. Calculate sample size per group for a 1:1 split
required_n = sms.NormalIndPower().solve_power(
    effect_size=effect_size,
    power=0.8,
    alpha=0.05,
    ratio=1
)
print(f"Required sample size per group: {int(np.ceil(required_n))} people")
print(f"Total required sample size: {int(np.ceil(required_n)) * 2} people")

Output:
Required sample size per group: 14745 people
Total required sample size: 29490 people
💡 Parameter Explanation
- Alpha (α): Type I error probability (usually 0.05). “Probability of saying there’s an effect when there isn’t”
- Power (1 − β): Statistical power (usually 0.8). “Probability of finding an effect when it truly exists”
- Effect Size: The magnitude of the difference you want to detect (larger = fewer samples needed)
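To see how strongly the effect size drives the required sample, the following sketch reruns the same kind of calculation for several target rates. The 10% baseline, alpha = 0.05, and power = 0.8 follow this chapter's problem; the other target rates are arbitrary values chosen for illustration.

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10
analysis = NormalIndPower()

required = {}
for target in (0.105, 0.11, 0.12, 0.15):   # illustrative target conversion rates
    h = proportion_effectsize(baseline, target)   # Cohen's h
    n = analysis.solve_power(effect_size=h, power=0.8, alpha=0.05, ratio=1.0)
    required[target] = math.ceil(n)
    print(f"{baseline:.1%} -> {target:.1%}: "
          f"|h| = {abs(h):.4f}, n per group = {required[target]}")
```

Because the required n scales roughly with 1 / h², halving the detectable lift needs about four times as many users per group.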
5. Experiment Design Pitfalls (Common Pitfalls)
Design is just as important as coding.
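- Peeking Problem:
  - You shouldn’t say “Oh? P-value is 0.04, let’s stop!” in the middle of an experiment. Every extra look is another chance for a false positive, so repeated checks push the real error rate above the nominal 5%.
  - You must wait until you reach the predetermined sample size (N).
- SRM (Sample Ratio Mismatch):
  - You split 50:50, but the groups come out unequal, e.g. 1000 vs 950 people?
  - If the imbalance is larger than chance allows for your sample size (check with a chi-square goodness-of-fit test), there is likely a bug in the traffic allocation system, or data loss occurred in a specific group.
  - In that case, the test results are invalid!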
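The peeking problem can be demonstrated with a small A/A simulation: both groups share the same true conversion rate, so every "significant" result is a false positive. The rate, sample sizes, number of looks, and number of simulated experiments below are all assumptions chosen for illustration.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)

true_rate = 0.10                               # same in both groups: no real effect
n_final = 5000                                 # planned sample size per group (assumed)
checkpoints = range(500, n_final + 1, 500)     # peek after every 500 users per group
n_sims = 1000                                  # number of simulated experiments

peeking_hits = 0   # experiments where ANY look had p < 0.05
final_hits = 0     # experiments where only the final look had p < 0.05

for _ in range(n_sims):
    a = rng.random(n_final) < true_rate
    b = rng.random(n_final) < true_rate
    cum_a, cum_b = np.cumsum(a), np.cumsum(b)
    p_values = [
        proportions_ztest([cum_a[n - 1], cum_b[n - 1]], [n, n])[1]
        for n in checkpoints
    ]
    peeking_hits += int(min(p_values) < 0.05)
    final_hits += int(p_values[-1] < 0.05)

peeking_rate = peeking_hits / n_sims
final_rate = final_hits / n_sims
print(f"False positive rate, stopping at first p < 0.05: {peeking_rate:.3f}")
print(f"False positive rate, testing once at N:          {final_rate:.3f}")
```

Testing once at the planned N should stay close to the nominal 5%, while stopping at the first significant look lands well above it.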
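An SRM check can be sketched as a chi-square goodness-of-fit test on the group sizes. `srm_p_value` is a hypothetical helper name, and the 0.001 alert threshold is an assumption (a conservative cutoff often used for this kind of check), not something this chapter prescribes.

```python
from scipy.stats import chisquare


def srm_p_value(n_a, n_b, expected_ratio=(0.5, 0.5)):
    """P-value of a chi-square goodness-of-fit test on the observed group sizes."""
    total = n_a + n_b
    expected = [total * expected_ratio[0], total * expected_ratio[1]]
    return chisquare([n_a, n_b], f_exp=expected).pvalue


SRM_THRESHOLD = 0.001   # assumed alert threshold

p_small = srm_p_value(1000, 950)          # same 5% gap, small experiment
p_large = srm_p_value(100_000, 95_000)    # same 5% gap, 100x more users

for label, p in (("1000 vs 950", p_small), ("100000 vs 95000", p_large)):
    verdict = "possible SRM, investigate" if p < SRM_THRESHOLD else "consistent with 50:50"
    print(f"{label}: p = {p:.4g} -> {verdict}")
```

The same relative gap can be plain noise in a small experiment and a clear allocation problem in a large one, which is why the check should use a test rather than eyeballing the counts.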
💡 Summary
- Proportions Test: Compare 0/1 data like click rates and conversion rates
- Power Analysis: Essential step before starting an experiment (“How many do we need?”)
- A/B testing is science: Make decisions with data, not feelings.
In the next chapter, we’ll explore hidden relationships between variables through Correlation and Regression Analysis.