A/B Testing
Level: Intermediate / Advanced
Learning Objectives
- A/B Test Design
- Sample Size Calculation
- Conversion Rate/Mean Comparison Tests
- Results Interpretation
0. Setup
Load CSV files for data practice.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
# Load Data
orders = pd.read_csv('src_orders.csv', parse_dates=['created_at'])
items = pd.read_csv('src_order_items.csv')
products = pd.read_csv('src_products.csv')
users = pd.read_csv('src_users.csv')
# Merge for Analysis
df = orders.merge(items, on='order_id').merge(products, on='product_id').merge(users, on='user_id')
# Simulate AB Test Data for Examples
np.random.seed(42)
df['experiment_group'] = np.random.choice(['control', 'treatment'], size=len(df))
# Add slight effect to treatment
df.loc[df['experiment_group'] == 'treatment', 'sale_price'] *= 1.05
df['order_amount'] = df['sale_price']  # Alias used in the examples below
1. What is A/B Testing?
Definition
A/B testing is an experiment that compares two versions (A: control, B: treatment) to verify which one is more effective.
Use Cases
- Website button colors
- Email subject lines
- Pricing policies
- Recommendation algorithms
2. Sample Size Calculation
Power Analysis
Calculate the required sample size before the experiment.
from statsmodels.stats.power import TTestIndPower
# Parameters
effect_size = 0.2 # Effect size (small: 0.2, medium: 0.5, large: 0.8)
alpha = 0.05 # Significance level
power = 0.8 # Statistical power
# Sample size calculation
analysis = TTestIndPower()
sample_size = analysis.solve_power(
    effect_size=effect_size,
    power=power,
    alpha=alpha,
    alternative='two-sided'
)
print(f"Required sample size (per group): {int(sample_size)}")
print(f"Total required sample: {int(sample_size * 2)}")

Output
Required sample size (per group): 393
Total required sample: 786
3. Conversion Rate Comparison (Proportion Test)
Z-test
from statsmodels.stats.proportion import proportions_ztest
# Data
# Group A: 50 conversions out of 1000
# Group B: 65 conversions out of 1000
conversions = [50, 65]
n_observations = [1000, 1000]
# Z-test
z_stat, p_value = proportions_ztest(conversions, n_observations, alternative='two-sided')
# Conversion rates
rate_a = conversions[0] / n_observations[0]
rate_b = conversions[1] / n_observations[1]
lift = (rate_b - rate_a) / rate_a * 100
print(f"Group A conversion rate: {rate_a:.2%}")
print(f"Group B conversion rate: {rate_b:.2%}")
print(f"Relative increase: {lift:.1f}%")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("→ Group B is significantly better!")
else:
    print("→ No significant difference")

Output
Group A conversion rate: 5.00%
Group B conversion rate: 6.50%
Relative increase: 30.0%
p-value: 0.1496
→ No significant difference
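A p-value alone does not show how large the plausible difference is. A confidence interval for the rate difference complements the z-test; one way to compute it is with statsmodels' `confint_proportions_2indep` (available in statsmodels 0.12+), sketched here with the same counts as above:

```python
from statsmodels.stats.proportion import confint_proportions_2indep

# Same data as above: B = 65/1000 (treatment), A = 50/1000 (control)
low, upp = confint_proportions_2indep(
    count1=65, nobs1=1000,  # group B
    count2=50, nobs2=1000,  # group A
    compare='diff',         # CI for the difference p_B - p_A
    alpha=0.05
)
print(f"95% CI for the rate difference: [{low:.4f}, {upp:.4f}]")
```

Because the interval includes 0, it agrees with the non-significant p-value: the data are consistent with no difference between the groups.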
4. Mean Comparison (t-test)
Order Amount Comparison
from scipy import stats
# Group data
group_a = df[df['experiment_group'] == 'control']['order_amount']
group_b = df[df['experiment_group'] == 'treatment']['order_amount']
# t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"Group A mean: ${group_a.mean():.2f}")
print(f"Group B mean: ${group_b.mean():.2f}")
print(f"Difference: ${group_b.mean() - group_a.mean():.2f}")
print(f"p-value: {p_value:.4f}")
# Effect size (Cohen's d)
pooled_std = np.sqrt((group_a.std()**2 + group_b.std()**2) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_std
print(f"Cohen's d: {cohens_d:.3f}")

Output
Group A mean: $59.49
Group B mean: $62.97
Difference: $3.48
p-value: 0.0000
Cohen's d: 0.050
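By default, `stats.ttest_ind` assumes equal variances in the two groups. When that assumption is doubtful, Welch's t-test (`equal_var=False`) is generally the safer choice. A self-contained sketch with simulated order amounts (illustrative data, not the `df` columns above):

```python
import numpy as np
from scipy import stats

# Simulated skewed order amounts with different spreads per group
rng = np.random.default_rng(42)
group_a = rng.lognormal(mean=4.0, sigma=0.5, size=5000)   # control
group_b = rng.lognormal(mean=4.05, sigma=0.6, size=5000)  # treatment

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
```

For heavily skewed metrics such as revenue per order, a non-parametric alternative like `stats.mannwhitneyu` is also worth considering.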
5. Results Interpretation
Decision Framework
1. Is p < 0.05?
- No → No significant difference, additional experimentation needed
- Yes → Proceed to next step
2. Is the effect size practically meaningful?
- 0.1% increase vs 10% increase in conversion rate
- Calculate business impact
3. Is it cost-effective?
- Implementation cost
- Expected revenue increase

Cautions
⚠️ A/B Testing Cautions
- Peeking: Do not repeatedly check interim results and stop the experiment early
- Multiple Comparisons: Correction needed when testing multiple metrics simultaneously (Bonferroni)
- Exposure Bias: Check for imbalanced characteristics between groups
- External Factors: Consider seasonal effects, promotions, etc.
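The multiple-comparisons caution can be applied directly in code. One option is `multipletests` from statsmodels, sketched below with hypothetical p-values for three metrics tested in the same experiment (metric names and values are assumptions for illustration):

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from testing three metrics at once
metrics = ['conversion', 'revenue', 'retention']
p_values = [0.04, 0.03, 0.20]

# Bonferroni controls the family-wise error rate by inflating each p-value
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

for metric, p_raw, p_adj, sig in zip(metrics, p_values, p_adjusted, reject):
    print(f"{metric}: raw p={p_raw:.3f}, adjusted p={p_adj:.3f}, significant={sig}")
```

In this example, p-values that look significant on their own (0.04, 0.03) are no longer significant after correction, which is exactly the trap the caution warns about.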
Quiz
Problem
Given the following A/B test results, should you apply the new design (B)?
- Group A: 5000 people, 150 conversions
- Group B: 5000 people, 175 conversions
Answer
from statsmodels.stats.proportion import proportions_ztest
conversions = [150, 175]
n = [5000, 5000]
z_stat, p_value = proportions_ztest(conversions, n)
rate_a = 150/5000
rate_b = 175/5000
lift = (rate_b - rate_a) / rate_a * 100
print(f"A conversion rate: {rate_a:.2%}")
print(f"B conversion rate: {rate_b:.2%}")
print(f"Relative increase: {lift:.1f}%")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("\nConclusion: Apply B!")
    print(f"- Conversion rate increased by {lift:.1f}%")
    print("- Statistically significant")
else:
    print("\nConclusion: Additional experimentation needed")

Output
A conversion rate: 3.00%
B conversion rate: 3.50%
Relative increase: 16.7%
p-value: 0.1586
Conclusion: Additional experimentation needed
Summary
A/B Testing Checklist
- Clearly define hypothesis
- Calculate required sample size
- Verify random assignment
- Run experiment for sufficient duration
- Choose appropriate statistical test
- Consider effect size and business impact together
Next Steps
You’ve completed the statistical analysis section! Learn machine learning techniques in the ML Basics section.