
A/B Testing

Intermediate / Advanced

Learning Objectives

  • A/B Test Design
  • Sample Size Calculation
  • Conversion Rate/Mean Comparison Tests
  • Results Interpretation

0. Setup

Load CSV files for data practice.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

# Load data
orders = pd.read_csv('src_orders.csv', parse_dates=['created_at'])
items = pd.read_csv('src_order_items.csv')
products = pd.read_csv('src_products.csv')
users = pd.read_csv('src_users.csv')

# Merge for analysis
df = (orders.merge(items, on='order_id')
            .merge(products, on='product_id')
            .merge(users, on='user_id'))

# Simulate A/B test data for the examples
np.random.seed(42)
df['experiment_group'] = np.random.choice(['control', 'treatment'], size=len(df))

# Add a slight effect to the treatment group
df.loc[df['experiment_group'] == 'treatment', 'sale_price'] *= 1.05
df['order_amount'] = df['sale_price']  # Alias used in later examples

1. What is A/B Testing?

Definition

A/B testing is a controlled experiment that compares two versions (A: control, B: treatment) to determine which one performs better.

Use Cases

  • Website button colors
  • Email subject lines
  • Pricing policies
  • Recommendation algorithms

2. Sample Size Calculation

Power Analysis

Calculate the required sample size before the experiment.

from statsmodels.stats.power import TTestIndPower

# Parameters
effect_size = 0.2  # Effect size (small: 0.2, medium: 0.5, large: 0.8)
alpha = 0.05       # Significance level
power = 0.8        # Statistical power

# Sample size calculation
analysis = TTestIndPower()
sample_size = analysis.solve_power(
    effect_size=effect_size,
    power=power,
    alpha=alpha,
    alternative='two-sided'
)

print(f"Required sample size (per group): {int(sample_size)}")
print(f"Total required sample: {int(sample_size * 2)}")
Execution result
Required sample size (per group): 393
Total required sample: 786
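Required sample size falls sharply as the expected effect grows. The following sketch (illustrative values only) sweeps the conventional small/medium/large effect sizes with the same alpha and power as above:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Sweep conventional effect sizes at alpha=0.05, power=0.8 (two-sided)
analysis = TTestIndPower()
required = {}
for effect_size in [0.1, 0.2, 0.5, 0.8]:
    n = analysis.solve_power(effect_size=effect_size, power=0.8,
                             alpha=0.05, alternative='two-sided')
    required[effect_size] = int(np.ceil(n))  # round up: partial subjects don't exist
    print(f"effect size {effect_size}: {required[effect_size]} per group")
```

Detecting a tiny effect (0.1) demands roughly four times the sample of a small effect (0.2), which is why an honest effect-size estimate matters before launching the test.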

3. Conversion Rate Comparison (Proportion Test)

Z-test

from statsmodels.stats.proportion import proportions_ztest

# Data
# Group A: 50 conversions out of 1000
# Group B: 65 conversions out of 1000
conversions = [50, 65]
n_observations = [1000, 1000]

# Z-test
z_stat, p_value = proportions_ztest(conversions, n_observations,
                                    alternative='two-sided')

# Conversion rates
rate_a = conversions[0] / n_observations[0]
rate_b = conversions[1] / n_observations[1]
lift = (rate_b - rate_a) / rate_a * 100

print(f"Group A conversion rate: {rate_a:.2%}")
print(f"Group B conversion rate: {rate_b:.2%}")
print(f"Relative increase: {lift:.1f}%")
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    print("→ Group B is significantly better!")
else:
    print("→ No significant difference")
Execution result
Group A conversion rate: 5.00%
Group B conversion rate: 6.50%
Relative increase: 30.0%
p-value: 0.1496
→ No significant difference
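A p-value alone hides how uncertain the estimated difference is. A quick sketch of a 95% Wald confidence interval for the difference in conversion rates (computed by hand from the same numbers, not from the author's code):

```python
import numpy as np
from scipy import stats

# Same data as above: 50/1000 vs 65/1000 conversions
conv_a, n_a = 50, 1000
conv_b, n_b = 65, 1000
p_a, p_b = conv_a / n_a, conv_b / n_b

# Standard error of the difference between two independent proportions
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z = stats.norm.ppf(0.975)  # ~1.96 for a 95% interval

low, high = (p_b - p_a) - z * se, (p_b - p_a) + z * se
print(f"Difference: {p_b - p_a:.2%}, 95% CI: [{low:.2%}, {high:.2%}]")
```

The interval includes zero, which is consistent with the non-significant p-value: the data cannot rule out "no difference at all".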

4. Mean Comparison (t-test)

Order Amount Comparison

import numpy as np
from scipy import stats

# Group data
group_a = df[df['experiment_group'] == 'control']['order_amount']
group_b = df[df['experiment_group'] == 'treatment']['order_amount']

# t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"Group A mean: ${group_a.mean():.2f}")
print(f"Group B mean: ${group_b.mean():.2f}")
print(f"Difference: ${group_b.mean() - group_a.mean():.2f}")
print(f"p-value: {p_value:.4f}")

# Effect size (Cohen's d)
pooled_std = np.sqrt((group_a.std()**2 + group_b.std()**2) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_std
print(f"Cohen's d: {cohens_d:.3f}")
Execution result
Group A mean: $59.49
Group B mean: $62.97
Difference: $3.48
p-value: 0.0000
Cohen's d: 0.050
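The default `ttest_ind` assumes equal variances in both groups. When that assumption is doubtful (e.g. the treatment changes spending variability, not just its mean), Welch's t-test is the safer choice. A minimal sketch with simulated data (the numbers below are illustrative, not from the dataset above):

```python
import numpy as np
from scipy import stats

# Simulated order amounts with deliberately unequal variances
rng = np.random.default_rng(42)
group_a = rng.normal(loc=60, scale=10, size=500)
group_b = rng.normal(loc=63, scale=20, size=500)

# equal_var=False switches to Welch's t-test (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
```

Welch's test costs almost nothing in power when variances are equal, so many practitioners use `equal_var=False` by default.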

5. Results Interpretation

Decision Framework

  1. Is p < 0.05?
     • No → No significant difference; additional experimentation needed
     • Yes → Proceed to the next step
  2. Is the effect size practically meaningful?
     • A 0.1% increase in conversion rate is very different from a 10% increase
     • Calculate the business impact
  3. Is it cost-effective?
     • Implementation cost
     • Expected revenue increase
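The business-impact step is plain arithmetic. A back-of-the-envelope sketch, using hypothetical traffic and revenue figures (none of these numbers come from the dataset in this lesson):

```python
# Hypothetical inputs for a monthly impact estimate
monthly_visitors = 100_000
baseline_rate = 0.050        # control conversion rate
treatment_rate = 0.065       # treatment conversion rate
revenue_per_conversion = 40  # average revenue per converted user, in dollars

extra_conversions = monthly_visitors * (treatment_rate - baseline_rate)
extra_revenue = extra_conversions * revenue_per_conversion

print(f"Extra conversions/month: {extra_conversions:.0f}")
print(f"Extra revenue/month: ${extra_revenue:,.0f}")
# Compare this figure against the cost of building and maintaining variant B
```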

Cautions

⚠️
A/B Testing Cautions
  • Peeking: Do not check results mid-experiment and stop early once they look significant
  • Multiple Comparisons: Correction needed when testing multiple metrics simultaneously (Bonferroni)
  • Exposure Bias: Check for imbalanced characteristics between groups
  • External Factors: Consider seasonal effects, promotions, etc.
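The multiple-comparisons caution can be handled with `statsmodels`. A short sketch of a Bonferroni correction applied to hypothetical p-values from three metrics tested in the same experiment:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from three metrics (e.g. conversion, AOV, retention)
p_values = [0.04, 0.01, 0.20]

# Bonferroni: each p-value is multiplied by the number of tests (capped at 1)
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05,
                                          method='bonferroni')

for p, pc, r in zip(p_values, p_corrected, reject):
    print(f"raw p={p:.2f} -> corrected p={pc:.2f}, significant: {r}")
```

Note that 0.04, significant on its own, no longer clears the 0.05 bar after correction; only the 0.01 result survives.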

Quiz

Problem

Given the following A/B test results, should you apply the new design (B)?

  • Group A: 5000 people, 150 conversions
  • Group B: 5000 people, 175 conversions

Answer

from statsmodels.stats.proportion import proportions_ztest

conversions = [150, 175]
n = [5000, 5000]
z_stat, p_value = proportions_ztest(conversions, n)

rate_a = 150 / 5000
rate_b = 175 / 5000
lift = (rate_b - rate_a) / rate_a * 100

print(f"A conversion rate: {rate_a:.2%}")
print(f"B conversion rate: {rate_b:.2%}")
print(f"Relative increase: {lift:.1f}%")
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    print("\nConclusion: Apply B!")
    print(f"- Conversion rate increased by {lift:.1f}%")
    print("- Statistically significant")
else:
    print("\nConclusion: Additional experimentation needed")
Execution result
A conversion rate: 3.00%
B conversion rate: 3.50%
Relative increase: 16.7%
p-value: 0.1586

Conclusion: Additional experimentation needed

Summary

A/B Testing Checklist

  1. Clearly define hypothesis
  2. Calculate required sample size
  3. Verify random assignment
  4. Run experiment for sufficient duration
  5. Choose appropriate statistical test
  6. Consider effect size and business impact together

Next Steps

You’ve completed the statistical analysis section! Learn machine learning techniques in the ML Basics section.

