
Statistical Analysis

A collection of statistical analysis recipes for data-driven decision making. Learn statistical techniques frequently used in practice, from descriptive statistics to A/B testing.

Why Do We Need Statistics?

ℹ️
Data Analysis vs Statistical Analysis
  • Data Analysis: “Sales increased by 10% last month”
  • Statistical Analysis: “Is this increase due to chance, or is it a meaningful change?”

Statistics is a tool for quantifying uncertainty in data and drawing reliable conclusions.

0. Setup

import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Dummy Data for Examples
group_a = np.random.normal(100, 10, 100)
group_b = np.random.normal(105, 12, 100)
df = pd.DataFrame({
    'x1': np.random.rand(100),
    'x2': np.random.rand(100),
    'y': np.random.rand(100),
})

Curriculum

1. Descriptive Statistics

Beginner

Learn basic statistics that summarize data characteristics.

  • Central Tendency: Mean, Median, Mode
  • Dispersion: Standard Deviation, Variance, Range, IQR
  • Distribution Shape: Skewness, Kurtosis
  • Percentiles and Quartiles
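
As a rough illustration of the measures listed above, the following sketch computes them for a small made-up sample. The sample values and the `summary` dictionary are assumptions chosen for the example, not data from this guide.

```python
import pandas as pd
from scipy import stats

# Small illustrative sample (made-up values, not from the guide)
values = pd.Series([12, 15, 15, 18, 20, 22, 22, 22, 30, 45])

summary = {
    # Central tendency
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().iloc[0],
    # Dispersion (ddof=1 gives the sample, not population, estimate)
    "std": values.std(ddof=1),
    "variance": values.var(ddof=1),
    "range": values.max() - values.min(),
    "iqr": values.quantile(0.75) - values.quantile(0.25),
    # Distribution shape
    "skewness": stats.skew(values),      # > 0 suggests a longer right tail
    "kurtosis": stats.kurtosis(values),  # excess kurtosis (normal = 0)
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

The single large value (45) pulls the mean above the median, which is one reason to report both.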

Start Descriptive Statistics →


2. Correlation Analysis

Beginner / Intermediate

Learn methods to analyze relationships between two variables.

  • Pearson Correlation Coefficient (Continuous Variables)
  • Spearman Correlation Coefficient (Ordinal Data, Monotonic Non-linear Relationships)
  • Correlation vs Causation
  • Correlation Matrix and Heatmap
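
The difference between the two coefficients can be sketched on synthetic data. The example assumes an exponential (monotonic but non-linear) relationship; the variable names and the data-generating process are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic data: y grows monotonically with x, but not linearly
x = rng.uniform(0, 5, 200)
y = np.exp(x) + rng.normal(0, 1, 200)

# Pearson measures linear association; Spearman works on ranks,
# so it captures any monotonic association
pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")

# Correlation matrix, the usual input for a heatmap
corr_matrix = pd.DataFrame({"x": x, "y": y}).corr(method="pearson")
print(corr_matrix)
```

On data like this, Spearman is typically closer to 1 than Pearson because the relationship is monotonic but curved. Neither coefficient says anything about causation.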

Start Correlation Analysis →


3. Hypothesis Testing

Intermediate

Learn how to test hypotheses with data.

  • Null Hypothesis and Alternative Hypothesis
  • Meaning and Interpretation of p-value
  • t-test (One-sample, Independent, Paired)
  • Chi-square Test (Categorical Variables)
  • Type I/Type II Errors
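
The test variants listed above map onto `scipy.stats` functions roughly as follows. This is a minimal sketch on simulated data; the group names, sample sizes, and contingency counts are assumptions made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated measurements (illustrative only)
before = rng.normal(100, 10, 50)
after = before + rng.normal(2, 5, 50)   # same subjects, measured again
other_group = rng.normal(105, 12, 50)   # an unrelated group

# One-sample: is the mean of `before` different from 100?
t_one, p_one = stats.ttest_1samp(before, popmean=100)

# Independent: do two unrelated groups differ? (Welch's variant,
# which does not assume equal variances)
t_ind, p_ind = stats.ttest_ind(before, other_group, equal_var=False)

# Paired: did the same subjects change between measurements?
t_rel, p_rel = stats.ttest_rel(before, after)

# Chi-square test of independence on a 2x2 table of counts
# rows: group A / group B, columns: converted / not converted
observed = np.array([[30, 70],
                     [45, 55]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

for name, p in [("one-sample", p_one), ("independent", p_ind),
                ("paired", p_rel), ("chi-square", p_chi2)]:
    print(f"{name:>11}: p = {p:.4f}")
```

With α = 0.05, a Type I error (rejecting a true H₀) is expected in about 5% of tests where H₀ actually holds, so a single small p-value is evidence rather than proof.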

Start Hypothesis Testing →


4. Regression Analysis

Intermediate / Advanced

Learn methods to model and predict relationships between variables.

  • Simple Linear Regression
  • Multiple Linear Regression
  • Interpreting Regression Coefficients
  • Coefficient of Determination (R²) and Model Evaluation
  • Multicollinearity Diagnosis

Start Regression Analysis →


5. A/B Testing

Intermediate / Advanced

Learn how to verify causal relationships through experiments.

  • A/B Test Design
  • Sample Size Calculation (Power Analysis)
  • Conversion Rate Comparison (Proportion Test)
  • Continuous Metric Comparison (t-test)
  • Early Stopping and Multiple Comparison Issues
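
Sample size calculation and conversion rate comparison can be sketched with the normal approximation for two proportions. The helper names `sample_size_per_group` and `two_proportion_ztest` are hypothetical, and the conversion counts are made up for the example.

```python
import math
from scipy import stats


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion test."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Design: detect a lift from 10% to 12% with 80% power at alpha = 0.05
n_required = sample_size_per_group(0.10, 0.12)
print(f"Required sample size per group: {n_required}")

# Analysis: made-up results after the experiment has finished
z, p_value = two_proportion_ztest(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Fixing the sample size before the experiment and analyzing only once it is reached is one simple guard against the early stopping problem. Checking the p-value repeatedly and stopping at the first significant result inflates the Type I error rate.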

Start A/B Testing →


6. Time Series Analysis

Advanced

Learn methods to analyze and forecast data patterns over time.

  • Time Series Decomposition (Trend, Seasonality, Residual)
  • Moving Average and Exponential Smoothing
  • Autocorrelation (ACF) and Partial Autocorrelation (PACF)
  • ARIMA Model Basics
  • Using Prophet
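
Moving averages, exponential smoothing, and autocorrelation can be tried with pandas alone. The sketch below uses a synthetic daily series with a trend and a weekly cycle; the series and the smoothing parameters are assumptions, and ARIMA and Prophet are not covered here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic daily series: upward trend + weekly seasonality + noise
dates = pd.date_range("2024-01-01", periods=120, freq="D")
trend = np.linspace(100, 160, 120)
weekly = 10 * np.sin(2 * np.pi * np.arange(120) / 7)
series = pd.Series(trend + weekly + rng.normal(0, 3, 120), index=dates)

# 7-day moving average: averages out the weekly cycle
moving_avg = series.rolling(window=7).mean()

# Simple exponential smoothing: recent points get more weight
exp_smooth = series.ewm(alpha=0.3, adjust=False).mean()

# Autocorrelation at lag 7 (one week)
lag7_autocorr = series.autocorr(lag=7)

print(moving_avg.tail(3))
print(exp_smooth.tail(3))
print(f"Lag-7 autocorrelation: {lag7_autocorr:.3f}")
```

The first six values of the moving average are missing because a full 7-day window is not yet available.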

Start Time Series Analysis →

Key Concepts Summary

Probability and Distributions

| Distribution | When to Use | Examples |
|---|---|---|
| Normal Distribution | Continuous data, sample means | Height, weight, test scores |
| Binomial Distribution | Success/failure counts | Conversions, clicks |
| Poisson Distribution | Occurrences per unit time | Daily orders |
| t-Distribution | Small-sample mean comparison | Testing mean differences between groups |
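
Each of these distributions is available in `scipy.stats`. The following sketch shows one typical query per distribution; the specific parameters (20 visitors, a 10% conversion rate, 50 orders per day, 10 observations) are made-up assumptions.

```python
from scipy import stats

# Normal: probability of falling within 1 standard deviation of the mean
p_within_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)

# Binomial: probability of exactly 3 conversions among 20 visitors
# when each converts with probability 0.1
p_binom = stats.binom.pmf(3, n=20, p=0.1)

# Poisson: probability of more than 60 orders in a day
# when the average is 50 orders per day
p_poisson = stats.poisson.sf(60, mu=50)

# t vs. normal: two-sided 95% critical values
t_crit = stats.t.ppf(0.975, df=9)   # 10 observations -> 9 degrees of freedom
z_crit = stats.norm.ppf(0.975)

print(f"Normal, within 1 SD: {p_within_1sd:.4f}")
print(f"Binomial, P(X = 3):  {p_binom:.4f}")
print(f"Poisson, P(X > 60):  {p_poisson:.4f}")
print(f"t critical (df=9):   {t_crit:.3f}")
print(f"z critical:          {z_crit:.3f}")
```

The t critical value is larger than the z critical value, which reflects the extra uncertainty of estimating the standard deviation from a small sample.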

Hypothesis Testing Decision Flow

1. Data Collection
2. Hypothesis Setting (H₀, H₁)
3. Significance Level (usually α = 0.05)
4. Calculate Test Statistic
5. Calculate p-value
6. Decision
  • p < α → Reject H₀ (Significant difference)
  • p ≥ α → Fail to reject H₀ (No significant difference)
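
The flow above can be sketched as a small function. The name `run_two_sample_test` is hypothetical, and Welch's t-test is an assumed choice of test statistic; other tests would slot into the same flow.

```python
import numpy as np
from scipy import stats


def run_two_sample_test(sample_a, sample_b, alpha=0.05):
    """Follow the decision flow for H0: the two group means are equal."""
    # Test statistic and p-value (Welch's t-test, unequal variances allowed)
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)

    # Decision rule: compare the p-value with the significance level
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return {
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "alpha": alpha,
        "decision": decision,
    }


# Data collection step, simulated here for illustration
rng = np.random.default_rng(123)
group_a = rng.normal(100, 10, 100)
group_b = rng.normal(105, 12, 100)

result = run_two_sample_test(group_a, group_b)
print(result)
```

"Fail to reject H₀" means the data did not provide enough evidence of a difference. It does not show that the groups are equal.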

Python Statistics Libraries

# Basic Statistics
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# group_a, group_b, and df are the dummy data defined in "0. Setup"

# Example: t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Example: Regression Analysis
model = ols('y ~ x1 + x2', data=df).fit()
print(model.summary())
Execution Result
OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.005
Model:                            OLS   Adj. R-squared:                 -0.015
Method:                 Least Squares   F-statistic:                    0.2639
Date:                Sat, 20 Dec 2025   Prob (F-statistic):              0.769
Time:                        00:25:05   Log-Likelihood:                -13.739
No. Observations:                 100   AIC:                             33.48
Df Residuals:                      97   BIC:                             41.29
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
               coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5603      0.078      7.139      0.000       0.404       0.716
x1            -0.0050      0.097     -0.052      0.959      -0.197       0.187
x2            -0.0750      0.103     -0.726      0.469      -0.280       0.130
==============================================================================
Omnibus:                       19.776   Durbin-Watson:                   2.230
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                5.138
Skew:                          -0.135   Prob(JB):                       0.0766
Kurtosis:                       1.923   Cond. No.                         5.63
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Practical Tips

💡
Statistical Significance vs Practical Importance
  • p < 0.05 does not automatically mean the result is practically meaningful
  • Consider the effect size alongside the p-value
  • Example: What if the conversion rate increased by 0.01% with p = 0.001?
    • Statistically significant, but the practical value may be small
  • Always estimate the business impact (revenue effect, etc.) as well
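
The gap between significance and importance can be illustrated by pairing a p-value with an effect size. The sketch assumes simulated groups with a very small true difference and uses Cohen's d as the effect size; `cohens_d` is a hypothetical helper written for this example.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Standardized mean difference (b - a) using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)


rng = np.random.default_rng(2024)

# Very large samples with a tiny true difference (100.0 vs 100.2, SD = 10)
control = rng.normal(100.0, 10.0, 200_000)
variant = rng.normal(100.2, 10.0, 200_000)

t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
d = cohens_d(control, variant)

print(f"p-value   = {p_value:.2e}")   # small, because n is huge
print(f"Cohen's d = {d:.3f}")         # still a very small effect
```

With samples this large, even a difference of 0.02 standard deviations is detected as significant, so the p-value alone says little about whether the change is worth acting on.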