Machine Learning
Learn the machine learning techniques most frequently used in practice, with a focus on solving business problems such as customer segmentation, churn prediction, and sales forecasting.
Machine Learning vs Statistics
| Perspective | Statistics | Machine Learning |
|---|---|---|
| Purpose | Inference, hypothesis testing | Prediction, pattern discovery |
| Interpretation | Model interpretation focused | Prediction performance focused |
| Data | Valid even with small samples | Requires large-scale data |
| Approach | Assumption-based | Data-driven |
Curriculum
1. Clustering
Intermediate · Perform customer segmentation using unsupervised learning; a short code sketch follows this section.
- K-Means Clustering
- Determining optimal cluster count (Elbow, Silhouette)
- RFM-based customer segmentation
- Cluster profiling
- DBSCAN (Density-based clustering)
Get Started with Clustering →
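A minimal sketch of this flow, assuming a small hypothetical RFM table (the column names and values below are made up for illustration): scale the features, compare cluster counts with inertia and silhouette scores, then profile the resulting segments.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical RFM table; in practice this is aggregated from transaction data
rfm = pd.DataFrame({
    "recency":   [5, 40, 200, 3, 150, 30],
    "frequency": [20, 5, 1, 25, 2, 8],
    "monetary":  [500, 120, 30, 650, 40, 200],
})

# Scale features so no single RFM dimension dominates the distance metric
X = StandardScaler().fit_transform(rfm)

# Compare cluster counts with inertia (elbow) and silhouette score
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))

# Cluster profiling: average RFM values per segment
rfm["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(rfm.groupby("cluster").mean())
```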
2. Classification Models
Intermediate/Advanced · Solve classification problems such as customer churn prediction and purchase prediction; see the example after this section.
- Logistic Regression
- Decision Tree
- Random Forest
- XGBoost
- Model Evaluation: Accuracy, Precision, Recall, F1, AUC-ROC
Get Started with Classification Models →
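A minimal sketch of a churn-style classification run; the data here is synthetic (`make_classification`) and stands in for real customer features and a churn flag.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Synthetic, imbalanced data: ~20% positives playing the role of churners
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

# Precision, recall, and F1 per class plus overall accuracy
print(classification_report(y_test, model.predict(X_test)))
# AUC-ROC uses predicted probabilities, not hard labels
print("AUC-ROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```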
3. Regression Prediction
Intermediate/Advanced · Predict continuous values such as Customer Lifetime Value (CLV) and sales; see the example after this section.
- Linear Regression
- Ridge, Lasso Regression
- Random Forest Regression
- XGBoost Regression
- Model Evaluation: MAE, RMSE, R²
Get Started with Regression Prediction →
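A minimal sketch comparing a linear and a tree-based regressor on synthetic data standing in for CLV or sales features, evaluated with MAE, RMSE, and R².

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic regression data in place of real CLV/sales features
X, y = make_regression(n_samples=1500, n_features=15, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

for name, model in [("Ridge", Ridge(alpha=1.0)),
                    ("RandomForest", RandomForestRegressor(random_state=42))]:
    pred = model.fit(X_train, y_train).predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE from MSE
    r2 = r2_score(y_test, pred)
    print(f"{name}: MAE={mae:.2f} RMSE={rmse:.2f} R²={r2:.3f}")
```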
4. Time Series Forecasting
Advanced · Forecast time series data such as sales and demand; a Prophet sketch follows this section.
- Basic Prophet usage
- Trend and seasonality modeling
- Holiday effect incorporation
- Anomaly detection
- Multi-time series forecasting
Get Started with Time Series Forecasting →
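A minimal Prophet sketch on a synthetic daily sales series; Prophet expects a DataFrame with `ds` (date) and `y` (value) columns, and the country used for holidays here is just an example.

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Two years of synthetic daily sales: trend + weekly seasonality + noise
dates = pd.date_range("2022-01-01", periods=730, freq="D")
sales = (100 + 0.05 * np.arange(730)
         + 10 * np.sin(2 * np.pi * np.arange(730) / 7)
         + np.random.default_rng(42).normal(0, 3, 730))
df = pd.DataFrame({"ds": dates, "y": sales})

# Yearly/weekly seasonality are handled by default; add holiday effects per country
m = Prophet()
m.add_country_holidays(country_name="US")
m.fit(df)

future = m.make_future_dataframe(periods=90)  # forecast 90 days ahead
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```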
5. Recommendation Systems
Advanced · Implement product recommendation algorithms; a collaborative filtering sketch follows the list below.
- Collaborative Filtering
- Content-Based Filtering
- Hybrid Recommendation
- Evaluation Metrics: Precision@K, Recall@K, NDCG
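A minimal item-based collaborative filtering sketch on a tiny hypothetical rating matrix; production systems use sparse matrices, implicit feedback, and far more data.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item ratings (0 = not rated)
ratings = pd.DataFrame(
    [[5, 4, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]],
    index=["user1", "user2", "user3", "user4"],
    columns=["itemA", "itemB", "itemC", "itemD"],
)

# Item-item similarity computed from the columns of the rating matrix
item_sim = pd.DataFrame(cosine_similarity(ratings.T),
                        index=ratings.columns, columns=ratings.columns)

# Score unseen items for a user as a similarity-weighted sum of their ratings
user = ratings.loc["user2"]
scores = item_sim.dot(user)
scores = scores[user == 0]  # keep only items the user has not rated yet
print(scores.sort_values(ascending=False))
```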
ML Workflow
1. Problem Definition
   └─ Transform business goals into ML problems
      ↓
2. Data Collection and Exploration (EDA)
   └─ Understand data, check quality
      ↓
3. Feature Engineering
   └─ Create new features, transformations
      ↓
4. Model Training
   ├─ Train/validation data split
   └─ Compare multiple models
      ↓
5. Model Evaluation
   └─ Measure performance on test data
      ↓
6. Deployment and Monitoring
   ├─ Apply to production environment
   └─ Performance monitoring, retraining
Key Libraries
# Data preprocessing
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score
# Models
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
# Evaluation
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score
)
# Time series
from prophet import Prophet  # requires a separate install: pip install prophet
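As a rough illustration of how these libraries map onto the workflow above (the dataset is synthetic and the model choices are examples, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Split: hold out a test set that is only touched at the very end
X, y = make_classification(n_samples=2000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Compare candidate models with cross-validation on the training split
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(random_state=42),
}
for name, model in candidates.items():
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print(name, "CV AUC:", round(cv_auc.mean(), 3))

# Fit the chosen model and measure final performance on the held-out test data
best = candidates["rf"].fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]))
```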
Model Selection Guide
When data is small (< 1,000 samples)
- Logistic Regression, Decision Tree
- Watch out for overfitting; cross-validation is essential
When data is large (> 10,000 samples)
- Random Forest, XGBoost
- Hyperparameter tuning is important (a tuning sketch follows this guide)
When interpretation is important
- Logistic Regression, Decision Tree
- Feature importance analysis
When prediction performance is important
- XGBoost, LightGBM
- Ensemble techniques
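A minimal hyperparameter-tuning sketch for the larger-data case; the grid values are illustrative only, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a larger dataset
X, y = make_classification(n_samples=5000, n_features=30, random_state=42)

param_grid = {
    "n_estimators": [200, 400],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
# Exhaustive search over the grid with 3-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=3, scoring="roc_auc", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```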
Practical Tips
- Data Leakage: Be careful not to include future information in training
- Class Imbalance: Few churners in churn prediction → handle with SMOTE or class-weight adjustment
- Overfitting: Good training performance but poor performance on unseen data → cross-validation is essential
- Feature Scaling: Models other than tree-based ones need normalization/standardization (see the sketch below)
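A minimal sketch addressing two of these tips: class-weight adjustment for imbalance, and scaling inside a Pipeline so preprocessing is fit only on the training split and cannot leak test information.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Imbalanced synthetic target: ~5% positives, like a churn flag
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight="balanced" reweights the minority class; SMOTE (imbalanced-learn)
# is an alternative that oversamples it instead.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(class_weight="balanced", max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```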