Machine Learning

Learn the machine learning techniques most frequently used in practice. The focus is on solving business problems such as customer segmentation, churn prediction, and sales forecasting.

Machine Learning vs Statistics

Perspective    | Statistics                     | Machine Learning
Purpose        | Inference, hypothesis testing  | Prediction, pattern discovery
Interpretation | Model interpretation focused   | Prediction performance focused
Data           | Valid even with small samples  | Requires large-scale data
Approach       | Assumption-based               | Data-driven

Curriculum

1. Clustering

Intermediate

Perform customer segmentation using unsupervised learning.

  • K-Means Clustering
  • Determining optimal cluster count (Elbow, Silhouette)
  • RFM-based customer segmentation
  • Cluster profiling
  • DBSCAN (Density-based clustering)
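
As a rough illustration of the K-Means and cluster-count topics listed above, the sketch below fits K-Means for several values of k and compares inertia (used for the Elbow method) with the silhouette score. The data is synthetic, generated with `make_blobs` as a stand-in for scaled customer features; the blob centers and the range of k are assumptions chosen for the example, not values from this course.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (assumption: three well-separated groups).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# K-Means is distance-based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

inertias = {}      # within-cluster sum of squares, plotted for the Elbow method
silhouettes = {}   # mean silhouette score, higher is better
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Pick the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

On real customer data the two criteria may disagree, so it is usually worth inspecting both before profiling the clusters.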

Get Started with Clustering →


2. Classification Models

Intermediate / Advanced

Solve classification problems such as customer churn prediction and purchase prediction.

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • XGBoost
  • Model Evaluation: Accuracy, Precision, Recall, F1, AUC-ROC
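
To make the evaluation metrics listed above concrete, here is a minimal sketch that trains two of the listed models and reports all five metrics on a held-out test set. The dataset is synthetic (`make_classification` with an assumed 80/20 class split standing in for churn labels), and the hyperparameters are illustrative defaults rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary target (assumption: about 20% positives, e.g. churners).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)                 # hard labels for threshold metrics
    y_score = model.predict_proba(X_test)[:, 1]    # probabilities for AUC-ROC
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc_roc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With imbalanced targets, accuracy alone can look good while recall stays low, which is why the remaining metrics are reported alongside it.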

Get Started with Classification Models →


3. Regression Prediction

Intermediate / Advanced

Predict continuous values such as Customer Lifetime Value (CLV) and sales.

  • Linear Regression
  • Ridge, Lasso Regression
  • Random Forest Regression
  • XGBoost Regression
  • Model Evaluation: MAE, RMSE, R²
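
The regression metrics above can be computed as in the following sketch, which compares plain, Ridge, and Lasso linear regression. The data comes from `make_regression` and is purely synthetic; the `alpha` values are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic continuous target (stand-in for something like CLV or sales).
X, y = make_regression(
    n_samples=500, n_features=8, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty
    "lasso": Lasso(alpha=0.1),   # L1 penalty, can zero out coefficients
}

scores = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    mse = mean_squared_error(y_test, pred)
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "rmse": float(np.sqrt(mse)),   # RMSE is the square root of MSE
        "r2": r2_score(y_test, pred),
    }

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

RMSE penalizes large errors more heavily than MAE, so a wide gap between the two usually points to a few badly mispredicted rows.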

Get Started with Regression Prediction →


4. Time Series Forecasting

Advanced

Forecast time series data such as sales and demand.

  • Basic Prophet usage
  • Trend and seasonality modeling
  • Holiday effect incorporation
  • Anomaly detection
  • Multi-time series forecasting
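
The ideas behind trend, seasonality, and anomaly detection can be sketched without Prophet. The example below is not Prophet's model or API: it is a plain least-squares fit of a linear trend plus one weekly sine/cosine pair, written in NumPy, with anomalies flagged by a robust residual threshold. The series, the weekly period, and the threshold multiplier are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 140
t = np.arange(n_days)

# Synthetic daily series: linear trend + weekly seasonality + noise.
true_slope = 0.5
y = 100 + true_slope * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1.0, n_days)
y[60] += 40  # injected spike so there is something to detect


def design_matrix(t, period=7.0):
    """Columns: intercept, linear trend, and one Fourier pair for the season."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([
        np.ones_like(t),
        t,
        np.sin(2 * np.pi * t / period),
        np.cos(2 * np.pi * t / period),
    ])


# Fit trend and seasonality jointly by ordinary least squares.
coef, *_ = np.linalg.lstsq(design_matrix(t), y, rcond=None)
residuals = y - design_matrix(t) @ coef

# Flag points whose residual exceeds a robust (MAD-based) threshold.
mad = np.median(np.abs(residuals - np.median(residuals)))
threshold = 5 * 1.4826 * mad
anomalies = np.flatnonzero(np.abs(residuals) > threshold)

# Extrapolate the fitted components two weeks ahead.
future_t = np.arange(n_days, n_days + 14)
forecast = design_matrix(future_t) @ coef

print("estimated slope:", round(float(coef[1]), 3))
print("anomalous days:", anomalies.tolist())
```

Prophet adds changepoints in the trend, multiple seasonalities, holiday regressors, and uncertainty intervals on top of this basic additive structure.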

Get Started with Time Series Forecasting →


5. Recommendation Systems

Advanced

Implement product recommendation algorithms.

  • Collaborative Filtering
  • Content-Based Filtering
  • Hybrid Recommendation
  • Evaluation Metrics: Precision@K, Recall@K, NDCG
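
The ranking metrics above are short enough to write by hand. The sketch below assumes binary relevance (an item is either relevant or not) and the common convention of dividing Precision@K by K; the function names and the sample items are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """DCG of the top-k list divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 2)   # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: five recommendations, three truly relevant items.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

p5 = precision_at_k(recommended, relevant, 5)
r5 = recall_at_k(recommended, relevant, 5)
n5 = ndcg_at_k(recommended, relevant, 5)
print(p5, r5, n5)
```

Unlike Precision@K and Recall@K, NDCG rewards placing relevant items nearer the top of the list.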

Get Started with Recommendation Systems →

ML Workflow

1. Problem Definition
   └─ Transform business goals into ML problems
        ↓
2. Data Collection and Exploration (EDA)
   └─ Understand data, check quality
        ↓
3. Feature Engineering
   └─ Create new features, transformations
        ↓
4. Model Training
   └─ Train/validation data split
   └─ Compare multiple models
        ↓
5. Model Evaluation
   └─ Measure performance on test data
        ↓
6. Deployment and Monitoring
   └─ Apply to production environment
   └─ Performance monitoring, retraining
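
Steps 4 and 5 of the workflow can be sketched as follows: hold out a test set, compare candidate models with cross-validation on the training portion only, then score the chosen model once on the test set. The data is synthetic, and the two candidates and the AUC-ROC scoring choice are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=1
)

# Step 4a: set the test data aside before any model comparison.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Step 4b: compare multiple models using 5-fold cross-validation.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=1),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Step 5: refit the winner on all training data, then evaluate once on test data.
best_model = candidates[best_name].fit(X_train, y_train)
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])

print("CV AUC per model:", {k: round(v, 3) for k, v in cv_scores.items()})
print("selected:", best_name, "test AUC:", round(test_auc, 3))
```

Keeping the test set out of model selection is what makes the final score a fair estimate of production performance.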

Key Libraries

    # Data preprocessing
    from sklearn.preprocessing import StandardScaler, LabelEncoder
    from sklearn.model_selection import train_test_split, cross_val_score

    # Models (xgboost is a separate package: pip install xgboost)
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier

    # Evaluation
    from sklearn.metrics import (
        accuracy_score, precision_score, recall_score, f1_score,
        confusion_matrix, classification_report, roc_auc_score
    )

    # Time series (prophet is a separate package: pip install prophet;
    # without it this import fails with "No module named 'prophet'")
    from prophet import Prophet

Model Selection Guide

ℹ️
Which model should you choose?

When data is small (< 1,000 samples)

  • Logistic Regression, Decision Tree
  • Watch out for overfitting; cross-validation is essential

When data is large (> 10,000 samples)

  • Random Forest, XGBoost
  • Hyperparameter tuning is important

When interpretation is important

  • Logistic Regression, Decision Tree
  • Feature importance analysis

When prediction performance is important

  • XGBoost, LightGBM
  • Ensemble techniques
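
For the interpretation case in the guide above, a shallow decision tree exposes feature importances directly. The sketch below uses synthetic data in which, by construction, only the first two features carry signal (`shuffle=False` keeps them in the first columns); the feature names are hypothetical placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names; only feature_0 and feature_1 are informative.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=800,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    shuffle=False,      # keep the informative features in the first columns
    random_state=0,
)

# A shallow tree stays readable and limits overfitting on small data.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = tree.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor high-cardinality features, so they are best treated as a first look rather than a final ranking.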

Practical Tips

⚠️
Avoiding common mistakes
  1. Data Leakage: Do not let information that is unavailable at prediction time (such as future values) enter the training data
  2. Class Imbalance: Churn prediction usually has few churners → use SMOTE or class-weight adjustment
  3. Overfitting: Good training performance but poor performance on unseen data → cross-validation is essential
  4. Feature Scaling: Models other than tree-based ones (e.g., logistic regression, K-Means) generally need normalization or standardization
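
Two of the tips above, class-weight adjustment and scaling, can be combined in one sketch. Putting `StandardScaler` inside a pipeline means it is fitted on training data only, which avoids leaking test statistics into training. The data is synthetic with an assumed 5% positive rate; SMOTE is provided by the separate imbalanced-learn package and is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced target (assumption: about 5% positives, e.g. churners).
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_recall(class_weight):
    """Train scaler + logistic regression and return recall on the test set."""
    model = make_pipeline(
        StandardScaler(),  # fitted on X_train only, so no test-set leakage
        LogisticRegression(max_iter=1000, class_weight=class_weight),
    )
    model.fit(X_train, y_train)
    return recall_score(y_test, model.predict(X_test))


recall_plain = fit_and_recall(None)           # all samples weighted equally
recall_balanced = fit_and_recall("balanced")  # minority class up-weighted

print("recall without weighting:", round(recall_plain, 3))
print("recall with class_weight='balanced':", round(recall_balanced, 3))
```

Higher recall from weighting typically comes at the cost of lower precision, so the right trade-off depends on the relative cost of missing a churner versus contacting a loyal customer.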
