04. Churn Prediction
1. What is Churn?
When a customer stops using a service, this is called churn. Acquiring a new customer is commonly estimated to cost 5 to 25 times more than retaining an existing one, so it is important to predict in advance which customers are likely to churn and to retain them with targeted benefits.
2. Data Preparation and Preprocessing
Churn prediction is a binary classification problem: for each customer we predict the churn status (1: churned, 0: retained).
❓ Problem 1: Categorical Variable Transformation
Q. Convert the 'priority' and 'channel' columns into numbers that machine learning models can understand. (Use Label Encoding or a similar technique.)
Theory Reference: Classification & NLP Concepts
Python (Scikit-Learn)
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Load data (synthetic data)
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'priority': ['High', 'Low', 'Medium', 'High'],
    'channel': ['Email', 'Chat', 'Email', 'Phone'],
    'churned': [1, 0, 0, 1]
})
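# Label Encoding
# Note: LabelEncoder assigns codes in alphabetical order (High=0, Low=1, Medium=2),
# not in business order.
le = LabelEncoder()
df['priority_encoded'] = le.fit_transform(df['priority'])
# Encode 'channel' with its own encoder so each column keeps a separate mapping
df['channel_encoded'] = LabelEncoder().fit_transform(df['channel'])
# Check results
print(df[['priority', 'priority_encoded']])
Output:
  priority  priority_encoded
0     High                 0
1      Low                 1
2   Medium                 2
3     High                 0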
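Label encoding gives each category an integer, but the integers carry an order that may not match the data. The following is a minimal sketch of one alternative. It assumes the priority levels have the business ordering Low < Medium < High and that channel has no natural order. Neither assumption is stated in the exercise. The names `priority_order` and `encoded` are illustrative.

```python
import pandas as pd

# Same synthetic columns as in the example above
df = pd.DataFrame({
    'priority': ['High', 'Low', 'Medium', 'High'],
    'channel': ['Email', 'Chat', 'Email', 'Phone'],
})

# Assumption: priority is ordinal with Low < Medium < High
priority_order = {'Low': 0, 'Medium': 1, 'High': 2}
df['priority_ordinal'] = df['priority'].map(priority_order)

# Assumption: channel is nominal, so one-hot encode it (one 0/1 column per category)
channel_dummies = pd.get_dummies(df['channel'], prefix='channel', dtype=int)

# Combine the encoded features into one table
encoded = pd.concat([df[['priority_ordinal']], channel_dummies], axis=1)
print(encoded)
```

One-hot encoding avoids implying that, for example, Phone is "greater than" Email, at the cost of adding one column per category.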
3. Model Training (Logistic Regression)
Logistic regression is useful for marketing because it outputs a churn probability between 0 and 1, which makes it possible to rank customers by risk rather than only label them.
❓ Problem 2: Model Training and Identifying Important Variables
Q. Train a logistic regression model and use its coefficients to identify which variable has the greatest impact on churn.
Python (Scikit-Learn)
from sklearn.linear_model import LogisticRegression
# Separate X (Features) and y (Target)
X = df[['priority_encoded']] # Only one feature is used in this example
y = df['churned']
# Train model
model = LogisticRegression()
model.fit(X, y)
# Check regression coefficient
# Positive (+) means it increases churn probability, negative (-) means it decreases it.
print(f"Priority Coefficient: {model.coef_[0][0]:.4f}")
Output:
Priority Coefficient: -0.9156
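To see the churn probabilities mentioned at the start of this section, the fitted model can be queried with `predict_proba`. The sketch below rebuilds the same four-row example so that it runs on its own. The variable names `churn_col` and `churn_prob` are illustrative, and with so few rows the numbers are only a demonstration of the mechanics.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy feature and target as above (label-encoded priority, churn flag)
X = pd.DataFrame({'priority_encoded': [0, 1, 2, 0]})
y = pd.Series([1, 0, 0, 1], name='churned')

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class, in the order given by model.classes_
churn_col = list(model.classes_).index(1)
churn_prob = model.predict_proba(X)[:, churn_col]

# Print the estimated churn probability for each row
for value, prob in zip(X['priority_encoded'], churn_prob):
    print(f"priority_encoded={value}: churn probability={prob:.3f}")
```

Because the coefficient is negative, rows with a smaller encoded value receive a higher churn probability. Keep in mind that the encoded values here come from alphabetical label encoding, so the sign describes the codes, not a real ordering of priority levels.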
4. Evaluation
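Recall can matter more than plain accuracy for churn models, because the core of churn-prevention marketing is finding the customers who will actually churn. Recall is the share of actual churners that the model flags, so every missed churner (a false negative) lowers it directly.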
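The difference between the two metrics can be illustrated with a small sketch. The labels and predictions below are hypothetical values made up for this illustration, not results from the model above: 3 of 10 customers churn, and the classifier catches only one of them.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical ground truth: 3 churners (1) and 7 retained customers (0)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predictions: only the first churner is detected
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)  # (tp + tn) / total
recall = recall_score(y_true, y_pred)      # tp / (tp + fn)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"Accuracy: {accuracy:.2f}")
print(f"Recall:   {recall:.2f}")
```

Here accuracy is 8 out of 10 while recall is only 1 out of 3: the model looks good overall but misses two of the three customers a retention campaign would need to reach.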
Theory Reference: Model Evaluation Metrics