
04. Churn Prediction

Machine Learning · 40 min

1. What is Churn?

When a customer stops using a service, it is called churn. Acquiring a new customer is often cited as costing 5 to 25 times as much as retaining an existing one, so it pays to predict in advance which customers are likely to churn and to retain them with targeted benefits.

2. Data Preparation and Preprocessing

Churn prediction is a binary classification problem: for each customer, we predict churn status (1: churned, 0: retained).

❓ Problem 1: Categorical Variable Transformation

Q. Convert the `priority` and `channel` columns into numbers that machine learning models can understand (for example, with Label Encoding).

Theory Reference: Classification & NLP Concepts

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Load data (synthetic data)
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'priority': ['High', 'Low', 'Medium', 'High'],
    'channel': ['Email', 'Chat', 'Email', 'Phone'],
    'churned': [1, 0, 0, 1]
})

# Label Encoding: classes are numbered in alphabetical order
le = LabelEncoder()
df['priority_encoded'] = le.fit_transform(df['priority'])

# Use a separate encoder for channel so each column keeps its own mapping
df['channel_encoded'] = LabelEncoder().fit_transform(df['channel'])

# Check results
print(df[['priority', 'priority_encoded']])
```
Output
  priority  priority_encoded
0     High                 0
1      Low                 1
2   Medium                 2
3     High                 0
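Note that `LabelEncoder` numbers classes alphabetically (High=0, Low=1, Medium=2), which does not match the natural order of priorities, and `channel` has no order at all. The sketch below shows one possible alternative: an explicit mapping for `priority` and one-hot encoding for `channel`. The ordering Low < Medium < High and the names `priority_order` and `priority_rank` are assumptions made for illustration; they are not part of the lesson's dataset.

```python
import pandas as pd

# Same synthetic data as in the lesson's example
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'priority': ['High', 'Low', 'Medium', 'High'],
    'channel': ['Email', 'Chat', 'Email', 'Phone'],
    'churned': [1, 0, 0, 1]
})

# Assumed ordering (not stated in the lesson): Low < Medium < High
priority_order = {'Low': 0, 'Medium': 1, 'High': 2}
df['priority_rank'] = df['priority'].map(priority_order)

# One-hot encode the unordered channel column: one 0/1 column per channel
channel_dummies = pd.get_dummies(df['channel'], prefix='channel', dtype=int)

encoded = pd.concat(
    [df[['user_id', 'priority_rank', 'churned']], channel_dummies],
    axis=1
)
print(encoded)
```

One-hot encoding avoids implying that, say, Phone is "greater than" Email, which matters for linear models such as logistic regression.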

3. Model Training (Logistic Regression)

Logistic regression is useful for marketing because it outputs a churn probability between 0 and 1 for each customer, not just a churned/retained label.

❓ Problem 2: Model Training and Identifying Important Variables

Q. Train a logistic regression model and use its coefficients to identify which variable has the greatest impact on churn.

```python
from sklearn.linear_model import LogisticRegression

# Separate X (features) and y (target); df comes from Problem 1
X = df[['priority_encoded']]  # Using only 1 variable for the example
y = df['churned']

# Train model
model = LogisticRegression()
model.fit(X, y)

# Check the regression coefficient
# Positive (+) means the variable increases churn probability,
# negative (-) means it decreases it.
print(f"Priority Coefficient: {model.coef_[0][0]:.4f}")
```
Output
Priority Coefficient: -0.9156
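To connect the trained model back to the idea of a churn probability, the following sketch reads per-customer probabilities with `predict_proba`. It rebuilds the lesson's four-row synthetic data so that it runs on its own. The `threshold` value and the `target` column are hypothetical choices for illustration, not part of the lesson.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Lesson's synthetic data, with priority already label-encoded
# (High=0, Low=1, Medium=2)
X = pd.DataFrame({'priority_encoded': [0, 1, 2, 0]})
y = pd.Series([1, 0, 0, 1], name='churned')

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_;
# column 1 holds the probability of class 1 (churned)
churn_proba = model.predict_proba(X)[:, 1]

# Hypothetical cutoff for a retention campaign; tune it on real data
threshold = 0.5
result = X.assign(churn_proba=churn_proba, target=churn_proba >= threshold)
print(result)
```

Sorting customers by `churn_proba` gives a ranked list, which is often more practical for a campaign with a fixed budget than a hard 0/1 prediction.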

4. Evaluation

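Recall may be more important than simple accuracy. Churners are usually a minority, so a model that predicts "retained" for everyone can score high accuracy while finding no churners at all. Recall measures the share of actual churners the model catches, and finding those customers is the core of churn prevention marketing.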
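The gap between the two metrics can be seen with a small sketch. The labels below are made up for illustration: 10 customers, 2 of whom churned, scored against a naive model that predicts "retained" for everyone.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 10 customers, 2 of whom actually churned (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A naive model that predicts "retained" (0) for every customer
y_pred = [0] * 10

accuracy = accuracy_score(y_true, y_pred)  # share of correct predictions
recall = recall_score(y_true, y_pred)      # share of actual churners found

print(f"Accuracy: {accuracy:.2f}")  # 0.80
print(f"Recall:   {recall:.2f}")    # 0.00
```

The model is right 80% of the time, yet it identifies none of the customers a retention campaign would need to reach.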

Theory Reference: Model Evaluation Metrics
