01. CS Ticket Data Exploration
1. Overview and Scenario
Situation: You are a junior data analyst at an e-commerce company. The CS (Customer Service) team manager comes to you and says:
“It seems like customer inquiries have increased significantly lately. Can you help me understand what types of inquiries are coming in, when, and how many?”
Over 5,000 tickets are pouring in daily. Excel has its limits.
We will use BigQuery (SQL) and Python (Pandas) to thoroughly analyze this massive dataset.
2. Environment Setup and Data Loading
Let’s import the libraries and load the data to start our analysis.
BigQuery (SQL)
BigQuery Connection Setup
In practice, service account keys (.json) are used for security.
(Based on Colab or local Jupyter environment.)
import os
from google.cloud import bigquery
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# 1. Set authentication file (modify to your key file path)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"
# 2. Create client object
client = bigquery.Client()
print("✅ BigQuery connection complete!")✅ BigQuery connection complete!
3. Understanding the Data Structure
The first thing to do when you receive data is to see “What does it look like?”
❓ Problem 1: View Data Sample
Q. Query the 5 most recently created tickets from the cs_tickets_dummy table.
BigQuery (SQL)
Hint: Use ORDER BY and LIMIT. The date column is opened_at.
View Solution (Click)
SELECT *
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
ORDER BY opened_at DESC
LIMIT 5;Key Column Descriptions
When you query the data, you’ll see the following columns:
| Column | Description | Example |
|---|---|---|
ticket_id | Unique ticket number | TKT_12345 |
issue_type | Inquiry type (shipping, refund, etc.) | shipping, refund |
status | Processing status | open, solved |
priority | Urgency level | urgent, normal |
opened_at | Inquiry creation time | 2024-01-01 10:00:00 |
4. Data Quality Check
There’s a saying “Garbage In, Garbage Out”. We need to verify the data is clean before analysis.
❓ Problem 2: Check for Missing Values (NULL)
Q. What percentage of tickets have empty (NULL) first_response_at (first response time)?
BigQuery (SQL)
Hint: Use COUNTIF(column IS NULL) or subtract COUNT(column) from COUNT(*).
View Solution (Click)
SELECT
COUNT(*) as total_count,
COUNTIF(first_response_at IS NULL) as null_count,
ROUND(COUNTIF(first_response_at IS NULL) * 100.0 / COUNT(*), 1) as null_pct
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`;Interpretation Note: A NULL first_response_at isn’t necessarily a data error - it may mean “the agent hasn’t responded yet”! Don’t delete it blindly.
5. Basic Status Analysis (EDA)
Now let’s finally understand the current status that the team manager was curious about.
❓ Problem 3: Ticket Distribution by Status
Q. Calculate how many tickets exist for each processing status (status) and what percentage of the total they represent.
BigQuery (SQL)
Hint: Use GROUP BY status, and for the ratio, use window function SUM(COUNT(*)) OVER().
View Solution (Click)
SELECT
status,
COUNT(*) as ticket_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
GROUP BY status
ORDER BY ticket_count DESC;📊 Visualization: Drawing a Pie Chart
Numbers alone don’t give you a good feel. Let’s draw a pie chart. (Common Python code)
# Prepare distribution data
status_counts = tickets['status'].value_counts()
# Draw pie chart
plt.figure(figsize=(8, 8))
plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Ticket Distribution by Status')
plt.show()[Graph Displayed]
6. Advanced Analysis: Understanding Issue Types
❓ Problem 4: What is the Most Common Complaint Type?
Q. Aggregate ticket counts by issue_type and find the #1 most frequently submitted complaint type.
BigQuery (SQL)
View Solution (Click)
SELECT
issue_type,
COUNT(*) as count
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
GROUP BY issue_type
ORDER BY count DESC
LIMIT 1;💡 Summary and Insights
After exploring the data, we discovered the following facts:
- Data Scale: About 5,000 tickets generated over approximately 90 days
- No Response Status: About 30% of tickets haven’t received a first response yet (
NULLanalysis) - Main Issues:
shippingandqualityproblems account for 50% of the total
Next Steps: We’ve understood the current status. So, “How satisfied are customers with our service?” In the next chapter, we’ll calculate CSAT (Customer Satisfaction) and NPS metrics directly.