01. CS Ticket Data Exploration

Beginner2 hours

1. Overview and Scenario

Situation: You are a junior data analyst at an e-commerce company. The CS (Customer Service) team manager comes to you and says:

“It seems like customer inquiries have increased significantly lately. Can you help me understand what types of inquiries are coming in, when, and how many?”

Over 5,000 tickets are pouring in daily. Excel has its limits.

We will use BigQuery (SQL) and Python (Pandas) to thoroughly analyze this massive dataset.

2. Environment Setup and Data Loading

Let’s import the libraries and load the data to start our analysis.

BigQuery (SQL)

BigQuery Connection Setup

In practice, service account keys (.json) are used for security. (Based on Colab or local Jupyter environment.)


import os
from google.cloud import bigquery
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
 
# 1. Set authentication file (modify to your key file path)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"
 
# 2. Create client object
client = bigquery.Client()
 
print("✅ BigQuery connection complete!")

실행 결과

✅ BigQuery connection complete!

3. Understanding the Data Structure

The first thing to do when you receive data is to see “What does it look like?”

❓ Problem 1: View Data Sample

Q. Query the 5 most recently created tickets from the cs_tickets_dummy table.

BigQuery (SQL)

💡

Hint: Use ORDER BY and LIMIT. The date column is opened_at.

View Solution (Click)


SELECT *
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
ORDER BY opened_at DESC
LIMIT 5;

Key Column Descriptions

When you query the data, you’ll see the following columns:

Column	Description	Example
`ticket_id`	Unique ticket number	TKT_12345
`issue_type`	Inquiry type (shipping, refund, etc.)	shipping, refund
`status`	Processing status	open, solved
`priority`	Urgency level	urgent, normal
`opened_at`	Inquiry creation time	2024-01-01 10:00:00

4. Data Quality Check

There’s a saying “Garbage In, Garbage Out”. We need to verify the data is clean before analysis.

❓ Problem 2: Check for Missing Values (NULL)

Q. What percentage of tickets have empty (NULL) first_response_at (first response time)?

BigQuery (SQL)

💡

Hint: Use COUNTIF(column IS NULL) or subtract COUNT(column) from COUNT(*).

View Solution (Click)


SELECT
    COUNT(*) as total_count,
    COUNTIF(first_response_at IS NULL) as null_count,
    ROUND(COUNTIF(first_response_at IS NULL) * 100.0 / COUNT(*), 1) as null_pct
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`;

⚠️

Interpretation Note: A NULL first_response_at isn’t necessarily a data error - it may mean “the agent hasn’t responded yet”! Don’t delete it blindly.

5. Basic Status Analysis (EDA)

Now let’s finally understand the current status that the team manager was curious about.

❓ Problem 3: Ticket Distribution by Status

Q. Calculate how many tickets exist for each processing status (status) and what percentage of the total they represent.

BigQuery (SQL)

💡

Hint: Use GROUP BY status, and for the ratio, use window function SUM(COUNT(*)) OVER().

View Solution (Click)


SELECT
    status,
    COUNT(*) as ticket_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
GROUP BY status
ORDER BY ticket_count DESC;

📊 Visualization: Drawing a Pie Chart

Numbers alone don’t give you a good feel. Let’s draw a pie chart. (Common Python code)


# Prepare distribution data
status_counts = tickets['status'].value_counts()
 
# Draw pie chart
plt.figure(figsize=(8, 8))
plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Ticket Distribution by Status')
plt.show()

실행 결과

[Graph Displayed]

6. Advanced Analysis: Understanding Issue Types

❓ Problem 4: What is the Most Common Complaint Type?

Q. Aggregate ticket counts by issue_type and find the #1 most frequently submitted complaint type.

BigQuery (SQL)

View Solution (Click)


SELECT
    issue_type,
    COUNT(*) as count
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
GROUP BY issue_type
ORDER BY count DESC
LIMIT 1;

💡 Summary and Insights

After exploring the data, we discovered the following facts:

Data Scale: About 5,000 tickets generated over approximately 90 days
No Response Status: About 30% of tickets haven’t received a first response yet (NULL analysis)
Main Issues: shipping and quality problems account for 50% of the total

Next Steps: We’ve understood the current status. So, “How satisfied are customers with our service?” In the next chapter, we’ll calculate CSAT (Customer Satisfaction) and NPS metrics directly.

01. CS Ticket Data Exploration

1. Overview and Scenario

2. Environment Setup and Data Loading

BigQuery (SQL)

Pandas (CSV)

3. Understanding the Data Structure

❓ Problem 1: View Data Sample

BigQuery (SQL)

View Solution (Click)

Pandas

View Solution (Click)

Key Column Descriptions

4. Data Quality Check

❓ Problem 2: Check for Missing Values (NULL)

BigQuery (SQL)

View Solution (Click)

Pandas

View Solution (Click)

5. Basic Status Analysis (EDA)

❓ Problem 3: Ticket Distribution by Status

BigQuery (SQL)

View Solution (Click)

Pandas

View Solution (Click)

📊 Visualization: Drawing a Pie Chart

6. Advanced Analysis: Understanding Issue Types

❓ Problem 4: What is the Most Common Complaint Type?

BigQuery (SQL)

View Solution (Click)

Pandas

View Solution (Click)

💡 Summary and Insights