Skip to Content

01. CS Ticket Data Exploration

Beginner2 hours

1. Overview and Scenario

Situation: You are a junior data analyst at an e-commerce company. The CS (Customer Service) team manager comes to you and says:

“It seems like customer inquiries have increased significantly lately. Can you help me understand what types of inquiries are coming in, when, and how many?”

Over 5,000 tickets are pouring in daily. Excel has its limits.

We will use BigQuery (SQL) and Python (Pandas) to thoroughly analyze this massive dataset.


2. Environment Setup and Data Loading

Let’s import the libraries and load the data to start our analysis.

BigQuery Connection Setup

In practice, service account keys (.json) are used for security. (Based on Colab or local Jupyter environment.)

import os from google.cloud import bigquery import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # 1. Set authentication file (modify to your key file path) os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json" # 2. Create client object client = bigquery.Client() print("✅ BigQuery connection complete!")
실행 결과
✅ BigQuery connection complete!

3. Understanding the Data Structure

The first thing to do when you receive data is to see “What does it look like?”

❓ Problem 1: View Data Sample

Q. Query the 5 most recently created tickets from the cs_tickets_dummy table.

💡

Hint: Use ORDER BY and LIMIT. The date column is opened_at.

View Solution (Click)

SELECT * FROM `your-project-id.retail_analytics_us.cs_tickets_dummy` ORDER BY opened_at DESC LIMIT 5;

Key Column Descriptions

When you query the data, you’ll see the following columns:

ColumnDescriptionExample
ticket_idUnique ticket numberTKT_12345
issue_typeInquiry type (shipping, refund, etc.)shipping, refund
statusProcessing statusopen, solved
priorityUrgency levelurgent, normal
opened_atInquiry creation time2024-01-01 10:00:00

4. Data Quality Check

There’s a saying “Garbage In, Garbage Out”. We need to verify the data is clean before analysis.

❓ Problem 2: Check for Missing Values (NULL)

Q. What percentage of tickets have empty (NULL) first_response_at (first response time)?

💡

Hint: Use COUNTIF(column IS NULL) or subtract COUNT(column) from COUNT(*).

View Solution (Click)

SELECT COUNT(*) as total_count, COUNTIF(first_response_at IS NULL) as null_count, ROUND(COUNTIF(first_response_at IS NULL) * 100.0 / COUNT(*), 1) as null_pct FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`;
⚠️

Interpretation Note: A NULL first_response_at isn’t necessarily a data error - it may mean “the agent hasn’t responded yet”! Don’t delete it blindly.


5. Basic Status Analysis (EDA)

Now let’s finally understand the current status that the team manager was curious about.

❓ Problem 3: Ticket Distribution by Status

Q. Calculate how many tickets exist for each processing status (status) and what percentage of the total they represent.

💡

Hint: Use GROUP BY status, and for the ratio, use window function SUM(COUNT(*)) OVER().

View Solution (Click)

SELECT status, COUNT(*) as ticket_count, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage FROM `your-project-id.retail_analytics_us.cs_tickets_dummy` GROUP BY status ORDER BY ticket_count DESC;

📊 Visualization: Drawing a Pie Chart

Numbers alone don’t give you a good feel. Let’s draw a pie chart. (Common Python code)

# Prepare distribution data status_counts = tickets['status'].value_counts() # Draw pie chart plt.figure(figsize=(8, 8)) plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=90) plt.title('Ticket Distribution by Status') plt.show()
실행 결과
[Graph Displayed]

6. Advanced Analysis: Understanding Issue Types

❓ Problem 4: What is the Most Common Complaint Type?

Q. Aggregate ticket counts by issue_type and find the #1 most frequently submitted complaint type.

View Solution (Click)

SELECT issue_type, COUNT(*) as count FROM `your-project-id.retail_analytics_us.cs_tickets_dummy` GROUP BY issue_type ORDER BY count DESC LIMIT 1;

💡 Summary and Insights

After exploring the data, we discovered the following facts:

  1. Data Scale: About 5,000 tickets generated over approximately 90 days
  2. No Response Status: About 30% of tickets haven’t received a first response yet (NULL analysis)
  3. Main Issues: shipping and quality problems account for 50% of the total

Next Steps: We’ve understood the current status. So, “How satisfied are customers with our service?” In the next chapter, we’ll calculate CSAT (Customer Satisfaction) and NPS metrics directly.

Last updated on

🤖AI 모의면접실전처럼 연습하기