02. AI-Based Topic Classification (LLM & Gemini)
1. Overview and Scenario
Situation: Thousands of tickets are classified as “Others”. There are too many for agents to read and classify individually.
“Can’t AI read and classify these for us?”
In the past, building a classifier meant collecting tens of thousands of labeled training examples; now you can simply tell an LLM (Large Language Model) to “read this and classify it”. Let’s call Google Gemini directly from within BigQuery.
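Before bringing in AI, it is worth confirming the scale of the problem. Here is a minimal sanity check, assuming the cs_tickets_dummy table (and its issue_type column) used in the exercises below:

-- How many tickets currently sit in each category?
SELECT
  issue_type,
  COUNT(*) AS ticket_count
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
GROUP BY issue_type
ORDER BY ticket_count DESC;

If “Others” dominates this list, automated classification is worth the effort.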
2. BigQuery ML + Gemini Integration
In BigQuery, you can call the Gemini model (gemini-pro) through a remote model using the ML.GENERATE_TEXT function.
(Note: Separate Google Cloud Vertex AI connection setup may be required.)
❓ Problem 1: Create Gemini Model (DDL)
Q. Write a query in BigQuery to create a remote model (REMOTE MODEL) based on gemini-pro.
(In the practice environment, we’ll assume it’s already created and proceed.)
BigQuery (SQL)
Hint: Use a CREATE MODEL statement with REMOTE WITH CONNECTION and set the endpoint option.
View Solution
-- Model creation example (requires admin permissions)
CREATE OR REPLACE MODEL `your-project-id.retail_analytics_us.gemini_pro_model`
REMOTE WITH CONNECTION `projects/your-project-id/locations/us/connections/vertex-ai`
OPTIONS(endpoint = 'gemini-pro');
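Once the model exists, a quick smoke test confirms that the Vertex AI connection and permissions work. A minimal sketch, assuming the model created above:

-- Smoke test: send one trivial prompt through the remote model
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
    (SELECT 'Reply with the single word OK.' AS prompt),
    STRUCT(0.0 AS temperature, 5 AS max_output_tokens)
  );

If this returns a row rather than a permission error, the setup is complete.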
3. Prompt Engineering
To get good results from AI, you need to ask good questions (prompts).
❓ Problem 2: Write Ticket Classification Prompt
Q. Write a prompt to classify tickets into one of the following categories and execute with ML.GENERATE_TEXT.
- Categories: Product Quality, Shipping, Payment, General Inquiry
BigQuery (SQL)
Hint: In BigQuery, assemble the prompt string using CONCAT.
View Solution
SELECT
ticket_id,
issue_type AS original_label,
JSON_VALUE(ml_generate_text_result, '$.candidates[0].content.parts[0].text') AS ai_classification -- extract the generated text as a STRING
FROM
ML.GENERATE_TEXT(
MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
(
SELECT
ticket_id,
issue_type,
CONCAT(
'You are a customer service manager. Classify the following ticket into one of these categories: ',
'[Product Quality, Shipping, Payment, General Inquiry]. ',
'Do not explain, just return the category name. ',
'Ticket comments: ', comment
) AS prompt
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
WHERE status = 'open'
LIMIT 5
),
STRUCT(
0.0 AS temperature, -- Set to 0 for consistent answers
10 AS max_output_tokens
)
);
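Indexing into the raw JSON works, but ML.GENERATE_TEXT also accepts a flatten_json_output option that returns the generated text as a plain STRING column named ml_generate_text_llm_result. A sketch of the same classification query with flattened output (same assumed model and table):

SELECT
  ticket_id,
  issue_type AS original_label,
  ml_generate_text_llm_result AS ai_classification  -- plain STRING, no JSON digging needed
FROM
  ML.GENERATE_TEXT(
    MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
    (
      SELECT
        ticket_id,
        issue_type,
        CONCAT(
          'You are a customer service manager. Classify the following ticket into one of these categories: ',
          '[Product Quality, Shipping, Payment, General Inquiry]. ',
          'Do not explain, just return the category name. ',
          'Ticket comments: ', comment
        ) AS prompt
      FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
      WHERE status = 'open'
      LIMIT 5
    ),
    STRUCT(0.0 AS temperature, 10 AS max_output_tokens, TRUE AS flatten_json_output)
  );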
4. Result Verification and Summary Generation
Beyond simple classification, you can also generate “3-line summaries” or “draft responses”.
❓ Problem 3: Ticket Summary and Response Recommendation
Q. Execute a prompt that says “Summarize the ticket content in 3 lines and write a draft response for the agent to send”.
BigQuery (SQL)
View Solution
SELECT
ticket_id,
JSON_VALUE(ml_generate_text_result, '$.candidates[0].content.parts[0].text') AS ai_response -- extract the generated text as a STRING
FROM
ML.GENERATE_TEXT(
MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
(
SELECT
ticket_id,
CONCAT(
'Summarize this ticket in 3 lines and write a polite response draft the agent can send. ',
'Ticket: ', comment
) AS prompt
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
LIMIT 3
),
STRUCT(0.5 AS temperature, 200 AS max_output_tokens)
);
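For the verification half of this section, the quickest check is to cross-tabulate the AI label against the original label on a small sample. A sketch building on the Problem 2 query (same assumed model and table; flatten_json_output is used to get a plain-text label):

WITH labeled AS (
  SELECT
    issue_type AS original_label,
    ml_generate_text_llm_result AS ai_classification
  FROM
    ML.GENERATE_TEXT(
      MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
      (
        SELECT
          issue_type,
          CONCAT(
            'Classify the following ticket into one of these categories: ',
            '[Product Quality, Shipping, Payment, General Inquiry]. ',
            'Do not explain, just return the category name. ',
            'Ticket comments: ', comment
          ) AS prompt
        FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
        WHERE status = 'open'
        LIMIT 50
      ),
      STRUCT(0.0 AS temperature, 10 AS max_output_tokens, TRUE AS flatten_json_output)
    )
)
SELECT
  original_label,
  ai_classification,
  COUNT(*) AS tickets
FROM labeled
GROUP BY original_label, ai_classification
ORDER BY tickets DESC;

Large off-diagonal buckets (for example, tickets filed as “Others” that the model calls Shipping) are exactly the ones worth a human spot check.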
💡 Summary
- BigQuery ML: Apply the latest LLMs to large-scale data using SQL alone.
- Prompt Engineering: Giving the AI a role, constraints, and few-shot examples improves performance.
- Applications: Auto-classification, summarization, sentiment analysis, translation, and countless more (see the sentiment sketch below).
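As a taste of the sentiment-analysis application, the same ML.GENERATE_TEXT pattern needs nothing but a different prompt. A minimal sketch, assuming the same model and dummy table used above:

SELECT
  ticket_id,
  ml_generate_text_llm_result AS sentiment
FROM
  ML.GENERATE_TEXT(
    MODEL `your-project-id.retail_analytics_us.gemini_pro_model`,
    (
      SELECT
        ticket_id,
        CONCAT(
          'Classify the sentiment of this customer ticket as Positive, Neutral, or Negative. ',
          'Return only one word. Ticket: ', comment
        ) AS prompt
      FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
      LIMIT 5
    ),
    -- flatten_json_output = TRUE returns the generated text as a plain STRING column
    STRUCT(0.0 AS temperature, 5 AS max_output_tokens, TRUE AS flatten_json_output)
  );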
In the next chapter, we’ll use traditional Machine Learning (Classification) to predict satisfaction faster and cheaper than LLMs.