01. Text Sentiment Analysis

Advanced · 2 hours

1. Overview and Scenario

Situation: Thousands of reviews pour in every day. If the rating is 5 stars but the review says “I’m angry because shipping was so late, but the product is good”, should we count it as positive? Let’s quantify the customer sentiment hidden in the review text so we can analyze it.

2. Data Preparation

We’ll analyze review text (review_body) from the raw_reviews_relabeled table.

BigQuery (SQL)


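from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Load the review table into a DataFrame; later sections use it as `reviews`
reviews = client.query(
    "SELECT * FROM `your-project-id.retail_analytics_us.raw_reviews_relabeled`"
).to_dataframe()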

3. Keyword-Based Sentiment Analysis

The simplest method is to check for “good words” and “bad words”.

❓ Problem 1: Find Positive/Negative Keywords

Q. If the review text contains bad, poor, terrible, late, etc., classify as ‘Negative’. If it contains good, great, excellent, etc., classify as ‘Positive’. Otherwise, classify as ‘Neutral’.

BigQuery (SQL)

💡 Hint: REGEXP_CONTAINS lets you test for multiple patterns at once.

View Solution


SELECT
    review_id,
    review_body,
    CASE
        -- \b word boundaries keep 'late' from matching words like 'chocolate'
        WHEN REGEXP_CONTAINS(LOWER(review_body), r'\b(bad|poor|terrible|late|slow|worst)\b') THEN 'Negative'
        WHEN REGEXP_CONTAINS(LOWER(review_body), r'\b(good|great|excellent|love|best)\b') THEN 'Positive'
        ELSE 'Neutral'
    END AS sentiment_category
FROM `your-project-id.retail_analytics_us.raw_reviews_relabeled`
LIMIT 10;
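
For readers working in Python rather than SQL, the same rule can be sketched with the standard `re` module. This is a minimal sketch, not the course's official solution: the function name `classify_keywords` and the sample sentences are made up for illustration, and the `\b` word boundaries are there so that “late” does not match inside “chocolate”.

```python
import re

# Keyword lists taken from the SQL solution; \b restricts matches to whole words.
NEGATIVE = re.compile(r"\b(bad|poor|terrible|late|slow|worst)\b")
POSITIVE = re.compile(r"\b(good|great|excellent|love|best)\b")


def classify_keywords(text):
    """Return 'Negative', 'Positive', or 'Neutral' for one review text."""
    lowered = str(text).lower()
    # Negative is checked first, mirroring the order of the CASE branches.
    if NEGATIVE.search(lowered):
        return "Negative"
    if POSITIVE.search(lowered):
        return "Positive"
    return "Neutral"


# Hypothetical sample reviews, for illustration only.
samples = [
    "Great product, love it",
    "Shipping was so late, but the product is good",
    "It arrived on Tuesday",
]
for sample in samples:
    print(classify_keywords(sample), "|", sample)
```

With the reviews loaded into a DataFrame, the function could be applied as `reviews['review_body'].apply(classify_keywords)`. Because the negative pattern is tested first, a mixed review such as the second sample is labeled negative.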

4. Using NLP Libraries (TextBlob)

Simple keyword matching may incorrectly classify “Not good” as “Positive”. NLP libraries such as TextBlob take some context into account (negation, for example) and return a polarity score between $-1.0$ (most negative) and $+1.0$ (most positive).
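
As a rough illustration of what handling context can mean, the toy scorer below flips the sign of a sentiment word when a negator precedes it. This is not TextBlob's algorithm; the word lists, the `NEGATORS` set, and the function name `toy_polarity` are assumptions made up for this sketch.

```python
import re

# Toy lexicons, assumed for illustration (not TextBlob's lexicon).
POSITIVE_WORDS = {"good", "great", "excellent", "love", "best"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "late", "slow", "worst"}
NEGATORS = {"not", "never", "no"}


def toy_polarity(text):
    """Return a score in [-1.0, 1.0]; a negator flips the next sentiment word."""
    tokens = re.findall(r"[a-z']+", str(text).lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True  # remember the negation until a sentiment word appears
            continue
        if token in POSITIVE_WORDS:
            value = 1.0
        elif token in NEGATIVE_WORDS:
            value = -1.0
        else:
            continue  # words outside the lexicons do not affect the score
        scores.append(-value if negate else value)
        negate = False
    # Average the per-word scores; texts with no sentiment words score 0.0.
    return sum(scores) / len(scores) if scores else 0.0


for sample in ["good", "Not good", "Not bad", "good but late"]:
    print(sample, "->", toy_polarity(sample))
```

Even this small rule scores “Not good” below zero, which plain keyword matching cannot do. Real libraries use much larger lexicons and finer-grained weights.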

❓ Problem 2: Calculate Sentiment Score (Polarity Score)

Q. Use Python’s TextBlob (or a BigQuery remote function) to calculate a sentiment score for each review.

Pandas (TextBlob)

Pandas: Use TextBlob(text).sentiment.polarity.

View Solution


from textblob import TextBlob
 
# Sentiment score calculation function
def get_sentiment(text):
    return TextBlob(str(text)).sentiment.polarity
 
reviews['sentiment_score'] = reviews['review_body'].apply(get_sentiment)
 
# Check results
print(reviews[['review_body', 'sentiment_score']].head())
 
# Average score
print(f"Average sentiment score: {reviews['sentiment_score'].mean():.4f}")
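
A polarity score is often easier to report once it is bucketed into labels. The helper below is a minimal sketch of one way to do that. The name `label_from_polarity` and the `neutral_band` parameter are assumptions of this sketch, not part of TextBlob. The default of 0.0 treats any score below zero as negative.

```python
def label_from_polarity(score, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is an assumed, tunable cutoff: scores whose absolute
    value is at most neutral_band are labeled 'Neutral'.
    """
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


# Hypothetical scores, for illustration only.
for score in [0.5, -0.2, 0.0, 0.03]:
    print(score, "->", label_from_polarity(score))
```

On the DataFrame this could be applied as `reviews['sentiment_score'].apply(label_from_polarity)`. Widening the band, for example `neutral_band=0.05`, keeps weakly scored reviews out of the positive and negative buckets.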

5. Sentiment Analysis and Visualization by Category

Let’s find out which product category has the most customer complaints.

❓ Problem 3: Find the Worst Categories

Q. Using the calculated sentiment scores, compute the average sentiment score and the negative review ratio (the share of reviews with score < 0) for each category, then visualize the results.

Pandas (Visualization)

View Solution


import matplotlib.pyplot as plt
 
# Aggregate per category: average score, review count, and share of negative reviews
cat_summary = reviews.groupby('relabeled_category').agg(
    sentiment_score=('sentiment_score', 'mean'),
    review_id=('review_id', 'count'),
    # A review counts as negative when its score is below 0
    neg_ratio=('sentiment_score', lambda s: (s < 0).mean() * 100),
).reset_index()
 
# Visualization (Scatter Plot)
plt.figure(figsize=(10, 6))
plt.scatter(cat_summary['neg_ratio'], cat_summary['sentiment_score'],
            s=cat_summary['review_id']/10, alpha=0.5)
 
for i, txt in enumerate(cat_summary['relabeled_category']):
    plt.annotate(txt, (cat_summary['neg_ratio'][i], cat_summary['sentiment_score'][i]))
 
plt.xlabel('Negative Review Ratio (%)')
plt.ylabel('Average Sentiment Score')
plt.title('Sentiment Analysis by Category')
plt.axvline(x=30, color='r', linestyle='--') # Danger if over 30%
plt.show()
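
To sanity-check the two metrics on a sample small enough to verify by hand, the same aggregation can be sketched in plain Python. The category names and scores below are made up for illustration, and `summarize` is a hypothetical helper rather than part of the solution above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (category, sentiment_score) pairs, made up for illustration.
rows = [
    ("Electronics", 0.6),
    ("Electronics", -0.4),
    ("Electronics", 0.1),
    ("Toys", -0.5),
    ("Toys", -0.1),
]


def summarize(rows):
    """Return {category: (average score, negative ratio in %, review count)}."""
    by_category = defaultdict(list)
    for category, score in rows:
        by_category[category].append(score)
    summary = {}
    for category, scores in by_category.items():
        # A review counts as negative when its score is below 0.
        negatives = sum(1 for s in scores if s < 0)
        summary[category] = (mean(scores), negatives / len(scores) * 100, len(scores))
    return summary


for category, (avg, neg_ratio, count) in summarize(rows).items():
    print(f"{category}: avg={avg:.2f}, negative={neg_ratio:.0f}%, n={count}")
```

In this sample, one of the three Electronics reviews is negative and both Toys reviews are, so Toys would land to the right of the 30% line in the scatter plot.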

💡 Summary

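Keyword Matching: Fast and free, but less accurate. It cannot handle negation, so “Not bad” is classified as negative.
Rule-based NLP (TextBlob): Accounts for negation and intensifiers such as “not” and “very”, so it is somewhat more accurate than keyword matching.
LLM (Next Chapter): Typically the most accurate of the three, and can often pick up on “sarcasm”.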

In the next chapter, we’ll use Gemini to understand and classify text like a real human.
