
01. Text Sentiment Analysis

Advanced · 2 hours

1. Overview and Scenario

Situation: Thousands of reviews pour in every day. If the rating is 5 stars but the review says "I'm angry because shipping was so late, but the product is good", should we consider this positive? Let's quantify the customer sentiment hidden in text for analysis.


2. Data Preparation

We'll analyze review text (review_body) from the raw_reviews_relabeled table.

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
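The later solutions operate on a DataFrame named reviews, which the setup snippet does not create. Here is a minimal sketch of one way to load it. It assumes the table exposes the three columns the later solutions reference (review_id, review_body, relabeled_category). The helper names build_reviews_query and load_reviews and the limit parameter are hypothetical, introduced only for illustration.

```python
def build_reviews_query(table: str, limit: int = 10000) -> str:
    """Return SQL that selects the columns used in this chapter.

    Assumption: the table has review_id, review_body and relabeled_category.
    """
    return (
        "SELECT review_id, review_body, relabeled_category "
        f"FROM `{table}` "
        f"LIMIT {int(limit)}"
    )


def load_reviews(client, table: str, limit: int = 10000):
    """Run the query and return the result as a pandas DataFrame.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    return client.query(build_reviews_query(table, limit)).to_dataframe()


# Example usage (requires BigQuery credentials):
# reviews = load_reviews(
#     client, "your-project-id.retail_analytics_us.raw_reviews_relabeled"
# )
```

The LIMIT keeps the download small while experimenting; raise or remove it once the pipeline works.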

3. Keyword-Based Sentiment Analysis

The simplest method is to check for "good words" and "bad words".

❓ Problem 1: Find Positive/Negative Keywords

Q. If the review text contains bad, poor, terrible, late, etc., classify it as 'Negative'. If it contains good, great, excellent, etc., classify it as 'Positive'. Otherwise, classify it as 'Neutral'.

💡 Hint: REGEXP_CONTAINS lets you test for several alternative patterns in a single call.

View Solution

SELECT
  review_id,
  review_body,
  CASE
    WHEN REGEXP_CONTAINS(LOWER(review_body), r'bad|poor|terrible|late|slow|worst') THEN 'Negative'
    WHEN REGEXP_CONTAINS(LOWER(review_body), r'good|great|excellent|love|best') THEN 'Positive'
    ELSE 'Neutral'
  END AS sentiment_category
FROM `your-project-id.retail_analytics_us.raw_reviews_relabeled`
LIMIT 10;
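For readers working in Python rather than SQL, the same rule can be sketched with the standard re module. This is an illustrative equivalent of the query above, not part of the original solution; classify_keywords is a hypothetical helper name. It makes two properties of the rule easy to check: the negative list wins when both kinds of words appear, and the patterns match substrings, so "chocolate" triggers "late".

```python
import re

# Same keyword lists as the SQL solution. Negative is checked first,
# so a review containing both kinds of words is labelled 'Negative'.
NEGATIVE = re.compile(r"bad|poor|terrible|late|slow|worst")
POSITIVE = re.compile(r"good|great|excellent|love|best")


def classify_keywords(text) -> str:
    """Label a review as 'Negative', 'Positive' or 'Neutral' by keyword."""
    lowered = str(text).lower()
    if NEGATIVE.search(lowered):
        return "Negative"
    if POSITIVE.search(lowered):
        return "Positive"
    return "Neutral"


print(classify_keywords("Shipping was late, but the product is good"))  # Negative
print(classify_keywords("I love chocolate"))   # Negative ("chocolate" contains "late")
print(classify_keywords("Not good"))           # Positive (negation is ignored)
print(classify_keywords("Arrived on time"))    # Neutral
```

Wrapping each list in word boundaries, for example r"\b(bad|poor|terrible|late|slow|worst)\b", would avoid the substring matches, at the cost of missing inflected forms such as "slowly".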

4. Using NLP Libraries (TextBlob)

Simple keyword matching can misclassify "Not good" as "Positive", because it matches "good" and ignores the negation. Dedicated NLP libraries take some context into account and return a polarity score ranging from -1.0 (most negative) to +1.0 (most positive).
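To illustrate what taking context into account can mean, here is a toy lexicon scorer that flips a word's score when it directly follows a negator. It is a deliberately simplified sketch and not TextBlob's actual algorithm; the lexicon, the weights, and the name toy_polarity are all made up for illustration. Real libraries rely on much larger lexicons and more nuanced rules.

```python
import re

# Tiny hand-made lexicon; the words and weights are illustrative only.
LEXICON = {
    "good": 0.7, "great": 0.8, "excellent": 1.0,
    "bad": -0.7, "poor": -0.4, "terrible": -1.0,
}
NEGATORS = {"not", "never", "no"}


def toy_polarity(text: str) -> float:
    """Average the scores of known words, flipping the sign after a negator.

    The result stays within [-1.0, 1.0] because every weight does.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            score = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score  # "not good" counts as negative
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("Good"))       # 0.7
print(toy_polarity("Not good"))   # -0.7
print(toy_polarity("Not bad"))    # 0.7
```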

❓ Problem 2: Calculate Sentiment Score (Polarity Score)

Q. Use Python's TextBlob (or a BigQuery remote function) to calculate the sentiment score of each review.

Pandas: Use TextBlob(text).sentiment.polarity.

View Solution

from textblob import TextBlob

# Sentiment score calculation function
def get_sentiment(text):
    return TextBlob(str(text)).sentiment.polarity

reviews['sentiment_score'] = reviews['review_body'].apply(get_sentiment)

# Check results
print(reviews[['review_body', 'sentiment_score']].head())

# Average score
print(f"Average sentiment score: {reviews['sentiment_score'].mean():.4f}")
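A continuous score is often easier to compare with the keyword labels once it is bucketed into the same three categories. Here is a minimal sketch, assuming a symmetric neutral band around zero. The helper name label_from_polarity and the default threshold of 0.1 are hypothetical choices, not values from the original material.

```python
def label_from_polarity(score: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to a three-way label.

    Scores within +/- threshold of zero are treated as 'Neutral'.
    The 0.1 default is an arbitrary illustrative choice.
    """
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


print(label_from_polarity(0.6))    # Positive
print(label_from_polarity(-0.35))  # Negative
print(label_from_polarity(0.05))   # Neutral

# Example usage with the DataFrame from the solution above:
# reviews['sentiment_label'] = reviews['sentiment_score'].apply(label_from_polarity)
```

Setting threshold=0 reproduces the stricter definition used in Problem 3, where any score below zero counts as negative.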

5. Sentiment Analysis and Visualization by Category

Let's find out which product category has the most customer complaints.

❓ Problem 3: Find the Worst Categories

Q. Using the calculated sentiment scores, compute the average sentiment score and the negative review ratio (share of reviews with score < 0) for each category, then visualize the result.

View Solution

import matplotlib.pyplot as plt

# Aggregation
cat_summary = reviews.groupby('relabeled_category').agg({
    'sentiment_score': 'mean',
    'review_id': 'count'
}).reset_index()

# Calculate negative review ratio
neg_reviews = reviews[reviews['sentiment_score'] < 0].groupby('relabeled_category').size()
total_reviews = reviews.groupby('relabeled_category').size()
cat_summary['neg_ratio'] = (neg_reviews / total_reviews * 100).fillna(0).values

# Visualization (Scatter Plot)
plt.figure(figsize=(10, 6))
plt.scatter(cat_summary['neg_ratio'], cat_summary['sentiment_score'],
            s=cat_summary['review_id'] / 10, alpha=0.5)

for i, txt in enumerate(cat_summary['relabeled_category']):
    plt.annotate(txt, (cat_summary['neg_ratio'][i], cat_summary['sentiment_score'][i]))

plt.xlabel('Negative Review Ratio (%)')
plt.ylabel('Average Sentiment Score')
plt.title('Sentiment Analysis by Category')
plt.axvline(x=30, color='r', linestyle='--')  # Danger if over 30%
plt.show()
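To make the arithmetic behind the two metrics explicit, the aggregation can also be sketched in plain Python and sorted so the worst categories come first. This is an illustrative companion to the pandas solution, not a replacement for it. The function name summarize_by_category, the output keys, and the sample categories and scores are all made up.

```python
from collections import defaultdict


def summarize_by_category(rows):
    """Summarize (category, sentiment_score) pairs per category.

    Returns a list of dicts sorted worst first: highest negative ratio,
    then lowest average score.
    """
    scores = defaultdict(list)
    for category, score in rows:
        scores[category].append(score)

    summary = []
    for category, values in scores.items():
        negatives = sum(1 for v in values if v < 0)
        summary.append({
            "category": category,
            "review_count": len(values),
            "avg_score": sum(values) / len(values),
            "neg_ratio": negatives / len(values) * 100,  # percent with score < 0
        })
    return sorted(summary, key=lambda r: (-r["neg_ratio"], r["avg_score"]))


# Made-up sample data, for illustration only.
sample = [("Toys", 0.5), ("Toys", -0.5), ("Books", 0.5), ("Books", 0.25)]
for row in summarize_by_category(sample):
    print(row)
```

In this sample, Toys ranks first because one of its two reviews is negative (50%), while Books has none.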

💡 Summary

  • Keyword Matching: Fast and free but less accurate. It ignores negation, so it can't tell "good" from "not good" or "bad" from "not bad".
  • Rule-based NLP (TextBlob): Takes some grammar into account, such as negation, for somewhat better accuracy.
  • LLM (Next Chapter): Typically the most accurate of the three, and can often recognize sarcasm.

In the next chapter, we'll use Gemini to understand and classify text like a real human.
