Ranking and Movement Analysis with Window Functions
Learning Objectives
After completing this recipe, you will be able to:
- Understand the concept of Window Functions and how they differ from GROUP BY
- Create rankings with ROW_NUMBER, RANK, DENSE_RANK
- Reference previous/next rows with LAG/LEAD for change analysis
- Calculate cumulative totals with SUM OVER
- Trend analysis with Moving Average
2.1 BigQuery Execution Environment Setup
Set up the environment for executing BigQuery queries.
from google.cloud import bigquery
import pandas as pd
# Authentication setup (when using service account key)
# import os
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
# Create BigQuery client
client = bigquery.Client(project='your-project-id')
# Query execution function
def run_query(query):
return client.query(query).to_dataframe()1. What are Window Functions?
Theory
Window Functions perform calculations on a set of rows but don’t reduce rows like GROUP BY. For each row, they return calculation results based on a related set of rows (window).
GROUP BY vs Window Functions
| Characteristic | GROUP BY | Window Functions |
|---|---|---|
| Result row count | Reduced to group count | Original row count maintained |
| Individual row info | Lost | Preserved |
| Use cases | Category totals | Ranking, cumulative sum, moving average |
Basic Syntax
function_name() OVER (
[PARTITION BY group_column] -- Specify grouping unit (optional)
[ORDER BY sort_column] -- Sort criteria (required for some functions)
[ROWS/RANGE frame_specification] -- Window range (optional)
)PARTITION BY: Grouping only within window function, row count maintainedGROUP BY: Grouping for entire query, row count reduced
2. Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
Theory
| Function | Description | Tie Handling |
|---|---|---|
ROW_NUMBER() | Unique sequence number | Different numbers even for ties |
RANK() | Rank (skips for ties) | 1, 2, 2, 4, 5 |
DENSE_RANK() | Rank (consecutive for ties) | 1, 2, 2, 3, 4 |
Comparison Example
| Revenue | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 100 | 1 | 1 | 1 |
| 90 | 2 | 2 | 2 |
| 90 | 3 | 2 | 2 |
| 80 | 4 | 4 | 3 |
| 70 | 5 | 5 | 4 |
Practice: Top Products by Revenue per Category
WITH product_revenue AS (
SELECT
p.product_id,
p.name,
p.category,
p.brand,
SUM(oi.sale_price) AS total_revenue
FROM `your-project-id.thelook_ecommerce.order_items` oi
JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name, p.category, p.brand
)
SELECT
category,
name,
brand,
total_revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS rank_in_category
FROM product_revenue
QUALIFY rank_in_category <= 3
ORDER BY category, rank_in_categoryExecution Result:
| category | name | brand | total_revenue | rank_in_category |
|---|---|---|---|---|
| Accessories | Ray-Ban Sunglasses | Ray-Ban | 2,520.00 | 1 |
| Accessories | Persol Sunglasses | Persol | 2,506.00 | 2 |
| Accessories | Tom Ford Sunglasses | Tom Ford | 2,371.60 | 3 |
| Active | Jordan Short | Jordan | 6,321.00 | 1 |
| … | … | … | … | … |
BigQuery’s QUALIFY is used to filter window function results. Use it instead of WHERE or HAVING for clean filtering without subqueries.
3. LAG/LEAD - Reference Previous/Next Rows
Theory
| Function | Description | Use |
|---|---|---|
LAG(col, n) | Value from n rows before | Month-over-month, day-over-day |
LEAD(col, n) | Value from n rows after | Days until next purchase |
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)offset: How many rows before/after (default: 1)default: Value to return instead of NULL
Practice: Monthly Revenue Change Analysis
WITH monthly_revenue AS (
SELECT
DATE_TRUNC(CAST(o.created_at AS DATE), MONTH) AS month,
SUM(oi.sale_price) AS revenue
FROM `your-project-id.thelook_ecommerce.orders` o
JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id
GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) /
LAG(revenue, 1) OVER (ORDER BY month) * 100,
2
) AS growth_rate_pct
FROM monthly_revenue
WHERE month >= '2024-01-01'
ORDER BY monthExecution Result:
| month | revenue | prev_month_revenue | revenue_change | growth_rate_pct |
|---|---|---|---|---|
| 2024-01-01 | 156,234.50 | 142,567.25 | 13,667.25 | 9.58 |
| 2024-02-01 | 148,923.75 | 156,234.50 | -7,310.75 | -4.68 |
| 2024-03-01 | 172,456.30 | 148,923.75 | 23,532.55 | 15.80 |
| 2024-04-01 | 165,890.20 | 172,456.30 | -6,566.10 | -3.81 |
4. Cumulative Sum and Moving Average
Theory
You can adjust the calculation range by specifying the window frame.
SUM(col) OVER (
ORDER BY date
ROWS BETWEEN start AND end
)| Frame | Description |
|---|---|
UNBOUNDED PRECEDING | From the first row |
CURRENT ROW | Current row |
n PRECEDING | n rows before |
n FOLLOWING | n rows after |
UNBOUNDED FOLLOWING | To the last row |
Practice: Cumulative Revenue and 7-Day Moving Average
WITH daily_revenue AS (
SELECT
CAST(o.created_at AS DATE) AS date,
SUM(oi.sale_price) AS revenue
FROM `your-project-id.thelook_ecommerce.orders` o
JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id
GROUP BY date
)
SELECT
date,
revenue,
-- Cumulative sum
SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS cumulative_revenue,
-- 7-day moving average
ROUND(
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
),
2
) AS moving_avg_7d
FROM daily_revenue
WHERE date >= '2024-10-01'
ORDER BY dateExecution Result:
| date | revenue | cumulative_revenue | moving_avg_7d |
|---|---|---|---|
| 2024-10-01 | 5,234.50 | 2,345,678.25 | 4,892.50 |
| 2024-10-02 | 4,891.30 | 2,350,569.55 | 4,956.33 |
| 2024-10-03 | 5,123.75 | 2,355,693.30 | 5,012.75 |
| 2024-10-04 | 4,980.20 | 2,360,673.50 | 5,057.44 |
| 2024-10-05 | 5,345.60 | 2,366,019.10 | 5,115.07 |
- 7-day moving average: Smooth daily fluctuations to identify weekly trends
- 30-day moving average: Identify seasonal patterns
- Stocks/Finance: Golden cross (short > long), death cross signals
5. Using PARTITION BY
Practice: Purchase Order and Cumulative Purchase Amount by Customer
SELECT
o.user_id,
o.order_id,
o.created_at,
oi.sale_price,
-- Purchase order by customer
ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at) AS purchase_order,
-- Cumulative purchase amount by customer
SUM(oi.sale_price) OVER (
PARTITION BY o.user_id
ORDER BY o.created_at
ROWS UNBOUNDED PRECEDING
) AS cumulative_purchase,
-- Days since last purchase
DATE_DIFF(
CAST(o.created_at AS DATE),
CAST(LAG(o.created_at) OVER (PARTITION BY o.user_id ORDER BY o.created_at) AS DATE),
DAY
) AS days_since_last_purchase
FROM `your-project-id.thelook_ecommerce.orders` o
JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id
WHERE o.user_id IN (1, 2, 3) -- Sample customers
ORDER BY o.user_id, o.created_atExecution Result:
| user_id | order_id | created_at | sale_price | purchase_order | cumulative_purchase | days_since_last_purchase |
|---|---|---|---|---|---|---|
| 1 | 1001 | 2023-01-15 | 125.50 | 1 | 125.50 | NULL |
| 1 | 1245 | 2023-03-22 | 89.99 | 2 | 215.49 | 66 |
| 2 | 2031 | 2023-02-10 | 234.00 | 1 | 234.00 | NULL |
| 2 | 2156 | 2023-05-18 | 156.75 | 2 | 390.75 | 97 |
| 3 | 3089 | 2023-01-28 | 78.50 | 1 | 78.50 | NULL |
Quiz
Problem 1: Brand Revenue Ranking by Category
Calculate brand revenue ranking within each category and query only the top 3 brands.
View Answer
WITH category_brand_revenue AS (
SELECT
p.category,
p.brand,
SUM(oi.sale_price) AS total_revenue
FROM `your-project-id.thelook_ecommerce.order_items` oi
JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.product_id
GROUP BY p.category, p.brand
)
SELECT
category,
brand,
total_revenue,
RANK() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS brand_rank
FROM category_brand_revenue
QUALIFY brand_rank <= 3
ORDER BY category, brand_rankExplanation:
- Create groups by category with
PARTITION BY category RANK()gives the same rank for ties (handles brands with same revenue)- Filter with
QUALIFYwithout subquery
Problem 2: Customer Repurchase Interval Analysis
Calculate each customer’s average repurchase interval (days). (Only customers with 2+ purchases)
View Answer
WITH purchase_gaps AS (
SELECT
user_id,
created_at,
LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_purchase,
DATE_DIFF(
CAST(created_at AS DATE),
CAST(LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS DATE),
DAY
) AS days_gap
FROM `your-project-id.thelook_ecommerce.orders`
)
SELECT
user_id,
COUNT(*) + 1 AS total_purchases,
ROUND(AVG(days_gap), 1) AS avg_repurchase_days
FROM purchase_gaps
WHERE days_gap IS NOT NULL
GROUP BY user_id
HAVING COUNT(*) >= 1 -- 2+ purchases (at least 1 gap)
ORDER BY avg_repurchase_days
LIMIT 20Explanation:
- Reference previous purchase date with
LAG - Calculate interval with
DATE_DIFF - First purchase has NULL
days_gap, so excluded - Identify loyal customers with short average repurchase intervals
Summary
| Function | Use | Example |
|---|---|---|
ROW_NUMBER() | Unique sequence | Most recent 1 order per customer |
RANK() | Rank (skip ties) | Revenue ranking |
DENSE_RANK() | Rank (consecutive ties) | Tier classification |
LAG() | Previous row value | Month-over-month change |
LEAD() | Next row value | Days until next purchase |
SUM() OVER | Cumulative/moving sum | Cumulative revenue, moving average |
AVG() OVER | Moving average | 7-day/30-day average |
Next Steps
You’ve mastered window functions! Next, learn complex query structuring and advanced analysis techniques in CTE and Cohort Analysis.