Skip to Content
ConceptsSQLWindow Functions

Ranking and Movement Analysis with Window Functions

IntermediateAdvanced

Learning Objectives

After completing this recipe, you will be able to:

  • Understand the concept of Window Functions and how they differ from GROUP BY
  • Create rankings with ROW_NUMBER, RANK, DENSE_RANK
  • Reference previous/next rows with LAG/LEAD for change analysis
  • Calculate cumulative totals with SUM OVER
  • Trend analysis with Moving Average

2.1 BigQuery Execution Environment Setup

Set up the environment for executing BigQuery queries.

from google.cloud import bigquery import pandas as pd # Authentication setup (when using service account key) # import os # os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json" # Create BigQuery client client = bigquery.Client(project='your-project-id') # Query execution function def run_query(query): return client.query(query).to_dataframe()

1. What are Window Functions?

Theory

Window Functions perform calculations on a set of rows but don’t reduce rows like GROUP BY. For each row, they return calculation results based on a related set of rows (window).

GROUP BY vs Window Functions

CharacteristicGROUP BYWindow Functions
Result row countReduced to group countOriginal row count maintained
Individual row infoLostPreserved
Use casesCategory totalsRanking, cumulative sum, moving average

Basic Syntax

function_name() OVER ( [PARTITION BY group_column] -- Specify grouping unit (optional) [ORDER BY sort_column] -- Sort criteria (required for some functions) [ROWS/RANGE frame_specification] -- Window range (optional) )
ℹ️
PARTITION BY vs GROUP BY
  • PARTITION BY: Grouping only within window function, row count maintained
  • GROUP BY: Grouping for entire query, row count reduced

2. Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK

Theory

FunctionDescriptionTie Handling
ROW_NUMBER()Unique sequence numberDifferent numbers even for ties
RANK()Rank (skips for ties)1, 2, 2, 4, 5
DENSE_RANK()Rank (consecutive for ties)1, 2, 2, 3, 4

Comparison Example

RevenueROW_NUMBERRANKDENSE_RANK
100111
90222
90322
80443
70554

Practice: Top Products by Revenue per Category

WITH product_revenue AS ( SELECT p.product_id, p.name, p.category, p.brand, SUM(oi.sale_price) AS total_revenue FROM `your-project-id.thelook_ecommerce.order_items` oi JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.product_id GROUP BY p.product_id, p.name, p.category, p.brand ) SELECT category, name, brand, total_revenue, ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS rank_in_category FROM product_revenue QUALIFY rank_in_category <= 3 ORDER BY category, rank_in_category

Execution Result:

categorynamebrandtotal_revenuerank_in_category
AccessoriesRay-Ban SunglassesRay-Ban2,520.001
AccessoriesPersol SunglassesPersol2,506.002
AccessoriesTom Ford SunglassesTom Ford2,371.603
ActiveJordan ShortJordan6,321.001
💡
QUALIFY Clause

BigQuery’s QUALIFY is used to filter window function results. Use it instead of WHERE or HAVING for clean filtering without subqueries.


3. LAG/LEAD - Reference Previous/Next Rows

Theory

FunctionDescriptionUse
LAG(col, n)Value from n rows beforeMonth-over-month, day-over-day
LEAD(col, n)Value from n rows afterDays until next purchase
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
  • offset: How many rows before/after (default: 1)
  • default: Value to return instead of NULL

Practice: Monthly Revenue Change Analysis

WITH monthly_revenue AS ( SELECT DATE_TRUNC(CAST(o.created_at AS DATE), MONTH) AS month, SUM(oi.sale_price) AS revenue FROM `your-project-id.thelook_ecommerce.orders` o JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY month ) SELECT month, revenue, LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue, revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change, ROUND( (revenue - LAG(revenue, 1) OVER (ORDER BY month)) / LAG(revenue, 1) OVER (ORDER BY month) * 100, 2 ) AS growth_rate_pct FROM monthly_revenue WHERE month >= '2024-01-01' ORDER BY month

Execution Result:

monthrevenueprev_month_revenuerevenue_changegrowth_rate_pct
2024-01-01156,234.50142,567.2513,667.259.58
2024-02-01148,923.75156,234.50-7,310.75-4.68
2024-03-01172,456.30148,923.7523,532.5515.80
2024-04-01165,890.20172,456.30-6,566.10-3.81

4. Cumulative Sum and Moving Average

Theory

You can adjust the calculation range by specifying the window frame.

SUM(col) OVER ( ORDER BY date ROWS BETWEEN start AND end )
FrameDescription
UNBOUNDED PRECEDINGFrom the first row
CURRENT ROWCurrent row
n PRECEDINGn rows before
n FOLLOWINGn rows after
UNBOUNDED FOLLOWINGTo the last row

Practice: Cumulative Revenue and 7-Day Moving Average

WITH daily_revenue AS ( SELECT CAST(o.created_at AS DATE) AS date, SUM(oi.sale_price) AS revenue FROM `your-project-id.thelook_ecommerce.orders` o JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY date ) SELECT date, revenue, -- Cumulative sum SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS cumulative_revenue, -- 7-day moving average ROUND( AVG(revenue) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2 ) AS moving_avg_7d FROM daily_revenue WHERE date >= '2024-10-01' ORDER BY date

Execution Result:

daterevenuecumulative_revenuemoving_avg_7d
2024-10-015,234.502,345,678.254,892.50
2024-10-024,891.302,350,569.554,956.33
2024-10-035,123.752,355,693.305,012.75
2024-10-044,980.202,360,673.505,057.44
2024-10-055,345.602,366,019.105,115.07
ℹ️
Moving Average Applications
  • 7-day moving average: Smooth daily fluctuations to identify weekly trends
  • 30-day moving average: Identify seasonal patterns
  • Stocks/Finance: Golden cross (short > long), death cross signals

5. Using PARTITION BY

Practice: Purchase Order and Cumulative Purchase Amount by Customer

SELECT o.user_id, o.order_id, o.created_at, oi.sale_price, -- Purchase order by customer ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at) AS purchase_order, -- Cumulative purchase amount by customer SUM(oi.sale_price) OVER ( PARTITION BY o.user_id ORDER BY o.created_at ROWS UNBOUNDED PRECEDING ) AS cumulative_purchase, -- Days since last purchase DATE_DIFF( CAST(o.created_at AS DATE), CAST(LAG(o.created_at) OVER (PARTITION BY o.user_id ORDER BY o.created_at) AS DATE), DAY ) AS days_since_last_purchase FROM `your-project-id.thelook_ecommerce.orders` o JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id WHERE o.user_id IN (1, 2, 3) -- Sample customers ORDER BY o.user_id, o.created_at

Execution Result:

user_idorder_idcreated_atsale_pricepurchase_ordercumulative_purchasedays_since_last_purchase
110012023-01-15125.501125.50NULL
112452023-03-2289.992215.4966
220312023-02-10234.001234.00NULL
221562023-05-18156.752390.7597
330892023-01-2878.50178.50NULL

Quiz

Problem 1: Brand Revenue Ranking by Category

Calculate brand revenue ranking within each category and query only the top 3 brands.

View Answer

WITH category_brand_revenue AS ( SELECT p.category, p.brand, SUM(oi.sale_price) AS total_revenue FROM `your-project-id.thelook_ecommerce.order_items` oi JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.product_id GROUP BY p.category, p.brand ) SELECT category, brand, total_revenue, RANK() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS brand_rank FROM category_brand_revenue QUALIFY brand_rank <= 3 ORDER BY category, brand_rank

Explanation:

  • Create groups by category with PARTITION BY category
  • RANK() gives the same rank for ties (handles brands with same revenue)
  • Filter with QUALIFY without subquery

Problem 2: Customer Repurchase Interval Analysis

Calculate each customer’s average repurchase interval (days). (Only customers with 2+ purchases)

View Answer

WITH purchase_gaps AS ( SELECT user_id, created_at, LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_purchase, DATE_DIFF( CAST(created_at AS DATE), CAST(LAG(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS DATE), DAY ) AS days_gap FROM `your-project-id.thelook_ecommerce.orders` ) SELECT user_id, COUNT(*) + 1 AS total_purchases, ROUND(AVG(days_gap), 1) AS avg_repurchase_days FROM purchase_gaps WHERE days_gap IS NOT NULL GROUP BY user_id HAVING COUNT(*) >= 1 -- 2+ purchases (at least 1 gap) ORDER BY avg_repurchase_days LIMIT 20

Explanation:

  • Reference previous purchase date with LAG
  • Calculate interval with DATE_DIFF
  • First purchase has NULL days_gap, so excluded
  • Identify loyal customers with short average repurchase intervals

Summary

FunctionUseExample
ROW_NUMBER()Unique sequenceMost recent 1 order per customer
RANK()Rank (skip ties)Revenue ranking
DENSE_RANK()Rank (consecutive ties)Tier classification
LAG()Previous row valueMonth-over-month change
LEAD()Next row valueDays until next purchase
SUM() OVERCumulative/moving sumCumulative revenue, moving average
AVG() OVERMoving average7-day/30-day average

Next Steps

You’ve mastered window functions! Next, learn complex query structuring and advanced analysis techniques in CTE and Cohort Analysis.

Last updated on

🤖AI 모의면접실전처럼 연습하기