Skip to Content
ConceptsSQLCombining Tables with JOIN

Combining Tables with JOIN

BeginnerIntermediate

Learning Objectives

After completing this recipe, you will be able to:

  • Understand the concept of JOIN and set theory
  • Extract intersection data with INNER JOIN
  • Maintain the base table with LEFT/RIGHT JOIN
  • Create unions with FULL OUTER JOIN
  • Generate all combinations with CROSS JOIN
  • Multi-table JOIN and performance optimization

BigQuery Execution Environment Setup

from google.cloud import bigquery import pandas as pd # Authentication setup (when using service account key) # import os # os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json" # Create BigQuery client client = bigquery.Client(project='your-project-id') # Query execution function def run_query(query): return client.query(query).to_dataframe()
ℹ️
Project ID Setup

Replace your-project-id with your GCP project ID in the code above.


1. What is JOIN?

Theory

JOIN is a SQL operation that combines two or more tables based on specific conditions. Since data in relational databases is stored across multiple tables, you need to combine tables for meaningful analysis.

Understanding with Set Theory

JOIN TypeSet OperationDescription
INNER JOINA ∩ BOnly rows that exist in both tables
LEFT JOINA + (A ∩ B)All of the left table + matching rows from the right
RIGHT JOIN(A ∩ B) + BAll of the right table + matching rows from the left
FULL OUTER JOINA ∪ BBoth tables entirely
CROSS JOINA × BAll possible combinations (Cartesian product)

Basic Syntax

SELECT columns FROM table_a [INNER | LEFT | RIGHT | FULL OUTER | CROSS] JOIN table_b ON table_a.key = table_b.key

2. INNER JOIN - Intersection

Theory

INNER JOIN returns only rows where the condition matches in both tables. It is the most commonly used JOIN type.

Table A Table B INNER JOIN Result +----+----+ +----+----+ +----+----+----+ | id | a | | id | b | | id | a | b | +----+----+ +----+----+ +----+----+----+ | 1 | X | | 1 | P | → | 1 | X | P | | 2 | Y | | 3 | Q | | 3 | Z | Q | | 3 | Z | | 4 | R | +----+----+----+ +----+----+ +----+----+

Practice: Combining Orders and Order Details

SELECT o.order_id, o.user_id, o.status, o.created_at, oi.product_id, oi.sale_price FROM `your-project-id.thelook_ecommerce.orders` o INNER JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id WHERE o.status = 'Complete' LIMIT 10

Execution Result:

order_iduser_idstatuscreated_atproduct_idsale_price
154321Complete2024-01-15 10:30:001234559.99
154321Complete2024-01-15 10:30:006789029.99
212345Complete2024-01-15 11:45:001111189.99
💡
Table Alias

Assigning an alias to a table like orders o makes the query more concise. This is essential when JOINing multiple tables.


3. LEFT JOIN - Left-Based

Theory

LEFT JOIN keeps all rows from the left table and combines matching rows from the right table. If there’s no match, it fills with NULL.

Table A Table B LEFT JOIN Result +----+----+ +----+----+ +----+----+------+ | id | a | | id | b | | id | a | b | +----+----+ +----+----+ +----+----+------+ | 1 | X | | 1 | P | → | 1 | X | P | | 2 | Y | | 3 | Q | | 2 | Y | NULL | | 3 | Z | | 4 | R | | 3 | Z | Q | +----+----+ +----+----+ +----+----+------+

Practice: All Customers and Order History

SELECT u.id AS user_id, u.first_name, u.last_name, u.email, COUNT(o.order_id) AS order_count, COALESCE(SUM(oi.sale_price), 0) AS total_spent FROM `your-project-id.thelook_ecommerce.users` u LEFT JOIN `your-project-id.thelook_ecommerce.orders` o ON u.id = o.user_id LEFT JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY u.id, u.first_name, u.last_name, u.email ORDER BY total_spent DESC LIMIT 15

Execution Result:

user_idfirst_namelast_nameemailorder_counttotal_spent
59071JohnSmithjohn.smith@example.com81633.89
84005JaneDoejane.doe@example.com41500.73
90572BobJohnsonbob.j@example.com61477.39
12345AliceBrownalice.b@example.com00
ℹ️
LEFT JOIN Use Cases
  • Finding customers who haven’t ordered (WHERE o.order_id IS NULL)
  • Finding products without reviews
  • Finding orders with failed payments

Practice: Finding Customers Who Haven’t Ordered

SELECT u.id AS user_id, u.first_name, u.last_name, u.email, u.created_at AS signup_date FROM `your-project-id.thelook_ecommerce.users` u LEFT JOIN `your-project-id.thelook_ecommerce.orders` o ON u.id = o.user_id WHERE o.order_id IS NULL ORDER BY u.created_at DESC LIMIT 10

Execution Result:

user_idfirst_namelast_nameemailsignup_date
99876MikeWilsonmike.w@example.com2024-11-15
99543SarahLeesarah.l@example.com2024-11-14

4. RIGHT JOIN and FULL OUTER JOIN

RIGHT JOIN

RIGHT JOIN is the opposite of LEFT JOIN. It keeps all rows from the right table.

-- RIGHT JOIN (LEFT JOIN is more commonly used in practice) SELECT o.order_id, o.status, u.first_name, u.last_name FROM `your-project-id.thelook_ecommerce.users` u RIGHT JOIN `your-project-id.thelook_ecommerce.orders` o ON u.id = o.user_id LIMIT 10
💡
Practical Tip

In most cases, instead of RIGHT JOIN, swap the table order and use LEFT JOIN. It’s more readable.

FULL OUTER JOIN

FULL OUTER JOIN includes all rows from both tables.

-- All users and all orders (regardless of matching) SELECT u.id AS user_id, u.first_name, o.order_id, o.status FROM `your-project-id.thelook_ecommerce.users` u FULL OUTER JOIN `your-project-id.thelook_ecommerce.orders` o ON u.id = o.user_id WHERE u.id IS NULL OR o.order_id IS NULL -- Only unmatched ones LIMIT 10

5. CROSS JOIN - Cartesian Product

Theory

CROSS JOIN generates all possible combinations of two tables. Since it combines without conditions, the resulting row count = rows in A × rows in B.

Practice: Generate All Product-Promotion Combinations

-- All combinations of promotions and categories WITH promotions AS ( SELECT promotion_name FROM UNNEST(['Black Friday', 'Summer Sale', 'New Year']) AS promotion_name ), categories AS ( SELECT DISTINCT category FROM `your-project-id.thelook_ecommerce.products` LIMIT 5 ) SELECT p.promotion_name, c.category, CONCAT(p.promotion_name, ' - ', c.category) AS campaign_name FROM promotions p CROSS JOIN categories c ORDER BY p.promotion_name, c.category

Execution Result:

promotion_namecategorycampaign_name
Black FridayAccessoriesBlack Friday - Accessories
Black FridayActiveBlack Friday - Active
Black FridayJeansBlack Friday - Jeans
Summer SaleAccessoriesSummer Sale - Accessories
⚠️
Caution

CROSS JOIN results can grow rapidly. 10,000 rows × 10,000 rows = 100 million rows! Use it carefully.


6. Multi-Table JOIN

Practice: Order Detail Analysis (4 Tables)

SELECT o.order_id, o.created_at AS order_date, o.status, u.first_name || ' ' || u.last_name AS customer_name, u.country, p.name AS product_name, p.category, p.brand, oi.sale_price, p.retail_price, ROUND(oi.sale_price - p.cost, 2) AS profit FROM `your-project-id.thelook_ecommerce.orders` o INNER JOIN `your-project-id.thelook_ecommerce.users` u ON o.user_id = u.id INNER JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id INNER JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.id WHERE o.created_at >= '2024-01-01' ORDER BY o.created_at DESC LIMIT 15

Execution Result:

order_idorder_datestatuscustomer_namecountryproduct_namecategorybrandsale_priceprofit
1250002024-11-15CompleteJohn SmithUnited StatesClassic JeansJeansLevi’s89.9935.50
1249992024-11-15ShippedJane DoeChinaCotton T-ShirtTops & TeesH&M24.9912.30

7. JOIN Performance Optimization

Core Principles

  1. Use indexes on JOIN keys: BigQuery auto-optimizes, but leverage partitioning/clustering keys
  2. Put smaller tables on the right: BigQuery broadcasts the right table
  3. SELECT only necessary columns: Avoid SELECT *
  4. Filter early: Reduce data with WHERE before JOIN

Good vs Bad Examples

-- ❌ Bad example: Filtering after JOIN SELECT * FROM orders o JOIN order_items oi ON o.order_id = oi.order_id WHERE o.created_at >= '2024-01-01' -- ✅ Good example: Filter first with subquery SELECT o.order_id, o.status, oi.sale_price FROM ( SELECT order_id, status FROM orders WHERE created_at >= '2024-01-01' ) o JOIN order_items oi ON o.order_id = oi.order_id

Check Execution Plan with EXPLAIN

-- Add EXPLAIN before the query EXPLAIN SELECT o.order_id, COUNT(oi.id) FROM `your-project-id.thelook_ecommerce.orders` o JOIN `your-project-id.thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY o.order_id

Quiz

Problem 1: Top 3 Brands by Revenue per Category

Query the top 3 brands by revenue in each category.

Hint: Use ROW_NUMBER() after JOINing products and order_items

View Answer

WITH brand_revenue AS ( SELECT p.category, p.brand, SUM(oi.sale_price) AS total_revenue, COUNT(*) AS items_sold FROM `your-project-id.thelook_ecommerce.order_items` oi INNER JOIN `your-project-id.thelook_ecommerce.products` p ON oi.product_id = p.id GROUP BY p.category, p.brand ), ranked AS ( SELECT category, brand, total_revenue, items_sold, ROW_NUMBER() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS rank FROM brand_revenue ) SELECT category, brand, total_revenue, items_sold, rank FROM ranked WHERE rank <= 3 ORDER BY category, rank

Execution Result:

categorybrandtotal_revenueitems_soldrank
AccessoriesRay-Ban125,340.501,5231
AccessoriesOakley98,230.751,2452
AccessoriesPersol87,650.009873
JeansLevi’s234,560.252,8901

Problem 2: Repeat Customer Analysis

Query the first and last order information for customers who have ordered 2 or more times.

Hint: Self Join - JOIN the orders table with itself

View Answer

WITH customer_orders AS ( SELECT user_id, MIN(created_at) AS first_order_date, MAX(created_at) AS last_order_date, COUNT(DISTINCT order_id) AS total_orders FROM `your-project-id.thelook_ecommerce.orders` WHERE status = 'Complete' GROUP BY user_id HAVING COUNT(DISTINCT order_id) >= 2 ) SELECT co.user_id, u.first_name, u.last_name, co.first_order_date, co.last_order_date, co.total_orders, DATE_DIFF(CAST(co.last_order_date AS DATE), CAST(co.first_order_date AS DATE), DAY) AS days_as_customer FROM customer_orders co INNER JOIN `your-project-id.thelook_ecommerce.users` u ON co.user_id = u.id ORDER BY co.total_orders DESC, days_as_customer DESC LIMIT 20

Execution Result:

user_idfirst_namelast_namefirst_order_datelast_order_datetotal_ordersdays_as_customer
4523MichaelChen2023-01-022024-11-1512682
7891EmilyWang2023-02-152024-10-289621

Problem 3: Finding Products Never Sold

Find products that have never been sold.

Hint: LEFT JOIN + IS NULL

View Answer

SELECT p.id AS product_id, p.name, p.category, p.brand, p.retail_price, p.created_at AS product_created_at FROM `your-project-id.thelook_ecommerce.products` p LEFT JOIN `your-project-id.thelook_ecommerce.order_items` oi ON p.id = oi.product_id WHERE oi.id IS NULL ORDER BY p.retail_price DESC LIMIT 20

Execution Result:

product_idnamecategorybrandretail_priceproduct_created_at
28456Premium Jacket XOuterwear & CoatsCanada Goose899.992024-10-01
29012Designer Dress YDressesGucci750.002024-09-15

Business Insights:

  • If there are many expensive products: Review pricing strategy
  • If they’re new products: Strengthen marketing
  • Identify candidates for inventory clearance

Summary

JOIN TypeWhen to UseResult
INNER JOINOnly need data in bothIntersection
LEFT JOINAll left + matching rightLeft-based
RIGHT JOINAll right + matching leftRight-based
FULL OUTER JOINInclude both sidesUnion
CROSS JOINGenerate all combinationsCartesian product

JOIN Selection Guide

What data do you need? ├─ Must exist in both → INNER JOIN ├─ All of one side, attach other if exists → LEFT/RIGHT JOIN ├─ Need all of both sides → FULL OUTER JOIN └─ Need all combinations → CROSS JOIN

Next Steps

You’ve mastered JOIN! Next, learn about GROUP BY and aggregate functions in Aggregation and Grouping.

Last updated on

🤖AI 모의면접실전처럼 연습하기