Skip to Content
Getting StartedUnderstanding Data Structure

Understanding Data Structure

📦 Download Dataset (BigQuery Setup Guide)

Understand the structure and relationships of the datasets used in this Cookbook.

Dataset Overview

This Cookbook is based on data from a fictional e-commerce company. It extends Google BigQuery’s thelook_ecommerce public dataset with CS tickets, marketing campaigns, external market data, and more.

Data Model (ERD)

┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ src_users │ │ src_orders │ │ src_order_items │ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤ │ user_id (PK) │──────▶│ order_id (PK) │◀──────│ order_id (FK) │ │ first_name │ │ user_id (FK) │ │ product_id (FK) │ │ last_name │ │ created_at │ │ sale_price │ │ email │ │ status │ │ returned_at │ │ gender │ │ num_of_item │ │ shipped_at │ │ age │ └─────────────────┘ │ delivered_at │ │ country │ └────────┬────────┘ │ state │ │ │ created_at │ │ └─────────────────┘ ▼ ┌─────────────────┐ ┌─────────────────┐ │ src_products │ │ src_events │ ├─────────────────┤ ├─────────────────┤ │ product_id (PK) │ │ id (PK) │ │ category │ │ user_id (FK) │ │ brand │ │ session_id │ │ department │ │ sequence_number │ │ name │ │ created_at │ │ retail_price │ │ ip_address │ │ cost │ └─────────────────┘ └─────────────────┘

Detailed Table Descriptions

1. Core Tables (E-commerce)

src_users - Customer Information

ColumnTypeDescription
user_idINT64Customer unique identifier (PK)
first_nameSTRINGFirst name
last_nameSTRINGLast name
emailSTRINGEmail address
genderSTRINGGender (M/F)
ageINT64Age
countrySTRINGCountry
stateSTRINGState/Region
created_atTIMESTAMPRegistration date
ℹ️
Data Scale

Contains approximately 100,000 customer records.

src_orders - Order Information

ColumnTypeDescription
order_idINT64Order unique identifier (PK)
user_idINT64Customer ID (FK)
created_atTIMESTAMPOrder date/time
statusSTRINGOrder status (Shipped, Complete, Cancelled, etc.)
num_of_itemINT64Number of items in order

Order Status Values:

  • Shipped: In transit
  • Complete: Delivered
  • Cancelled: Cancelled
  • Returned: Returned
  • Processing: Processing

src_order_items - Order Item Details

ColumnTypeDescription
order_idINT64Order ID (FK)
product_idINT64Product ID (FK)
sale_priceFLOAT64Sale price
returned_atTIMESTAMPReturn date/time (nullable)
shipped_atTIMESTAMPShipping start date/time
delivered_atTIMESTAMPDelivery completion date/time
💡
Composite Key

The combination of order_id and product_id is the composite primary key for this table.

src_products - Product Information

ColumnTypeDescription
product_idINT64Product unique identifier (PK)
categorySTRINGCategory
brandSTRINGBrand
departmentSTRINGDepartment (Men/Women)
nameSTRINGProduct name
retail_priceFLOAT64Retail price
costFLOAT64Cost
skuSTRINGSKU code
distribution_center_idINT64Distribution center ID

Main Categories:

  • Accessories, Active, Blazers & Jackets, Clothing Sets
  • Dresses, Fashion Hoodies & Sweatshirts, Jeans
  • Leggings, Maternity, Outerwear & Coats, Pants
  • Pants & Capris, Plus, Shorts, Skirts, Sleep & Lounge
  • Socks, Socks & Hosiery, Suits, Suits & Sport Coats
  • Sweaters, Swim, Tops & Tees, Underwear

src_events - Web Event Logs

ColumnTypeDescription
idINT64Event ID (PK)
user_idINT64Customer ID (FK, nullable)
session_idSTRINGSession ID
sequence_numberINT64Event sequence within session
created_atTIMESTAMPEvent timestamp
ip_addressSTRINGIP address

2. Augmented Data (for Session Analysis)

events_augmented - Session-level Aggregation

ColumnTypeDescription
session_idSTRINGSession ID (PK)
user_idINT64Customer ID
session_start_atTIMESTAMPSession start time
session_dateDATESession date
events_in_sessionINT64Number of events in session
channel_keySTRINGAcquisition channel
device_keySTRINGDevice type
landing_pageSTRINGLanding page URL
pageviews_estINT64Estimated pageviews
add_to_cart_estINT64Estimated add-to-cart count

Channel Types:

  • organic: Organic search
  • paid_search: Paid search ads
  • paid_social: Paid social ads
  • display: Display ads
  • email: Email marketing
  • referral: Referral/affiliate
  • direct: Direct visit

3. Customer Service Data

cs_tickets_dummy - CS Tickets

ColumnTypeDescription
ticket_idSTRINGTicket ID (PK)
user_idINT64Customer ID (FK)
opened_atTIMESTAMPTicket creation time
first_response_atTIMESTAMPFirst response time
resolved_atTIMESTAMPResolution time
statusSTRINGStatus (open, pending, solved, escalated)
prioritySTRINGPriority (low, normal, high, urgent)
channelSTRINGContact channel (email, chat, phone)
issue_typeSTRINGIssue type
agent_idSTRINGAssigned agent ID
country_nameSTRINGCustomer country
categorySTRINGRelated product category
satisfaction_scoreINT64Satisfaction score (1-5)
commentSTRINGCustomer comment

Issue Types:

  • size: Size inquiry
  • quality: Quality issue
  • shipping: Shipping related
  • payment: Payment issue
  • refund: Refund request

survey_cs_dummy - Customer Satisfaction Survey

ColumnTypeDescription
survey_idSTRINGSurvey ID (PK)
user_idINT64Customer ID
sent_atTIMESTAMPSurvey sent time
completed_atTIMESTAMPSurvey completion time
nps_scoreINT64NPS score (0-10)
csat_scoreINT64CSAT score (1-5)
free_textSTRINGFree-form response
related_ticket_idSTRINGRelated ticket ID

4. Marketing Data

mkt_campaigns_dummy - Campaign Metadata

ColumnTypeDescription
campaign_idSTRINGCampaign ID (PK)
campaign_nameSTRINGCampaign name
channel_keySTRINGChannel
start_dateDATEStart date
end_dateDATEEnd date
target_categorySTRINGTarget category
target_countrySTRINGTarget country
objectiveSTRINGObjective (awareness, traffic, conversion, etc.)
kpiSTRINGKPI (CTR, CVR, ROAS, LTV)
budgetFLOAT64Budget

mkt_channel_spend_dummy - Channel Ad Spend

ColumnTypeDescription
dateDATEDate
channel_keySTRINGChannel
campaign_idSTRINGCampaign ID
impressionsINT64Impressions
clicksINT64Clicks
spendFLOAT64Spend amount
landing_pageSTRINGLanding page
country_nameSTRINGCountry
device_keySTRINGDevice

5. External Data

ColumnTypeDescription
weekDATEWeek start date
keywordSTRINGKeyword
region_codeSTRINGRegion code
scoreINT64Trend score (0-100)

ext_competitive_signal - Competitive Signals

ColumnTypeDescription
periodDATEPeriod (month)
topicSTRINGTopic
region_codeSTRINGRegion code
signalFLOAT64Signal strength
sourceSTRINGSource (patent, news, social)

Using Data Relationships

Basic Join Patterns

-- Join orders and customer information SELECT o.order_id, o.created_at, u.country, u.age FROM src_orders o JOIN src_users u ON o.user_id = u.user_id -- Join order details and product information SELECT oi.order_id, oi.sale_price, p.category, p.brand FROM src_order_items oi JOIN src_products p ON oi.product_id = p.product_id -- Full join (orders → products → customers) SELECT o.order_id, o.created_at, u.country, p.category, oi.sale_price FROM src_orders o JOIN src_users u ON o.user_id = u.user_id JOIN src_order_items oi ON o.order_id = oi.order_id JOIN src_products p ON oi.product_id = p.product_id

Pandas Equivalent Code

# Load data orders = pd.read_csv('src_orders.csv') users = pd.read_csv('src_users.csv') order_items = pd.read_csv('src_order_items.csv') products = pd.read_csv('src_products.csv') # Join orders and customer information orders_with_users = orders.merge(users, on='user_id') # Join order details and product information items_with_products = order_items.merge(products, on='product_id') # Full join full_data = (orders .merge(users, on='user_id') .merge(order_items, on='order_id') .merge(products, on='product_id'))

Next Steps

Now that you understand the data structure, you’re ready to start the actual analysis:

Last updated on

🤖AI 모의면접실전처럼 연습하기