BigQuery 환경 설정
이 가이드에서는 Google BigQuery를 사용하여 SQL 기반 데이터 분석 환경을 설정하는 방법을 안내합니다.
1. Google Cloud 프로젝트 생성
ℹ️
무료 체험
Google Cloud는 신규 사용자에게 $300 크레딧과 90일의 무료 체험 기간을 제공합니다. BigQuery는 매월 1TB의 무료 쿼리 처리량과 10GB의 무료 스토리지를 제공합니다.
1.1 Google Cloud Console 접속
- Google Cloud Console 에 접속합니다.
- Google 계정으로 로그인합니다.
- 서비스 약관에 동의합니다.
1.2 프로젝트 생성
- 상단의 프로젝트 선택 드롭다운을 클릭합니다.
- **“새 프로젝트”**를 클릭합니다.
- 프로젝트 이름을 입력합니다 (예:
my-analytics-project) - **“만들기”**를 클릭합니다.
2. BigQuery API 활성화
- BigQuery API 페이지 로 이동합니다.
- “사용” 버튼을 클릭하여 API를 활성화합니다.
3. 서비스 계정 및 키 생성
Jupyter Notebook이나 Python 스크립트에서 BigQuery에 접근하려면 서비스 계정이 필요합니다.
3.1 서비스 계정 생성
- IAM & Admin > 서비스 계정 으로 이동합니다.
- **“서비스 계정 만들기”**를 클릭합니다.
- 다음 정보를 입력합니다:
- 서비스 계정 이름:
bigquery-access - 설명:
BigQuery 데이터 분석용 서비스 계정
- 서비스 계정 이름:
- **“만들고 계속”**을 클릭합니다.
3.2 권한 부여
다음 역할을 추가합니다:
- BigQuery 데이터 뷰어 (
roles/bigquery.dataViewer) - 데이터 읽기 - BigQuery 작업 사용자 (
roles/bigquery.jobUser) - 쿼리 실행 - BigQuery 데이터 편집자 (
roles/bigquery.dataEditor) - 데이터 쓰기
3.3 JSON 키 생성
- 생성된 서비스 계정을 클릭합니다.
- “키” 탭으로 이동합니다.
- **“키 추가” > “새 키 만들기”**를 클릭합니다.
- JSON 형식을 선택하고 **“만들기”**를 클릭합니다.
- JSON 파일이 자동으로 다운로드됩니다.
⚠️
보안 주의
JSON 키 파일은 절대 Git에 커밋하거나 공개적으로 공유하지 마세요. .gitignore에 추가하고, 안전한 위치에 보관하세요.
4. 데이터셋 및 테이블 생성
이 Cookbook의 모든 예제는 아래 SQL 스크립트로 생성된 데이터를 사용합니다. BigQuery 콘솔에서 전체 스크립트를 실행하세요.
💡
프로젝트 ID 변경
아래 SQL에서 your-project-id를 실제 프로젝트 ID로 변경하세요.
4.1 데이터셋 생성
-- 데이터셋 생성 (US 멀티리전)
CREATE SCHEMA IF NOT EXISTS `your-project-id.retail_analytics_us`
OPTIONS(location="US");4.2 원본 데이터 복제 (thelook_ecommerce)
BigQuery 공개 데이터셋 thelook_ecommerce를 복제합니다. 파티션과 클러스터링을 적용하여 쿼리 성능을 최적화합니다.
-- 1) 제품 테이블 (파티션 없음 - 날짜 컬럼 없음)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.src_products` AS
SELECT
id AS product_id,
category,
brand,
department,
name,
retail_price,
cost,
sku,
distribution_center_id
FROM `bigquery-public-data.thelook_ecommerce.products`;
-- 2) 주문 테이블 (created_at 파티션, user_id 클러스터)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.src_orders`
PARTITION BY DATE(created_at)
CLUSTER BY user_id AS
SELECT
order_id,
user_id,
created_at,
status,
num_of_item
FROM `bigquery-public-data.thelook_ecommerce.orders`;
-- 3) 주문상품 테이블 (order_id, product_id 클러스터)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.src_order_items`
CLUSTER BY order_id, product_id AS
SELECT
order_id,
product_id,
sale_price,
returned_at,
shipped_at,
delivered_at
FROM `bigquery-public-data.thelook_ecommerce.order_items`;
-- 4) 이벤트 테이블 (created_at 파티션, session_id/user_id 클러스터)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.src_events`
PARTITION BY DATE(created_at)
CLUSTER BY session_id, user_id AS
SELECT
id,
user_id,
sequence_number,
session_id,
created_at,
ip_address
FROM `bigquery-public-data.thelook_ecommerce.events`;
-- 5) 사용자 테이블 (created_at 파티션, country 클러스터)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.src_users`
PARTITION BY DATE(created_at)
CLUSTER BY country AS
SELECT
id AS user_id,
first_name,
last_name,
email,
gender,
age,
state,
country,
created_at
FROM `bigquery-public-data.thelook_ecommerce.users`;4.3 이벤트 증강 테이블 (세션별 채널/디바이스)
마케팅 분석을 위한 세션 단위 증강 데이터를 생성합니다.
-- 이벤트 증강 (세션 단위 채널/디바이스/랜딩 생성)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.events_augmented`
PARTITION BY session_date
CLUSTER BY channel_key, device_key AS
WITH base AS (
SELECT
e.session_id,
ANY_VALUE(e.user_id) AS user_id,
MIN(e.created_at) AS session_start_at,
DATE(MIN(e.created_at)) AS session_date,
COUNT(*) AS events_in_session,
ABS(MOD(FARM_FINGERPRINT(CAST(e.session_id AS STRING)), 1000000)) AS h
FROM `your-project-id.retail_analytics_us.src_events` e
GROUP BY e.session_id
),
cooked AS (
SELECT
b.session_id,
b.user_id,
b.session_start_at,
b.session_date,
b.events_in_session,
b.h,
CASE MOD(b.h, 7)
WHEN 0 THEN 'organic' WHEN 1 THEN 'paid_search' WHEN 2 THEN 'paid_social'
WHEN 3 THEN 'display' WHEN 4 THEN 'email' WHEN 5 THEN 'referral' ELSE 'direct'
END AS channel_key,
CASE MOD(DIV(b.h,7), 3)
WHEN 0 THEN 'desktop' WHEN 1 THEN 'mobile' ELSE 'tablet'
END AS device_key,
CASE MOD(DIV(b.h,21), 5)
WHEN 0 THEN 'https://shop.example.com/home'
WHEN 1 THEN 'https://shop.example.com/women/heels'
WHEN 2 THEN 'https://shop.example.com/men/tee'
WHEN 3 THEN 'https://shop.example.com/accessories/belts'
ELSE 'https://shop.example.com/watches/quartz'
END AS landing_page
FROM base b
)
SELECT
c.session_id,
c.user_id,
c.session_start_at,
c.session_date,
c.events_in_session,
c.channel_key,
c.device_key,
c.landing_page,
LEAST(c.events_in_session, 10) AS pageviews_est,
CAST(LEAST(c.events_in_session, 10) * (0.15 + MOD(DIV(c.h,105), 5) * 0.05) AS INT64) AS add_to_cart_est
FROM cooked c;4.4 CS 티켓 더미 데이터 (프로젝트 1용)
고객 서비스 분석 프로젝트를 위한 티켓 데이터를 생성합니다.
-- CS 티켓 더미
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.cs_tickets_dummy`
PARTITION BY DATE(opened_at)
CLUSTER BY country_name, category, issue_type AS
WITH user_pool AS (
SELECT user_id, country, ROW_NUMBER() OVER (ORDER BY user_id) as rn
FROM `your-project-id.retail_analytics_us.src_users`
),
user_count AS (
SELECT COUNT(*) as total_users FROM user_pool
),
ticket_numbers AS (
SELECT off
FROM UNNEST(GENERATE_ARRAY(1, 5000)) AS off
),
gen AS (
SELECT
t.off,
CONCAT('TKT_', CAST(100000+t.off AS STRING)) AS ticket_id,
u.user_id,
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(t.off * 30 + MOD(ABS(FARM_FINGERPRINT(CONCAT('ts', CAST(t.off AS STRING)))), 90*24*60) AS INT64) MINUTE) AS opened_at,
t.off AS row_num
FROM ticket_numbers t
CROSS JOIN user_count uc
LEFT JOIN user_pool u ON u.rn = 1 + MOD(ABS(FARM_FINGERPRINT(CAST(t.off AS STRING))), uc.total_users)
)
SELECT
ticket_id,
user_id,
opened_at,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-resp'))), 10) < 7
THEN TIMESTAMP_SUB(opened_at, INTERVAL -1 * CAST(MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-resptime'))), 48) AS INT64) HOUR)
END AS first_response_at,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-resv'))), 10) < 6
THEN TIMESTAMP_SUB(opened_at, INTERVAL -1 * CAST(MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-resvtime'))), 120) AS INT64) HOUR)
END AS resolved_at,
CASE
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-status'))), 100) < 15 THEN 'escalated'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-status'))), 100) < 25 THEN 'pending'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-status'))), 100) < 35 THEN 'open'
ELSE 'solved'
END AS status,
CASE
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-priority'))), 100) < 10 THEN 'urgent'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-priority'))), 100) < 40 THEN 'high'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-priority'))), 100) < 70 THEN 'normal'
ELSE 'low'
END AS priority,
CASE
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-channel'))), 10) < 4 THEN 'email'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-channel'))), 10) < 7 THEN 'chat'
ELSE 'phone'
END AS channel,
CASE
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-issue'))), 100) < 25 THEN 'size'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-issue'))), 100) < 50 THEN 'quality'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-issue'))), 100) < 70 THEN 'shipping'
WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-issue'))), 100) < 85 THEN 'payment'
ELSE 'refund'
END AS issue_type,
CONCAT('AGT_', CAST(100 + MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-agent'))), 20) AS STRING)) AS agent_id,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-country'))), 2) = 0 THEN 'United States' ELSE 'Korea, Republic of' END AS country_name,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-category'))), 2) = 0 THEN 'Womens Shoes' ELSE 'Mens Apparel' END AS category,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-sat'))), 10) < 6
THEN CAST(1 + MOD(ABS(FARM_FINGERPRINT(CONCAT(ticket_id, '-satval'))), 5) AS INT64)
END AS satisfaction_score,
CASE WHEN MOD(row_num, 2) = 0 THEN 'Thanks for quick support' ELSE 'I want to return due to size issue' END AS comment
FROM gen;4.5 만족도/NPS 설문 더미 데이터
-- 만족도/NPS 설문 더미
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.survey_cs_dummy`
PARTITION BY DATE(sent_at)
CLUSTER BY country_name, category AS
WITH tickets_with_index AS (
SELECT
ticket_id,
user_id,
ROW_NUMBER() OVER (ORDER BY ticket_id) as rn
FROM `your-project-id.retail_analytics_us.cs_tickets_dummy`
WHERE status IN ('solved', 'pending')
),
ticket_count AS (
SELECT COUNT(*) as total_tickets FROM tickets_with_index
),
survey_numbers AS (
SELECT off FROM UNNEST(GENERATE_ARRAY(1, 1500)) AS off
)
SELECT
CONCAT('SVY_', CAST(100000+s.off AS STRING)) AS survey_id,
t.user_id,
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(s.off * 40 + MOD(ABS(FARM_FINGERPRINT(CONCAT('svy', CAST(s.off AS STRING)))), 60*24*60) AS INT64) MINUTE) AS sent_at,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-complete'))), 10) < 6
THEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(s.off * 40 + MOD(ABS(FARM_FINGERPRINT(CONCAT('svy-c', CAST(s.off AS STRING)))), 60*24*55) AS INT64) MINUTE)
END AS completed_at,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-nps'))), 10) < 6
THEN CAST(MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-npsval'))), 11) AS INT64)
END AS nps_score,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-csat'))), 10) < 7
THEN CAST(1 + MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-csatval'))), 5) AS INT64)
END AS csat_score,
CASE WHEN MOD(s.off, 2) = 0 THEN 'Good quality' ELSE 'Delivery took longer than expected' END AS free_text,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-hasticket'))), 10) < 4
THEN t.ticket_id
END AS related_ticket_id,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-country'))), 2) = 0 THEN 'United States' ELSE 'Korea, Republic of' END AS country_name,
CASE WHEN MOD(ABS(FARM_FINGERPRINT(CONCAT('SVY_', CAST(s.off AS STRING), '-cat'))), 2) = 0 THEN 'Womens Shoes' ELSE 'Mens Apparel' END AS category
FROM survey_numbers s
CROSS JOIN ticket_count tc
LEFT JOIN tickets_with_index t ON t.rn = 1 + MOD(ABS(FARM_FINGERPRINT(CAST(s.off AS STRING))), tc.total_tickets);4.6 반품 사유 더미 데이터
-- 반품 사유 라벨 더미
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.returns_reason_dummy`
PARTITION BY return_date
CLUSTER BY reason_code, country_name AS
WITH order_items_sample AS (
SELECT
oi.order_id,
oi.product_id,
o.created_at,
u.country as country_name,
ROW_NUMBER() OVER (ORDER BY oi.order_id, oi.product_id) as row_num
FROM `your-project-id.retail_analytics_us.src_order_items` oi
INNER JOIN `your-project-id.retail_analytics_us.src_orders` o ON oi.order_id = o.order_id
INNER JOIN `your-project-id.retail_analytics_us.src_users` u ON o.user_id = u.user_id
WHERE RAND() < 0.1
AND oi.returned_at IS NULL
),
reasons AS (
SELECT * FROM UNNEST(['size_issue','defect','damaged','changed_mind','shipping_delay']) AS reason_code WITH OFFSET AS reason_offset
),
responsibilities AS (
SELECT * FROM UNNEST(['merchant','logistics','customer']) AS responsibility WITH OFFSET AS resp_offset
)
SELECT
order_id,
product_id,
DATE_ADD(DATE(created_at), INTERVAL CAST(7 + MOD(row_num, 30) AS INT64) DAY) AS return_date,
(SELECT reason_code FROM reasons WHERE reason_offset = MOD(ABS(FARM_FINGERPRINT(CONCAT(CAST(order_id AS STRING), '-', CAST(product_id AS STRING), '-reason'))), 5) LIMIT 1) AS reason_code,
'auto-generated reason' AS reason_text,
(SELECT responsibility FROM responsibilities WHERE resp_offset = MOD(ABS(FARM_FINGERPRINT(CONCAT(CAST(order_id AS STRING), '-', CAST(product_id AS STRING), '-resp'))), 3) LIMIT 1) AS responsibility,
country_name
FROM order_items_sample;4.7 외부/전략 신호 더미 (프로젝트 2용)
-- 수요 트렌드 (Google Trends 유사)
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.ext_demand_trends`
PARTITION BY week
CLUSTER BY keyword, region_code AS
WITH weeks AS (
SELECT DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL off WEEK), WEEK(MONDAY)) AS week
FROM UNNEST(GENERATE_ARRAY(0, 26)) AS off
),
kw AS (SELECT * FROM UNNEST(['heels','women shoes','men clothing','handbags','watches']) AS keyword),
rg AS (SELECT * FROM UNNEST(['US','KR']) AS region_code)
SELECT week, keyword, region_code, 50 + CAST(FLOOR(30*RAND()) AS INT64) AS score
FROM weeks CROSS JOIN kw CROSS JOIN rg;
-- 경쟁/기술 신호
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.ext_competitive_signal`
PARTITION BY period
CLUSTER BY topic, region_code AS
WITH months AS (
SELECT DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL off MONTH), MONTH) AS period
FROM UNNEST(GENERATE_ARRAY(0, 12)) AS off
),
topics AS (SELECT * FROM UNNEST(['fabric','eco','sportswear','accessories']) AS topic),
regions AS (SELECT * FROM UNNEST(['US','KR']) AS region_code)
SELECT
m.period AS period,
t.topic AS topic,
r.region_code AS region_code,
ROUND(10 + RAND()*90, 2) AS signal,
(SELECT AS VALUE s FROM UNNEST(['patent','news','social']) s ORDER BY RAND() LIMIT 1) AS source
FROM months m
CROSS JOIN topics t
CROSS JOIN regions r;
-- 거시/날씨 보정
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.ext_macro_weather`
PARTITION BY period
CLUSTER BY country_iso AS
WITH months AS (
SELECT DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL off MONTH), MONTH) AS period
FROM UNNEST(GENERATE_ARRAY(0, 12)) AS off
),
countries AS (SELECT * FROM UNNEST(['US','KR']) AS country_iso)
SELECT
m.period AS period,
c.country_iso AS country_iso,
ROUND(3 + RAND()*7, 2) AS unemployment_rate,
ROUND(80 + RAND()*40, 2) AS income_index,
ROUND(40 + RAND()*60, 2) AS season_temp_idx
FROM months m
CROSS JOIN countries c;4.8 마케팅 캠페인 더미 (프로젝트 3용)
-- 캠페인 메타
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.mkt_campaigns_dummy`
PARTITION BY start_date
CLUSTER BY channel_key, target_category AS
WITH ids AS (
SELECT CONCAT('CMP_', CAST(1000+off AS STRING)) AS campaign_id
FROM UNNEST(GENERATE_ARRAY(1, 20)) AS off
)
SELECT
campaign_id,
CONCAT('Campaign ', CAST(ROW_NUMBER() OVER() AS STRING)) AS campaign_name,
(SELECT AS VALUE c FROM UNNEST(['paid_search','paid_social','display','email']) c ORDER BY RAND() LIMIT 1) AS channel_key,
DATE_SUB(CURRENT_DATE(), INTERVAL CAST(RAND()*90 AS INT64) DAY) AS start_date,
DATE_ADD(CURRENT_DATE(), INTERVAL CAST(RAND()*30 AS INT64) DAY) AS end_date,
(SELECT AS VALUE cat FROM UNNEST(['Womens Shoes','Mens Apparel','Accessories','Watches','Bags']) cat ORDER BY RAND() LIMIT 1) AS target_category,
(SELECT AS VALUE cn FROM UNNEST(['United States','Korea, Republic of']) cn ORDER BY RAND() LIMIT 1) AS target_country,
(SELECT AS VALUE o FROM UNNEST(['awareness','traffic','conversion','retention']) o ORDER BY RAND() LIMIT 1) AS objective,
(SELECT AS VALUE k FROM UNNEST(['CTR','CVR','ROAS','LTV']) k ORDER BY RAND() LIMIT 1) AS kpi,
ROUND(1000 + RAND()*9000, 2) AS budget
FROM ids;
-- 채널별 집행/성과
CREATE OR REPLACE TABLE `your-project-id.retail_analytics_us.mkt_channel_spend_dummy`
PARTITION BY date
CLUSTER BY channel_key, campaign_id, country_name AS
WITH cal AS (
SELECT DATE_SUB(CURRENT_DATE(), INTERVAL off DAY) AS date
FROM UNNEST(GENERATE_ARRAY(0, 90)) AS off
),
ch AS (SELECT * FROM UNNEST(['paid_search','paid_social','display','email','referral','direct']) AS channel_key),
co AS (SELECT * FROM UNNEST(['United States','Korea, Republic of']) AS country_name),
dv AS (SELECT * FROM UNNEST(['desktop','mobile']) AS device_key)
SELECT
date, channel_key,
(SELECT campaign_id FROM `your-project-id.retail_analytics_us.mkt_campaigns_dummy` ORDER BY RAND() LIMIT 1) AS campaign_id,
CAST(1000 + RAND()*9000 AS INT64) AS impressions,
CAST(100 + RAND()*2000 AS INT64) AS clicks,
ROUND(100 + RAND()*800, 2) AS spend,
CONCAT('https://shop.example.com/lp/', channel_key) AS landing_page,
country_name, device_key
FROM cal, ch, co, dv;5. 생성 결과 확인
모든 테이블이 정상적으로 생성되었는지 확인합니다.
-- 테이블 목록 확인
SELECT table_name, table_type, row_count
FROM `your-project-id.retail_analytics_us`.INFORMATION_SCHEMA.TABLES
ORDER BY table_name;예상되는 테이블 목록:
| 테이블명 | 설명 |
|---|---|
src_products | 상품 마스터 |
src_orders | 주문 헤더 |
src_order_items | 주문 상세 |
src_events | 웹 이벤트 로그 |
src_users | 고객 정보 |
events_augmented | 세션별 증강 데이터 |
cs_tickets_dummy | CS 티켓 |
survey_cs_dummy | 만족도 설문 |
returns_reason_dummy | 반품 사유 |
ext_demand_trends | 수요 트렌드 |
ext_competitive_signal | 경쟁 신호 |
ext_macro_weather | 거시경제/날씨 |
mkt_campaigns_dummy | 마케팅 캠페인 |
mkt_channel_spend_dummy | 채널별 집행 |
6. Python 환경에서 BigQuery 연결
6.1 필수 패키지 설치
pip install google-cloud-bigquery pandas db-dtypes pyarrow6.2 인증 설정
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"6.3 연결 테스트
from google.cloud import bigquery
import pandas as pd
# 클라이언트 초기화
client = bigquery.Client(project='your-project-id')
# 테스트 쿼리 실행
query = """
SELECT
COUNT(*) as total_orders,
COUNT(DISTINCT user_id) as unique_customers,
SUM(num_of_item) as total_items
FROM `your-project-id.retail_analytics_us.src_orders`
"""
result = client.query(query).to_dataframe()
print("✅ BigQuery 연결 성공!")
print(result)예상 출력:
✅ BigQuery 연결 성공!
total_orders unique_customers total_items
0 80643 49891 1588467. 환경 변수 설정 (권장)
macOS / Linux
# ~/.bashrc 또는 ~/.zshrc에 추가
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-key.json"
export GCP_PROJECT_ID="your-project-id"Windows
setx GOOGLE_APPLICATION_CREDENTIALS "C:\path\to\your-service-account-key.json"
setx GCP_PROJECT_ID "your-project-id"다음 단계
환경 설정이 완료되었습니다!
Last updated on