Pandas Environment Setup

This guide walks through setting up a local data analysis environment with Python and Pandas.

1. Install Python

Recommended version: Python 3.9 or later. All examples in this Cookbook were tested on Python 3.10.

1.1 Check the Python installation


python --version
# or
python3 --version
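
The same check can be made from inside Python, which helps when several interpreters are installed and it is unclear which one a script actually runs on. The following is a minimal sketch: the helper name `meets_minimum` is hypothetical, and the (3, 9) threshold simply mirrors the recommended version stated above.

```python
import sys

# Minimum version recommended by this guide (Python 3.9 or later)
MIN_VERSION = (3, 9)

def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of the version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)

current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "older than recommended"
print(f"Python {current}: {status}")
```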

1.2 Install Python (if not already installed)

macOS (Homebrew)


brew install python@3.11

Ubuntu/Debian


sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip

Windows

Download the installer from the official Python site (python.org).
During installation, check the “Add Python to PATH” option.

2. Set up a virtual environment

We recommend using a separate, isolated environment for each project.

2.1 Using venv


# Create the virtual environment (use python3 if python is not on your PATH)
python -m venv cookbook-env

# Activate the virtual environment
# macOS/Linux
source cookbook-env/bin/activate

# Windows
cookbook-env\Scripts\activate
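
After activation, it can be useful to confirm from Python that the interpreter really belongs to the virtual environment. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is hypothetical. It applies to environments created with venv; a conda environment is not detected this way.

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtualenv()}")
```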

2.2 Using conda (optional)


# Create the environment
conda create -n cookbook python=3.11

# Activate the environment
conda activate cookbook

3. Install required packages

3.1 Core packages


pip install pandas numpy matplotlib seaborn jupyter

3.2 Additional packages (recommended)


# Data visualization
pip install plotly altair

# Statistical analysis
pip install scipy statsmodels scikit-learn

# Time series analysis
pip install prophet

# BigQuery integration (optional)
pip install google-cloud-bigquery db-dtypes

3.3 Using requirements.txt

Create a requirements.txt file in the project folder:


pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0
plotly>=5.15.0
scipy>=1.10.0
statsmodels>=0.14.0
scikit-learn>=1.3.0

Install:


pip install -r requirements.txt
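
To confirm that the installed packages satisfy the minimum versions, the requirement lines can be compared against package metadata. This is a rough sketch, not a full requirement parser: it only understands `name>=version` lines like the ones above, and the helper names (`version_tuple`, `parse_requirement`, `at_least`, `check_requirements`) are hypothetical. For anything more complex, the `packaging` library is the more robust choice.

```python
import re
from importlib import metadata

def version_tuple(version):
    """Turn '2.0.3' into (2, 0, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in version.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def parse_requirement(line):
    """Split a 'name>=x.y.z' line into (name, minimum version tuple)."""
    name, _, minimum = line.strip().partition(">=")
    return name.strip(), version_tuple(minimum)

def at_least(installed, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)

def check_requirements(lines):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name, minimum = parse_requirement(line)
        try:
            installed = version_tuple(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
        else:
            report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report

# Two lines from the requirements.txt above; pass an open file to check them all
print(check_requirements(["pandas>=2.0.0", "numpy>=1.24.0"]))
```

The printed result depends on what is installed in the active environment, so no fixed output is shown here.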

4. Download the sample data

Download the data used in this Cookbook in CSV format.

4.1 Data folder structure


cookbook-data/
├── src_orders.csv
├── src_order_items.csv
├── src_products.csv
├── src_users.csv
├── src_events.csv
├── events_augmented.csv
├── cs_tickets_dummy.csv
└── mkt_campaigns_dummy.csv
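
Once the files are downloaded, a quick check against this listing shows whether anything is missing. The sketch below assumes the folder sits in the current working directory under the name `cookbook-data`; the function name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the folder listing above
EXPECTED_FILES = [
    "src_orders.csv", "src_order_items.csv", "src_products.csv",
    "src_users.csv", "src_events.csv", "events_augmented.csv",
    "cs_tickets_dummy.csv", "mkt_campaigns_dummy.csv",
]

def missing_files(data_dir="cookbook-data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in expected if not (folder / name).is_file()]

missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```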

4.2 Export CSVs from BigQuery (if you have a BigQuery account)


import os

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client(project='your-project-id')

tables = [
    'src_orders', 'src_order_items', 'src_products',
    'src_users', 'src_events', 'events_augmented',
    'cs_tickets_dummy', 'mkt_campaigns_dummy'
]

# Create the output folder first; to_csv fails if it does not exist
os.makedirs('cookbook-data', exist_ok=True)

for table in tables:
    # Export each table in full and save it as <table>.csv
    query = f"SELECT * FROM `your-project-id.retail_analytics.{table}`"
    df = client.query(query).to_dataframe()
    df.to_csv(f'cookbook-data/{table}.csv', index=False)
    print(f"✅ Saved {table}.csv ({len(df):,} rows)")

4.3 Download the sample data directly

You do not need a BigQuery account. After signing up, you can download the full dataset for free.


5. Set up Jupyter Notebook

5.1 Launch Jupyter


jupyter notebook
# or
jupyter lab

5.2 Basic setup template

Add the following code to the first cell of a new notebook:


# Import the core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')  # hides all warnings, including deprecation notices

# Plot style settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# Korean font settings (optional)
# macOS
# plt.rcParams['font.family'] = 'AppleGothic'
# Windows
# plt.rcParams['font.family'] = 'Malgun Gothic'
# Linux
# plt.rcParams['font.family'] = 'NanumGothic'

# Use an ASCII hyphen for minus signs so they render with these fonts
plt.rcParams['axes.unicode_minus'] = False

# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 200)
pd.set_option('display.float_format', '{:.2f}'.format)

# Data path
DATA_PATH = './cookbook-data/'

print("✅ Environment setup complete!")

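5.3 Data loading function

Define a helper for the data loading you will do often:


def load_data(table_name):
    """Load a CSV from DATA_PATH and apply basic preprocessing."""
    df = pd.read_csv(f'{DATA_PATH}{table_name}.csv')

    # Convert likely date columns: names containing 'date' or ending in '_at'.
    # This assumes timestamp columns follow the *_at naming convention; a plain
    # substring match on 'at' would also catch names such as 'category'.
    date_columns = [
        col for col in df.columns
        if 'date' in col.lower() or col.lower().endswith('_at')
    ]
    for col in date_columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            # Leave the column unchanged if it cannot be parsed as dates
            pass

    return df

# Usage example
orders = load_data('src_orders')
print(f"Orders data: {len(orders):,} rows x {len(orders.columns)} columns")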
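
Selecting date columns by name is a heuristic, and matching on the bare substring 'at' is broader than it looks: it also matches names such as `category` or `status`, and a numeric column picked up this way could be silently converted to timestamps. The sketch below compares that loose rule with a stricter one. The column names and the helper `find_date_columns` are hypothetical and serve only as illustration.

```python
def find_date_columns(columns):
    """Pick column names that look like dates: containing 'date' or ending in '_at'."""
    return [
        col for col in columns
        if "date" in col.lower() or col.lower().endswith("_at")
    ]

# Hypothetical column names, for illustration only
columns = ["order_id", "created_at", "order_date", "category", "status"]

# Loose rule: any name containing 'date' or the substring 'at'
loose = [c for c in columns if "date" in c.lower() or "at" in c.lower()]
# Stricter rule: 'date' anywhere, or an '_at' suffix
strict = find_date_columns(columns)

print(loose)   # ['created_at', 'order_date', 'category', 'status']
print(strict)  # ['created_at', 'order_date']
```

If the real tables name their timestamps differently, listing the date columns explicitly per table is the safer option.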

6. Verify the setup

The following script checks that everything is set up:


import importlib
import os
import sys

def check_environment():
    print("=" * 50)
    print("Environment check")
    print("=" * 50)

    # Python version
    print(f"\n🐍 Python version: {sys.version}")

    # Required packages (import names, so scikit-learn appears as sklearn)
    packages = [
        'pandas', 'numpy', 'matplotlib', 'seaborn',
        'scipy', 'sklearn', 'plotly'
    ]

    print("\n📦 Package versions:")
    for pkg in packages:
        try:
            module = importlib.import_module(pkg)
            version = getattr(module, '__version__', 'N/A')
            print(f"   ✅ {pkg}: {version}")
        except ImportError:
            print(f"   ❌ {pkg}: not installed")

    # Data files
    data_path = './cookbook-data/'

    print(f"\n📁 Data files ({data_path}):")
    if os.path.exists(data_path):
        files = os.listdir(data_path)
        csv_files = [f for f in files if f.endswith('.csv')]
        for f in csv_files:
            # File size in MB
            size = os.path.getsize(os.path.join(data_path, f)) / 1024 / 1024
            print(f"   ✅ {f} ({size:.2f} MB)")
        if not csv_files:
            print("   ⚠️ No CSV files found.")
    else:
        print(f"   ❌ Folder not found: {data_path}")

    print("\n" + "=" * 50)
    print("🎉 Environment check complete!")
    print("=" * 50)

check_environment()

Expected output (versions, file list, and file sizes depend on your setup):


==================================================
Environment check
==================================================

🐍 Python version: 3.11.0 (...)

📦 Package versions:
   ✅ pandas: 2.0.3
   ✅ numpy: 1.24.3
   ✅ matplotlib: 3.7.2
   ✅ seaborn: 0.12.2
   ✅ scipy: 1.11.1
   ✅ sklearn: 1.3.0
   ✅ plotly: 5.15.0

📁 Data files (./cookbook-data/):
   ✅ src_orders.csv (5.23 MB)
   ✅ src_order_items.csv (8.45 MB)
   ✅ src_products.csv (1.12 MB)
   ✅ src_users.csv (3.67 MB)
   ✅ src_events.csv (12.34 MB)

==================================================
🎉 Environment check complete!
==================================================

7. VS Code setup (optional)

If you use Jupyter Notebook in VS Code:

7.1 Install extensions

Python (Microsoft)
Jupyter (Microsoft)
Pylance (Microsoft)

7.2 settings.json


{
    "python.defaultInterpreterPath": "./cookbook-env/bin/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.typeCheckingMode": "basic"
}
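
The interpreter path in this file follows the macOS/Linux layout. A venv created on Windows keeps its executable under `Scripts\python.exe` instead, so the setting needs to be adjusted there. As a small sketch, the path for the current platform can be worked out like this; the helper name `venv_python` is hypothetical, and `cookbook-env` is the environment name used earlier in this guide.

```python
import os
from pathlib import Path

def venv_python(env_dir="cookbook-env", windows=None):
    """Return the path of the Python executable inside a venv directory."""
    if windows is None:
        windows = os.name == "nt"
    # venv places executables under Scripts\ on Windows and bin/ elsewhere
    if windows:
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"

# Path to use for "python.defaultInterpreterPath" on this machine
print(venv_python())
```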

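Next steps

Your environment is now set up. Continue to the Understanding the Data Structure page to learn about the data you will be using, or go straight to the Pandas track and start the first recipe.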