Pandas Setup

This guide walks you through setting up a local data analysis environment using Python and Pandas.

1. Install Python

ℹ️ Recommended Version

Python 3.9 or higher is recommended. All examples in this Cookbook have been tested with Python 3.10.

1.1 Check Python Installation


python --version
# or
python3 --version
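
The same check can be run from inside Python, for example at the top of a notebook. This is a minimal sketch that assumes the 3.9 minimum recommended above; the helper name is_supported is illustrative and not part of the Cookbook.

```python
import sys

MIN_VERSION = (3, 9)  # minimum recommended in this guide


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor, ...) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


# Report the interpreter that is actually running this code
current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if is_supported() else "older than recommended"
print(f"Python {current}: {status}")
```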

1.2 Install Python (if not installed)

macOS (Homebrew)


brew install python@3.11

Ubuntu/Debian


sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip

Windows

Download the installer from the official Python website.
Check the “Add Python to PATH” option during installation.

2. Set Up Virtual Environment

We recommend using an isolated environment for each project.

2.1 Using venv


# Create virtual environment (use python3 if the python command is not found)
python -m venv cookbook-env
 
# Activate virtual environment
# macOS/Linux
source cookbook-env/bin/activate
 
# Windows
cookbook-env\Scripts\activate
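
To confirm that activation worked, you can ask the interpreter itself. The sketch below relies on the fact that a venv sets sys.prefix to the environment folder while sys.base_prefix keeps pointing at the base installation. The helper name in_virtualenv is illustrative, and this check does not detect conda environments.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv-style environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If the first line prints False after activation, the shell is probably still resolving python to the system interpreter.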

2.2 Using conda (optional)


# Create environment
conda create -n cookbook python=3.11
 
# Activate environment
conda activate cookbook

3. Install Required Packages

3.1 Basic Packages


pip install pandas numpy matplotlib seaborn jupyter

3.2 Additional Packages (recommended)


# Data visualization
pip install plotly altair
 
# Statistical analysis
pip install scipy statsmodels scikit-learn
 
# Time series analysis
pip install prophet
 
# BigQuery integration (optional)
pip install google-cloud-bigquery db-dtypes

3.3 Using requirements.txt

Create a requirements.txt file in your project folder:


pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0
plotly>=5.15.0
scipy>=1.10.0
statsmodels>=0.14.0
scikit-learn>=1.3.0

Install:


pip install -r requirements.txt
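
After installing, you may want to confirm that the installed versions satisfy the minimums. The following is a rough sketch using only the standard library. It compares numeric release components and ignores pre-release tags, so treat it as a convenience check rather than a full version parser (the packaging library's Version class is the more robust choice). The helper names and the REQUIREMENTS dictionary are illustrative.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '2.0.3' or '1.24.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings on their numeric components only."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


# A subset of the minimums listed in requirements.txt
REQUIREMENTS = {"pandas": "2.0.0", "numpy": "1.24.0", "matplotlib": "3.7.0"}

for package, status in check_requirements(REQUIREMENTS).items():
    print(f"{package}: {status}")
```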

4. Download Sample Data

Download the data used in this Cookbook in CSV format.

4.1 Data Folder Structure


cookbook-data/
├── src_orders.csv
├── src_order_items.csv
├── src_products.csv
├── src_users.csv
├── src_events.csv
├── events_augmented.csv
├── cs_tickets_dummy.csv
└── mkt_campaigns_dummy.csv
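
Once the files are in place, a quick check like the one below can tell you which of the expected CSVs are still missing. It is a small sketch that assumes the folder layout shown above; the helper name missing_tables is illustrative.

```python
from pathlib import Path

# Table names taken from the folder structure above
EXPECTED_TABLES = [
    "src_orders", "src_order_items", "src_products", "src_users",
    "src_events", "events_augmented", "cs_tickets_dummy", "mkt_campaigns_dummy",
]


def missing_tables(data_dir, tables=EXPECTED_TABLES):
    """Return the table names whose CSV file is absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in tables if not (folder / f"{name}.csv").is_file()]


missing = missing_tables("cookbook-data")
print("All files present" if not missing else f"Missing: {', '.join(missing)}")
```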

4.2 Export CSV from BigQuery (if you have a BigQuery account)


import os

from google.cloud import bigquery

# Note: to_dataframe() needs pandas and db-dtypes installed (see section 3.2)
client = bigquery.Client(project='your-project-id')

tables = [
    'src_orders', 'src_order_items', 'src_products',
    'src_users', 'src_events', 'events_augmented',
    'cs_tickets_dummy', 'mkt_campaigns_dummy'
]

# Create the output folder if it does not exist yet
os.makedirs('cookbook-data', exist_ok=True)

for table in tables:
    query = f"SELECT * FROM `your-project-id.retail_analytics.{table}`"
    df = client.query(query).to_dataframe()
    df.to_csv(f'cookbook-data/{table}.csv', index=False)
    print(f"✅ {table}.csv saved ({len(df):,} rows)")

4.3 Download Sample Data Directly

No BigQuery account? No problem! You can download the complete dataset for free after signing up.

📦 Download Sample Data (Free)

5. Jupyter Notebook Setup

5.1 Launch Jupyter


jupyter notebook
# or
jupyter lab

5.2 Basic Setup Template

Add the following code to the first cell of a new notebook:


# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
# Silence all warnings to keep notebook output tidy (remove this line while debugging)
warnings.filterwarnings('ignore')
 
# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10
 
# Korean font settings (optional)
# macOS
# plt.rcParams['font.family'] = 'AppleGothic'
# Windows
# plt.rcParams['font.family'] = 'Malgun Gothic'
# Linux
# plt.rcParams['font.family'] = 'NanumGothic'
 
plt.rcParams['axes.unicode_minus'] = False
 
# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 200)
pd.set_option('display.float_format', '{:.2f}'.format)
 
# Set data path
DATA_PATH = './cookbook-data/'
 
print("✅ Environment setup complete!")

5.3 Data Loading Function

Define a reusable function for loading the Cookbook tables:


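def load_data(table_name):
    """Load a CSV from DATA_PATH and parse likely date columns."""
    df = pd.read_csv(f'{DATA_PATH}{table_name}.csv')

    # Auto-convert columns whose name contains 'date' or ends in '_at'.
    # A plain substring test for 'at' would also match 'status' or 'category'.
    date_columns = [
        col for col in df.columns
        if 'date' in col.lower() or col.lower().endswith('_at')
    ]
    for col in date_columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            # Leave the column unchanged if its values cannot be parsed as dates
            pass

    return df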
 
# Usage example
orders = load_data('src_orders')
print(f"Orders data: {len(orders):,} rows x {len(orders.columns)} columns")
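
Detecting date columns by name is a heuristic, and the exact rule matters: a plain substring test for 'at' also matches names such as status or category, whereas a suffix test for '_at' does not. The sketch below compares the two rules on hypothetical column names (they are not taken from the Cookbook dataset).

```python
def substring_match(columns):
    """Columns whose name contains 'date' or 'at' anywhere (over-matches)."""
    return [c for c in columns if "date" in c.lower() or "at" in c.lower()]


def suffix_match(columns):
    """Columns whose name contains 'date' or ends with '_at'."""
    return [c for c in columns if "date" in c.lower() or c.lower().endswith("_at")]


# Hypothetical column names, chosen to show the difference
columns = ["order_id", "status", "category", "created_at", "order_date"]

print(substring_match(columns))  # ['status', 'category', 'created_at', 'order_date']
print(suffix_match(columns))     # ['created_at', 'order_date']
```

Even the 'date' substring can over-match (for example, a name like updated_by contains it), so when you know the schema, passing an explicit column list to the parse_dates parameter of pd.read_csv is the safer option.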

6. Verify Setup

Run the following script to verify that the required packages are installed and the data files are in place:


import sys
import importlib
 
def check_environment():
    print("=" * 50)
    print("Environment Setup Check")
    print("=" * 50)
 
    # Python version
    print(f"\n🐍 Python version: {sys.version}")
 
    # Check required packages
    packages = [
        'pandas', 'numpy', 'matplotlib', 'seaborn',
        'scipy', 'sklearn', 'plotly'
    ]
 
    print("\n📦 Package versions:")
    for pkg in packages:
        try:
            module = importlib.import_module(pkg)
            version = getattr(module, '__version__', 'N/A')
            print(f"   ✅ {pkg}: {version}")
        except ImportError:
            print(f"   ❌ {pkg}: not installed")
 
    # Check data files
    import os
    data_path = './cookbook-data/'
 
    print(f"\n📁 Data files ({data_path}):")
    if os.path.exists(data_path):
        files = os.listdir(data_path)
        csv_files = [f for f in files if f.endswith('.csv')]
        for f in csv_files:
            size = os.path.getsize(os.path.join(data_path, f)) / 1024 / 1024
            print(f"   ✅ {f} ({size:.2f} MB)")
        if not csv_files:
            print("   ⚠️ No CSV files found.")
    else:
        print(f"   ❌ Folder does not exist: {data_path}")
 
    print("\n" + "=" * 50)
    print("🎉 Environment setup check complete!")
    print("=" * 50)
 
check_environment()

Example output (your versions, file list, and file sizes will vary):


==================================================
Environment Setup Check
==================================================

🐍 Python version: 3.11.0 (...)

📦 Package versions:
   ✅ pandas: 2.0.3
   ✅ numpy: 1.24.3
   ✅ matplotlib: 3.7.2
   ✅ seaborn: 0.12.2
   ✅ scipy: 1.11.1
   ✅ sklearn: 1.3.0
   ✅ plotly: 5.15.0

📁 Data files (./cookbook-data/):
   ✅ src_orders.csv (5.23 MB)
   ✅ src_order_items.csv (8.45 MB)
   ✅ src_products.csv (1.12 MB)
   ✅ src_users.csv (3.67 MB)
   ✅ src_events.csv (12.34 MB)

==================================================
🎉 Environment setup check complete!
==================================================

7. VS Code Setup (Optional)

If you run Jupyter notebooks in VS Code, install the extensions below and point VS Code at your virtual environment.

7.1 Install Extensions

Python (Microsoft)
Jupyter (Microsoft)
Pylance (Microsoft)

7.2 settings.json Configuration


{
    "python.defaultInterpreterPath": "./cookbook-env/bin/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.typeCheckingMode": "basic"
}
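
The interpreter path above follows the macOS/Linux venv layout. On Windows, venv places the executable under Scripts\python.exe instead. As a hedged sketch, the following script prints the same settings with a path suited to the current platform, ready to paste into .vscode/settings.json. The function names are illustrative, and it assumes the environment folder is named cookbook-env as in section 2.1.

```python
import json
import os


def interpreter_path(env_dir="cookbook-env", windows=None):
    """Return the relative path of the Python executable inside a venv folder."""
    if windows is None:
        windows = os.name == "nt"
    if windows:
        return f"./{env_dir}/Scripts/python.exe"
    return f"./{env_dir}/bin/python"


def build_settings(env_dir="cookbook-env", windows=None):
    """Build the settings shown above with a platform-appropriate interpreter."""
    return {
        "python.defaultInterpreterPath": interpreter_path(env_dir, windows),
        "jupyter.notebookFileRoot": "${workspaceFolder}",
        "python.analysis.typeCheckingMode": "basic",
    }


print(json.dumps(build_settings(), indent=4))
```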

Next Steps

Your environment setup is complete! Now visit the Understanding Data Structure page to learn about the data you’ll be using, or go directly to the Pandas Track to start your first recipe.