Pandas Setup
This guide walks you through setting up a local data analysis environment using Python and Pandas.
1. Install Python
Python 3.9 or higher is recommended. All examples in this Cookbook have been tested with Python 3.10.
1.1 Check Python Installation
python --version
# or
python3 --version
1.2 Install Python (if not installed)
macOS (Homebrew)
brew install python@3.11
Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip
Windows
- Download the installer from the official Python website
- Check the "Add Python to PATH" option during installation
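Once Python is installed, the version recommendation above can also be checked from inside the interpreter. The following is a minimal sketch; `MIN_VERSION` and `meets_minimum` are illustrative names, not part of the Cookbook.

```python
import sys

# Minimum version recommended by this guide (Python 3.9 or higher).
# MIN_VERSION and meets_minimum are hypothetical helper names.
MIN_VERSION = (3, 9)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) Python version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element
    return tuple(version_info[:2]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "older than recommended"
print(f"Python {current}: {status}")
```

Running this with the interpreter you plan to use for the project shows whether that specific installation meets the recommendation, which helps when several Python versions are installed side by side.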
2. Set Up Virtual Environment
We recommend using an isolated environment for each project.
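To confirm that an isolated environment is actually active before installing packages, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch for environments created with `venv`; the helper name `in_virtual_environment` is an illustrative assumption.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when running inside a venv-style virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtual_environment()}")
```

This comparison does not detect conda environments. For those, checking whether the `CONDA_PREFIX` environment variable is set is one possible alternative.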
2.1 Using venv
# Create virtual environment
python -m venv cookbook-env
# Activate virtual environment
# macOS/Linux
source cookbook-env/bin/activate
# Windows
cookbook-env\Scripts\activate
2.2 Using conda (optional)
# Create environment
conda create -n cookbook python=3.11
# Activate environment
conda activate cookbook
3. Install Required Packages
3.1 Basic Packages
pip install pandas numpy matplotlib seaborn jupyter
3.2 Additional Packages (recommended)
# Data visualization
pip install plotly altair
# Statistical analysis
pip install scipy statsmodels scikit-learn
# Time series analysis
pip install prophet
# BigQuery integration (optional)
pip install google-cloud-bigquery db-dtypes
3.3 Using requirements.txt
Create a requirements.txt file in your project folder:
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0
plotly>=5.15.0
scipy>=1.10.0
statsmodels>=0.14.0
scikit-learn>=1.3.0
Install:
pip install -r requirements.txt
4. Download Sample Data
Download the data used in this Cookbook in CSV format.
4.1 Data Folder Structure
cookbook-data/
├── src_orders.csv
├── src_order_items.csv
├── src_products.csv
├── src_users.csv
├── src_events.csv
├── events_augmented.csv
├── cs_tickets_dummy.csv
└── mkt_campaigns_dummy.csv
4.2 Export CSV from BigQuery (if you have a BigQuery account)
import os
from google.cloud import bigquery
import pandas as pd
# Replace 'your-project-id' with your own Google Cloud project ID
client = bigquery.Client(project='your-project-id')
tables = [
    'src_orders', 'src_order_items', 'src_products',
    'src_users', 'src_events', 'events_augmented',
    'cs_tickets_dummy', 'mkt_campaigns_dummy'
]
# Create the output folder if it does not exist yet
os.makedirs('cookbook-data', exist_ok=True)
for table in tables:
    # Export each table in full to its own CSV file
    query = f"SELECT * FROM `your-project-id.retail_analytics.{table}`"
    df = client.query(query).to_dataframe()
    df.to_csv(f'cookbook-data/{table}.csv', index=False)
    print(f"✅ {table}.csv saved ({len(df):,} rows)")
4.3 Download Sample Data Directly
No BigQuery account? No problem! You can download the complete dataset for free after signing up.
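Whichever route you use, it can help to confirm that the folder contains the files listed in section 4.1 before moving on. The sketch below assumes the data sits in a `cookbook-data` folder in the current working directory; `EXPECTED_TABLES` and `find_missing_files` are illustrative names.

```python
from pathlib import Path

# Table names taken from the folder structure in section 4.1.
# EXPECTED_TABLES and find_missing_files are hypothetical helper names.
EXPECTED_TABLES = [
    "src_orders", "src_order_items", "src_products", "src_users",
    "src_events", "events_augmented", "cs_tickets_dummy", "mkt_campaigns_dummy",
]


def find_missing_files(data_dir="cookbook-data", tables=EXPECTED_TABLES):
    """Return the expected CSV file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [
        f"{name}.csv" for name in tables
        if not (folder / f"{name}.csv").is_file()
    ]


missing = find_missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected CSV files are present.")
```

An empty result means every file from the listing was found. Any names it reports can be exported or downloaded again.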
5. Jupyter Notebook Setup
5.1 Launch Jupyter
jupyter notebook
# or
jupyter lab
5.2 Basic Setup Template
Add the following code to the first cell of a new notebook:
# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10
# Korean font settings (optional)
# macOS
# plt.rcParams['font.family'] = 'AppleGothic'
# Windows
# plt.rcParams['font.family'] = 'Malgun Gothic'
# Linux
# plt.rcParams['font.family'] = 'NanumGothic'
plt.rcParams['axes.unicode_minus'] = False
# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 200)
pd.set_option('display.float_format', '{:.2f}'.format)
# Set data path
DATA_PATH = './cookbook-data/'
print("✅ Environment setup complete!")
5.3 Data Loading Function
Define a frequently used data loading function:
def load_data(table_name):
    """Load a CSV from DATA_PATH and parse likely date columns."""
    df = pd.read_csv(f'{DATA_PATH}{table_name}.csv')
    # Auto-convert date columns: names containing 'date' or ending in '_at'.
    # A bare 'at' substring would also match columns such as 'category'.
    date_columns = [
        col for col in df.columns
        if 'date' in col.lower() or col.lower().endswith('_at')
    ]
    for col in date_columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            # Leave the column unchanged if it cannot be parsed as dates
            pass
    return df
# Usage example
orders = load_data('src_orders')
print(f"Orders data: {len(orders):,} rows x {len(orders.columns)} columns")
6. Verify Setup
Run the following script to verify that the environment is set up correctly:
import importlib
import os
import sys
def check_environment():
    print("=" * 50)
    print("Environment Setup Check")
    print("=" * 50)
    # Python version
    print(f"\n🐍 Python version: {sys.version}")
    # Check required packages
    packages = [
        'pandas', 'numpy', 'matplotlib', 'seaborn',
        'scipy', 'sklearn', 'plotly'
    ]
    print("\n📦 Package versions:")
    for pkg in packages:
        try:
            module = importlib.import_module(pkg)
            version = getattr(module, '__version__', 'N/A')
            print(f"  ✅ {pkg}: {version}")
        except ImportError:
            print(f"  ❌ {pkg}: not installed")
    # Check data files
    data_path = './cookbook-data/'
    print(f"\n📁 Data files ({data_path}):")
    if os.path.exists(data_path):
        files = os.listdir(data_path)
        csv_files = [f for f in files if f.endswith('.csv')]
        for f in csv_files:
            # File size in MB
            size = os.path.getsize(os.path.join(data_path, f)) / 1024 / 1024
            print(f"  ✅ {f} ({size:.2f} MB)")
        if not csv_files:
            print("  ⚠️ No CSV files found.")
    else:
        print(f"  ❌ Folder does not exist: {data_path}")
    print("\n" + "=" * 50)
    print("🎉 Environment setup check complete!")
    print("=" * 50)
check_environment()
Expected output:
==================================================
Environment Setup Check
==================================================

🐍 Python version: 3.11.0 (...)

📦 Package versions:
  ✅ pandas: 2.0.3
  ✅ numpy: 1.24.3
  ✅ matplotlib: 3.7.2
  ✅ seaborn: 0.12.2
  ✅ scipy: 1.11.1
  ✅ sklearn: 1.3.0
  ✅ plotly: 5.15.0

📁 Data files (./cookbook-data/):
  ✅ src_orders.csv (5.23 MB)
  ✅ src_order_items.csv (8.45 MB)
  ✅ src_products.csv (1.12 MB)
  ✅ src_users.csv (3.67 MB)
  ✅ src_events.csv (12.34 MB)

==================================================
🎉 Environment setup check complete!
==================================================
7. VS Code Setup (Optional)
When using Jupyter Notebook in VS Code:
7.1 Install Extensions
- Python (Microsoft)
- Jupyter (Microsoft)
- Pylance (Microsoft)
7.2 settings.json Configuration
{
  "python.defaultInterpreterPath": "./cookbook-env/bin/python",
  "jupyter.notebookFileRoot": "${workspaceFolder}",
  "python.analysis.typeCheckingMode": "basic"
}
Next Steps
Your environment setup is complete! Now visit the Understanding Data Structure page to learn about the data you'll be using, or go directly to the Pandas Track to start your first recipe.