
Pandas Setup

This guide walks you through setting up a local data analysis environment using Python and Pandas.

1. Install Python

ℹ️
Recommended Version

Python 3.9 or higher is recommended. All examples in this Cookbook have been tested with Python 3.10.

1.1 Check Python Installation

python --version
# or
python3 --version
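The same check can be made from inside Python, which helps when several interpreters are installed and it is unclear which one a tool is using. A minimal sketch, assuming the 3.9 minimum recommended above; the helper name `meets_minimum` is illustrative and not part of the Cookbook.

```python
import sys

# Minimum version recommended by this guide.
MIN_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than recommended"
    print(f"Python {current}: {status}")
```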

1.2 Install Python (if not installed)

macOS (Homebrew)

brew install python@3.11

Ubuntu/Debian

sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip

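Windows

Download the installer from https://www.python.org/downloads/ and enable the option to add Python to PATH during installation.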

2. Set Up Virtual Environment

We recommend using an isolated environment for each project.

2.1 Using venv

# Create virtual environment
python -m venv cookbook-env

# Activate virtual environment
# macOS/Linux
source cookbook-env/bin/activate

# Windows
cookbook-env\Scripts\activate
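After activating, it can be worth confirming that Python is really running from the environment. A minimal sketch using only the standard library; the function name `in_virtual_environment` is illustrative. It detects environments created with venv, and a conda environment will generally not be detected this way.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtual_environment()}")
```

If the printed interpreter path does not contain `cookbook-env`, the environment is probably not active in the current shell.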

2.2 Using conda (optional)

# Create environment
conda create -n cookbook python=3.11

# Activate environment
conda activate cookbook

3. Install Required Packages

3.1 Basic Packages

pip install pandas numpy matplotlib seaborn jupyter

3.2 Additional Packages

# Data visualization
pip install plotly altair

# Statistical analysis
pip install scipy statsmodels scikit-learn

# Time series analysis
pip install prophet

# BigQuery integration (optional)
pip install google-cloud-bigquery db-dtypes

3.3 Using requirements.txt

Create a requirements.txt file in your project folder:

pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
jupyter>=1.0.0
plotly>=5.15.0
scipy>=1.10.0
statsmodels>=0.14.0
scikit-learn>=1.3.0

Install:

pip install -r requirements.txt
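To confirm that the installed versions satisfy the minimums, the requirement lines can be compared against package metadata. This is a simplified sketch, not a replacement for pip's own resolver: it only understands `name>=version` lines like the ones in the file above, ignores pre-release suffixes, and the helper names are illustrative.

```python
from importlib import metadata


def parse_requirement(line):
    """Split a 'name>=version' line into (name, minimum_version)."""
    name, _, minimum = line.strip().partition(">=")
    return name.strip(), minimum.strip()


def version_tuple(version):
    """Turn '2.0.3' into (2, 0, 3), padding short versions with zeros."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric part, e.g. '0rc1'
        parts.append(int(piece))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_requirements(lines):
    """Return {name: status}; status is 'ok', 'too old', or 'missing'."""
    results = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name, minimum = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        results[name] = "ok" if ok else "too old"
    return results


if __name__ == "__main__":
    with open("requirements.txt") as fh:
        for name, status in check_requirements(fh).items():
            print(f"{name}: {status}")
```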

4. Download Sample Data

Download the data used in this Cookbook in CSV format.

4.1 Data Folder Structure

cookbook-data/
├── src_orders.csv
├── src_order_items.csv
├── src_products.csv
├── src_users.csv
├── src_events.csv
├── events_augmented.csv
├── cs_tickets_dummy.csv
└── mkt_campaigns_dummy.csv
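Once the files are in place, a short script can report which of the expected files are still missing. A minimal sketch, assuming the folder sits in the current working directory; the function name `missing_files` is illustrative.

```python
from pathlib import Path

# Table names taken from the folder structure above.
EXPECTED_TABLES = [
    "src_orders",
    "src_order_items",
    "src_products",
    "src_users",
    "src_events",
    "events_augmented",
    "cs_tickets_dummy",
    "mkt_campaigns_dummy",
]


def missing_files(data_dir="cookbook-data", tables=EXPECTED_TABLES):
    """Return the expected CSV file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [
        f"{name}.csv" for name in tables
        if not (folder / f"{name}.csv").is_file()
    ]


missing = missing_files()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All expected files are present.")
```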

4.2 Export CSV from BigQuery (if you have a BigQuery account)

import os

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client(project='your-project-id')

tables = [
    'src_orders', 'src_order_items', 'src_products', 'src_users',
    'src_events', 'events_augmented', 'cs_tickets_dummy', 'mkt_campaigns_dummy'
]

# Make sure the output folder exists before writing to it
os.makedirs('cookbook-data', exist_ok=True)

for table in tables:
    query = f"SELECT * FROM `your-project-id.retail_analytics.{table}`"
    df = client.query(query).to_dataframe()
    df.to_csv(f'cookbook-data/{table}.csv', index=False)
    print(f"✅ {table}.csv saved ({len(df):,} rows)")

4.3 Download Sample Data Directly

No BigQuery account? No problem! You can download the complete dataset for free after signing up.

5. Jupyter Notebook Setup

5.1 Launch Jupyter

jupyter notebook
# or
jupyter lab

5.2 Basic Setup Template

Add the following code to the first cell of a new notebook:

# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# Korean font settings (optional)
# macOS
# plt.rcParams['font.family'] = 'AppleGothic'
# Windows
# plt.rcParams['font.family'] = 'Malgun Gothic'
# Linux
# plt.rcParams['font.family'] = 'NanumGothic'
plt.rcParams['axes.unicode_minus'] = False

# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 200)
pd.set_option('display.float_format', '{:.2f}'.format)

# Set data path
DATA_PATH = './cookbook-data/'

print("✅ Environment setup complete!")

5.3 Data Loading Function

Define a frequently used data loading function:

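def load_data(table_name):
    """Load a CSV table and convert date-like columns to datetime."""
    df = pd.read_csv(f'{DATA_PATH}{table_name}.csv')

    # Auto-convert date columns: names containing 'date' or ending in '_at'.
    # A plain substring test for 'at' would also match names such as 'status'.
    date_columns = [
        col for col in df.columns
        if 'date' in col.lower() or col.lower().endswith('_at')
    ]
    for col in date_columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            # Leave the column unchanged if it cannot be parsed as dates
            pass

    return df

# Usage example
orders = load_data('src_orders')
print(f"Orders data: {len(orders):,} rows x {len(orders.columns)} columns")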
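Name-based date detection is sensitive to how the match is written: a plain substring test for 'at' also matches names such as `status` or `category`. The sketch below compares substring matching with a stricter suffix rule. The column names and both helper names are made up for illustration and are not taken from the Cookbook dataset.

```python
def substring_match(columns):
    """Select columns whose lowercase name contains 'date' or 'at' anywhere."""
    return [c for c in columns if "date" in c.lower() or "at" in c.lower()]


def suffix_match(columns):
    """Select columns whose name contains 'date' or ends with '_at'."""
    return [
        c for c in columns
        if "date" in c.lower() or c.lower().endswith("_at")
    ]


# Hypothetical column names, chosen to show the difference.
columns = ["order_id", "status", "category", "created_at", "order_date"]

print(substring_match(columns))  # ['status', 'category', 'created_at', 'order_date']
print(suffix_match(columns))     # ['created_at', 'order_date']
```

When the date columns are known in advance, listing them explicitly is more predictable than either heuristic.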

6. Verify Setup

Run the following script to verify that the environment is set up correctly:

import sys
import os
import importlib

def check_environment():
    print("=" * 50)
    print("Environment Setup Check")
    print("=" * 50)

    # Python version
    print(f"\n🐍 Python version: {sys.version}")

    # Check required packages
    packages = [
        'pandas', 'numpy', 'matplotlib', 'seaborn',
        'scipy', 'sklearn', 'plotly'
    ]

    print("\n📦 Package versions:")
    for pkg in packages:
        try:
            module = importlib.import_module(pkg)
            version = getattr(module, '__version__', 'N/A')
            print(f"  ✅ {pkg}: {version}")
        except ImportError:
            print(f"  ❌ {pkg}: not installed")

    # Check data files
    data_path = './cookbook-data/'
    print(f"\n📁 Data files ({data_path}):")

    if os.path.exists(data_path):
        files = os.listdir(data_path)
        csv_files = [f for f in files if f.endswith('.csv')]
        for f in csv_files:
            size = os.path.getsize(os.path.join(data_path, f)) / 1024 / 1024
            print(f"  ✅ {f} ({size:.2f} MB)")
        if not csv_files:
            print("  ⚠️ No CSV files found.")
    else:
        print(f"  ❌ Folder does not exist: {data_path}")

    print("\n" + "=" * 50)
    print("🎉 Environment setup check complete!")
    print("=" * 50)

check_environment()

Expected output (versions and file sizes will differ on your machine):

==================================================
Environment Setup Check
==================================================

🐍 Python version: 3.11.0 (...)

📦 Package versions:
  ✅ pandas: 2.0.3
  ✅ numpy: 1.24.3
  ✅ matplotlib: 3.7.2
  ✅ seaborn: 0.12.2
  ✅ scipy: 1.11.1
  ✅ sklearn: 1.3.0
  ✅ plotly: 5.15.0

📁 Data files (./cookbook-data/):
  ✅ src_orders.csv (5.23 MB)
  ✅ src_order_items.csv (8.45 MB)
  ✅ src_products.csv (1.12 MB)
  ✅ src_users.csv (3.67 MB)
  ✅ src_events.csv (12.34 MB)

==================================================
🎉 Environment setup check complete!
==================================================

7. VS Code Setup (Optional)

If you run Jupyter notebooks in VS Code, install the extensions and apply the settings listed in this section.

7.1 Install Extensions

  • Python (Microsoft)
  • Jupyter (Microsoft)
  • Pylance (Microsoft)

7.2 settings.json Configuration

{
  "python.defaultInterpreterPath": "./cookbook-env/bin/python",
  "jupyter.notebookFileRoot": "${workspaceFolder}",
  "python.analysis.typeCheckingMode": "basic"
}
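The interpreter path in this configuration follows the macOS/Linux layout of a venv. On Windows, a venv places the interpreter under `Scripts` instead of `bin`. The sketch below prints a settings object with the path chosen for the current platform; it assumes the environment is named `cookbook-env` as in section 2.1, and the helper name `interpreter_path` is illustrative.

```python
import json
import sys


def interpreter_path(env_name="cookbook-env", platform=sys.platform):
    """Return the venv interpreter path relative to the workspace folder."""
    if platform.startswith("win"):
        return f"./{env_name}/Scripts/python.exe"
    return f"./{env_name}/bin/python"


settings = {
    "python.defaultInterpreterPath": interpreter_path(),
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "python.analysis.typeCheckingMode": "basic",
}

# Print JSON that can be pasted into the workspace's .vscode/settings.json
print(json.dumps(settings, indent=2))
```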

Next Steps

Your environment setup is complete! Now visit the Understanding Data Structure page to learn about the data you'll be using, or go directly to the Pandas Track to start your first recipe.
