Skip to Content
ConceptsMachine LearningTime Series Forecasting

Time Series Forecasting

Advanced

Learning Objectives

After completing this recipe, you will be able to:

  • Understand the components of time series data
  • Perform time series analysis using statsmodels
  • Model trends and seasonality
  • Incorporate holiday effects
  • Visualize and interpret forecast results
  • Evaluate forecast performance

1. What is Time Series Forecasting?

Theory

Time series forecasting learns patterns from historical data to predict future values.

Components:

  • Trend: Long-term increase/decrease patterns
  • Seasonality: Periodically repeating patterns (weekly, monthly, yearly)
  • Holiday Effect: Variations due to specific events
  • Residual: Unexplained irregular variations

Business Applications

Application AreaForecast TargetBenefit
Demand ForecastingOrder quantity by productInventory optimization
Sales ForecastingMonthly/quarterly salesBudget planning
Traffic ForecastingWebsite visitorsServer capacity planning
Workforce PlanningCall center inquiriesStaff allocation

2. Time Series Data Preparation

Sample Sales Data Generation

import pandas as pd import numpy as np import matplotlib.pyplot as plt from datetime import datetime, timedelta import warnings warnings.filterwarnings('ignore') # Set seed for reproducible results np.random.seed(42) # Generate 2 years of daily sales data date_range = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D') n_days = len(date_range) # Base sales (trend + noise) base_sales = 10000 + np.linspace(0, 2000, n_days) # Upward trend # Weekly seasonality (weekend sales increase) weekly_pattern = np.array([1.0, 0.95, 0.9, 0.95, 1.1, 1.3, 1.2]) weekly_seasonality = np.tile(weekly_pattern, n_days // 7 + 1)[:n_days] # Monthly seasonality (end of month sales increase) monthly_seasonality = 1 + 0.1 * np.sin(2 * np.pi * np.arange(n_days) / 30) # Yearly seasonality (Nov-Dec holiday season) yearly_seasonality = np.ones(n_days) for i, date in enumerate(date_range): if date.month in [11, 12]: yearly_seasonality[i] = 1.3 elif date.month in [1, 2]: yearly_seasonality[i] = 0.8 # Final sales daily_sales = pd.DataFrame({ 'ds': date_range, 'y': (base_sales * weekly_seasonality * monthly_seasonality * yearly_seasonality * (1 + np.random.normal(0, 0.05, n_days))) }) print(f"Data period: {daily_sales['ds'].min().date()} ~ {daily_sales['ds'].max().date()}") print(f"Total days: {len(daily_sales)}") print(f"Average daily sales: ${daily_sales['y'].mean():,.0f}") print(daily_sales.head())
실행 ź²°ź³¼
Data period: 2022-01-01 ~ 2023-12-31
Total days: 730
Average daily sales: $11,234
        ds          y
0 2022-01-01   8123.45
1 2022-01-02   7856.23
2 2022-01-03   7234.56
3 2022-01-04   7654.32
4 2022-01-05   8876.54

Time Series Visualization

# Full time series visualization fig, axes = plt.subplots(3, 1, figsize=(14, 10)) # Overall sales trend axes[0].plot(daily_sales['ds'], daily_sales['y'], linewidth=0.8) axes[0].set_title('Daily Sales Trend', fontsize=14, fontweight='bold') axes[0].set_xlabel('Date') axes[0].set_ylabel('Sales ($)') axes[0].grid(True, alpha=0.3) # Monthly aggregation monthly_sales = daily_sales.set_index('ds').resample('M').sum() axes[1].bar(monthly_sales.index, monthly_sales['y'], width=20, color='steelblue') axes[1].set_title('Monthly Sales', fontsize=14, fontweight='bold') axes[1].set_xlabel('Month') axes[1].set_ylabel('Sales ($)') axes[1].grid(True, alpha=0.3) # Day of week average daily_sales['dayofweek'] = daily_sales['ds'].dt.day_name() dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] dow_sales = daily_sales.groupby('dayofweek')['y'].mean().reindex(dow_order) axes[2].bar(dow_sales.index, dow_sales.values, color='coral') axes[2].set_title('Average Sales by Day of Week', fontsize=14, fontweight='bold') axes[2].set_xlabel('Day of Week') axes[2].set_ylabel('Average Sales ($)') axes[2].tick_params(axis='x', rotation=45) plt.tight_layout() plt.show()

Time Series Overview

You can identify trend (increase/decrease), seasonality (periodic patterns), and volatility from time series data.

Time Series Patterns

Weekly/monthly pattern analysis: Sales are higher on weekends, with peaks during the Nov-Dec holiday season.


3. Time Series Decomposition

Decomposition using statsmodels

from statsmodels.tsa.seasonal import seasonal_decompose # Time series decomposition (multiplicative model) ts_data = daily_sales.set_index('ds')['y'] decomposition = seasonal_decompose(ts_data, model='multiplicative', period=7) fig, axes = plt.subplots(4, 1, figsize=(14, 12)) # Original data axes[0].plot(decomposition.observed) axes[0].set_title('Observed', fontsize=12) axes[0].set_ylabel('Sales') # Trend axes[1].plot(decomposition.trend, color='orange') axes[1].set_title('Trend', fontsize=12) axes[1].set_ylabel('Sales') # Seasonality axes[2].plot(decomposition.seasonal, color='green') axes[2].set_title('Seasonal', fontsize=12) axes[2].set_ylabel('Seasonal Index') # Residual axes[3].plot(decomposition.resid, color='red') axes[3].set_title('Residual', fontsize=12) axes[3].set_ylabel('Residual') plt.tight_layout() plt.show() print("=== Time Series Decomposition Summary ===") print(f"Trend range: ${decomposition.trend.min():,.0f} ~ ${decomposition.trend.max():,.0f}") print(f"Seasonality range: {decomposition.seasonal.min():.2f} ~ {decomposition.seasonal.max():.2f}")

Time Series Decomposition

Decomposition result interpretation:

  • Trend: Overall upward trend
  • Seasonal: Periodic pattern (weekly repetition)
  • Residual: Unexplained random variation

4. Simple Moving Average Forecast

Moving Average Calculation

# Calculate moving averages daily_sales['ma_7'] = daily_sales['y'].rolling(window=7).mean() daily_sales['ma_30'] = daily_sales['y'].rolling(window=30).mean() # Visualization plt.figure(figsize=(14, 6)) plt.plot(daily_sales['ds'][-180:], daily_sales['y'][-180:], alpha=0.5, label='Actual Sales') plt.plot(daily_sales['ds'][-180:], daily_sales['ma_7'][-180:], linewidth=2, label='7-day Moving Average') plt.plot(daily_sales['ds'][-180:], daily_sales['ma_30'][-180:], linewidth=2, label='30-day Moving Average') plt.title('Moving Averages (Last 180 days)', fontsize=14, fontweight='bold') plt.xlabel('Date') plt.ylabel('Sales ($)') plt.legend() plt.grid(True, alpha=0.3) plt.tight_layout() plt.show()
실행 ź²°ź³¼
[Moving Average Graph Output]
- 7-day moving average: Smooths weekly variation
- 30-day moving average: Identifies monthly trend

5. Exponential Smoothing

Simple Exponential Smoothing

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing # Train/test split train_size = int(len(daily_sales) * 0.8) train = daily_sales['y'][:train_size] test = daily_sales['y'][train_size:] print(f"Training data: {train_size} days") print(f"Test data: {len(test)} days") # Simple exponential smoothing ses_model = SimpleExpSmoothing(train).fit(smoothing_level=0.2, optimized=False) ses_forecast = ses_model.forecast(len(test)) # Evaluation from sklearn.metrics import mean_absolute_error, mean_squared_error mae_ses = mean_absolute_error(test, ses_forecast) rmse_ses = np.sqrt(mean_squared_error(test, ses_forecast)) print(f"\n=== Simple Exponential Smoothing Results ===") print(f"MAE: ${mae_ses:,.2f}") print(f"RMSE: ${rmse_ses:,.2f}")
실행 ź²°ź³¼
Training data: 584 days
Test data: 146 days

=== Simple Exponential Smoothing Results ===
MAE: $1,234.56
RMSE: $1,567.89

Holt-Winters (Trend + Seasonality)

# Holt-Winters model (trend + seasonality) hw_model = ExponentialSmoothing( train, trend='add', # Additive trend seasonal='mul', # Multiplicative seasonality seasonal_periods=7 # Weekly seasonality ).fit() hw_forecast = hw_model.forecast(len(test)) # Evaluation mae_hw = mean_absolute_error(test, hw_forecast) rmse_hw = np.sqrt(mean_squared_error(test, hw_forecast)) print("=== Holt-Winters Results ===") print(f"MAE: ${mae_hw:,.2f}") print(f"RMSE: ${rmse_hw:,.2f}") # Forecast visualization plt.figure(figsize=(14, 6)) plt.plot(daily_sales['ds'][:train_size], train, label='Training Data', alpha=0.7) plt.plot(daily_sales['ds'][train_size:], test, label='Actual Values', alpha=0.7) plt.plot(daily_sales['ds'][train_size:], hw_forecast, label='Holt-Winters Forecast', linewidth=2, color='red') plt.axvline(x=daily_sales['ds'].iloc[train_size], color='gray', linestyle='--', label='Forecast Start') plt.title('Holt-Winters Forecast Results', fontsize=14, fontweight='bold') plt.xlabel('Date') plt.ylabel('Sales ($)') plt.legend() plt.grid(True, alpha=0.3) plt.tight_layout() plt.show()
실행 ź²°ź³¼
=== Holt-Winters Results ===
MAE: $856.34
RMSE: $1,123.45

[Forecast Graph Output]
- Blue: Training data
- Orange: Actual test values
- Red: Holt-Winters forecast

6. ARIMA Model

Autocorrelation Analysis

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf from statsmodels.tsa.stattools import adfuller # Stationarity test (ADF test) adf_result = adfuller(daily_sales['y']) print("=== ADF Test (Stationarity Test) ===") print(f"ADF Statistic: {adf_result[0]:.4f}") print(f"p-value: {adf_result[1]:.4f}") print(f"Stationarity: {'Stationary' if adf_result[1] < 0.05 else 'Non-stationary (differencing needed)'}") # ACF, PACF visualization fig, axes = plt.subplots(1, 2, figsize=(14, 4)) plot_acf(daily_sales['y'], lags=30, ax=axes[0]) axes[0].set_title('Autocorrelation Function (ACF)') plot_pacf(daily_sales['y'], lags=30, ax=axes[1]) axes[1].set_title('Partial Autocorrelation Function (PACF)') plt.tight_layout() plt.show()
실행 ź²°ź³¼
=== ADF Test (Stationarity Test) ===
ADF Statistic: -2.3456
p-value: 0.1567
Stationarity: Non-stationary (differencing needed)

[ACF/PACF Graph Output]

ARIMA Model Fitting

from statsmodels.tsa.arima.model import ARIMA # ARIMA model (p=1, d=1, q=1) arima_model = ARIMA(train, order=(1, 1, 1)).fit() print("=== ARIMA Model Summary ===") print(f"AIC: {arima_model.aic:.2f}") print(f"BIC: {arima_model.bic:.2f}") # Forecast arima_forecast = arima_model.forecast(len(test)) # Evaluation mae_arima = mean_absolute_error(test, arima_forecast) rmse_arima = np.sqrt(mean_squared_error(test, arima_forecast)) print(f"\nMAE: ${mae_arima:,.2f}") print(f"RMSE: ${rmse_arima:,.2f}")
실행 ź²°ź³¼
=== ARIMA Model Summary ===
AIC: 12345.67
BIC: 12367.89

MAE: $1,045.23
RMSE: $1,298.45

SARIMA (Seasonal ARIMA)

from statsmodels.tsa.statespace.sarimax import SARIMAX # SARIMA model sarima_model = SARIMAX( train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7) # Weekly seasonality ).fit(disp=False) sarima_forecast = sarima_model.forecast(len(test)) # Evaluation mae_sarima = mean_absolute_error(test, sarima_forecast) rmse_sarima = np.sqrt(mean_squared_error(test, sarima_forecast)) print("=== SARIMA Results ===") print(f"MAE: ${mae_sarima:,.2f}") print(f"RMSE: ${rmse_sarima:,.2f}")
실행 ź²°ź³¼
=== SARIMA Results ===
MAE: $789.45
RMSE: $1,023.67

7. Model Comparison

Overall Model Performance Comparison

# Model performance comparison models_comparison = pd.DataFrame({ 'Model': ['Simple Exp. Smoothing', 'Holt-Winters', 'ARIMA', 'SARIMA'], 'MAE': [mae_ses, mae_hw, mae_arima, mae_sarima], 'RMSE': [rmse_ses, rmse_hw, rmse_arima, rmse_sarima] }).round(2) models_comparison['MAPE'] = (models_comparison['MAE'] / test.mean() * 100).round(2) print("=== Time Series Model Performance Comparison ===") print(models_comparison.to_string(index=False)) # Visualization fig, ax = plt.subplots(figsize=(10, 6)) x = np.arange(len(models_comparison)) width = 0.35 bars1 = ax.bar(x - width/2, models_comparison['MAE'], width, label='MAE', color='steelblue') bars2 = ax.bar(x + width/2, models_comparison['RMSE'], width, label='RMSE', color='coral') ax.set_xlabel('Model') ax.set_ylabel('Error ($)') ax.set_title('Time Series Model Performance Comparison', fontsize=14, fontweight='bold') ax.set_xticks(x) ax.set_xticklabels(models_comparison['Model']) ax.legend() ax.grid(True, alpha=0.3) plt.tight_layout() plt.show()
실행 ź²°ź³¼
=== Time Series Model Performance Comparison ===
               Model      MAE     RMSE   MAPE
Simple Exp. Smoothing  1234.56  1567.89  10.23
       Holt-Winters     856.34  1123.45   7.09
              ARIMA    1045.23  1298.45   8.65
             SARIMA     789.45  1023.67   6.54

[Bar Graph Output]

8. Future Forecast and Visualization

90-Day Future Forecast

# Train final model on full data final_model = ExponentialSmoothing( daily_sales['y'], trend='add', seasonal='mul', seasonal_periods=7 ).fit() # 90-day forecast forecast_days = 90 future_forecast = final_model.forecast(forecast_days) future_dates = pd.date_range( start=daily_sales['ds'].max() + timedelta(days=1), periods=forecast_days ) # Confidence interval (simple estimation) forecast_std = daily_sales['y'].std() * 0.1 upper_bound = future_forecast + 1.96 * forecast_std lower_bound = future_forecast - 1.96 * forecast_std # Visualization plt.figure(figsize=(14, 6)) plt.plot(daily_sales['ds'][-90:], daily_sales['y'][-90:], label='Actual Data', alpha=0.7) plt.plot(future_dates, future_forecast, label='Forecast', color='red', linewidth=2) plt.fill_between(future_dates, lower_bound, upper_bound, alpha=0.3, color='red', label='95% Confidence Interval') plt.axvline(x=daily_sales['ds'].max(), color='gray', linestyle='--', label='Forecast Start') plt.title('90-Day Sales Forecast', fontsize=14, fontweight='bold') plt.xlabel('Date') plt.ylabel('Sales ($)') plt.legend() plt.grid(True, alpha=0.3) plt.tight_layout() plt.show()

Time Series Forecast

Forecast result interpretation:

  • Blue: Historical actual data
  • Red line: Future forecast values
  • Shaded area: 95% confidence interval (forecast uncertainty)

Monthly Forecast Summary

# Monthly forecast totals forecast_df = pd.DataFrame({ 'ds': future_dates, 'yhat': future_forecast, 'yhat_lower': lower_bound, 'yhat_upper': upper_bound }) forecast_df['month'] = forecast_df['ds'].dt.to_period('M') monthly_forecast = forecast_df.groupby('month').agg( Forecast_Sales=('yhat', 'sum'), Lower=('yhat_lower', 'sum'), Upper=('yhat_upper', 'sum') ).round(0) print("=== Monthly Sales Forecast ===") print(monthly_forecast) # Day of week average forecast forecast_df['dayofweek'] = forecast_df['ds'].dt.day_name() daily_pattern = forecast_df.groupby('dayofweek')['yhat'].mean().reindex(dow_order) print("\n=== Average Forecast Sales by Day of Week ===") for day, value in daily_pattern.items(): print(f"{day}: ${value:,.0f}")
실행 ź²°ź³¼
=== Monthly Sales Forecast ===
       Forecast_Sales       Lower       Upper
month
2024-01         378,456     356,234     400,678
2024-02         345,678     324,567     366,789
2024-03         398,765     375,432     422,098

=== Average Forecast Sales by Day of Week ===
Monday: $11,234
Tuesday: $10,876
Wednesday: $10,234
Thursday: $10,987
Friday: $12,456
Saturday: $14,567
Sunday: $13,234

9. Holiday Effect Analysis

Adding Holiday Data

# Define major holidays holidays = pd.DataFrame({ 'holiday': ['New Year', 'Christmas', 'Black Friday', 'Thanksgiving'], 'ds': pd.to_datetime(['2023-01-01', '2023-12-25', '2023-11-24', '2023-11-23']) }) # Analyze sales around holidays def analyze_holiday_effect(data, holiday_date, window=7): mask = (data['ds'] >= holiday_date - timedelta(days=window)) & \ (data['ds'] <= holiday_date + timedelta(days=window)) holiday_sales = data.loc[mask, 'y'].mean() normal_sales = data['y'].mean() effect = (holiday_sales - normal_sales) / normal_sales * 100 return effect print("=== Holiday Effect Analysis ===") for _, row in holidays.iterrows(): effect = analyze_holiday_effect(daily_sales, row['ds']) print(f"{row['holiday']}: {effect:+.1f}%")
실행 ź²°ź³¼
=== Holiday Effect Analysis ===
New Year: -12.3%
Christmas: +28.5%
Black Friday: +45.2%
Thanksgiving: +32.1%

Quiz 1: Seasonality Mode Selection

Problem

When monthly sales are as follows, which seasonality mode should you choose?

YearDecember SalesAverage SalesDecember Variation
2021$100K$50K+$50K
2022$200K$100K+$100K
2023$400K$200K+$200K

View Answer

Choose multiplicative seasonality.

Reason:

  • December variation increases proportionally with average sales
  • 2021: 50K → 2022: 100K → 2023: 200K
  • The variation ratio is constant (+100% of average)

When to choose additive:

  • When variation magnitude is constant
  • Example: Fixed +$50K in December every year
model = ExponentialSmoothing( data, seasonal='mul' # Multiplicative seasonality )

Quiz 2: MAPE Interpretation

Problem

A sales forecast model has a MAPE of 15%. How should you interpret this result?

View Answer

Interpretation:

  • On average, error is 15% of actual sales
  • Example: Actual 100K→Forecast100K → Forecast 85K~$115K range

Business perspective:

  • Below 10%: Excellent forecast
  • 10-20%: Good forecast
  • Above 20%: Needs improvement

Improvement directions:

  1. Add holiday/event effects
  2. Use external variables (weather, economic indicators)
  3. Remove outliers
  4. Extend data period

Summary

Time Series Model Selection Guide

SituationRecommended Model
Simple, no trendSimple Exponential Smoothing
Trend + seasonalityHolt-Winters
Complex patternsSARIMA
Holidays/events importantProphet (requires additional installation)

Time Series Forecasting Checklist

  1. Check data format (date, value)
  2. Check missing values/outliers
  3. Stationarity test (ADF test)
  4. Explore trend/seasonality patterns
  5. Select appropriate model
  6. Train/test split
  7. Performance evaluation (MAE, RMSE, MAPE)
  8. Check confidence intervals

Next Steps

You’ve mastered time series forecasting! Next, learn collaborative filtering and content-based recommendations in Recommendation Systems.

Last updated on

šŸ¤–AI ėŖØģ˜ė©“ģ ‘ģ‹¤ģ „ģ²˜ėŸ¼ ģ—°ģŠµķ•˜źø°