{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Seasonality in time series data\n", "\n", "Consider the problem of modeling time series data with multiple seasonal components that have different periodicities. Let us take the time series $y_t$ and decompose it explicitly into a level component and two seasonal components.\n", "\n", "$$\n", "y_t = \\mu_t + \\gamma^{(1)}_t + \\gamma^{(2)}_t\n", "$$\n", "\n", "where $\\mu_t$ represents the trend or level, $\\gamma^{(1)}_t$ represents a seasonal component with a relatively short period, and $\\gamma^{(2)}_t$ represents another seasonal component with a longer period. We will use a fixed intercept term for the level and treat both $\\gamma^{(1)}_t$ and $\\gamma^{(2)}_t$ as stochastic so that the seasonal patterns can vary over time.\n", "\n", "In this notebook, we will generate synthetic data conforming to this model and showcase modeling of the seasonal terms in a few different ways under the unobserved components modeling framework." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Synthetic data creation\n", "\n", "We will create data with multiple seasonal patterns by following equations (3.7) and (3.8) in Durbin and Koopman (2012). We will simulate 300 time periods with two seasonal terms parametrized in the frequency domain: the first has period 10 and 3 harmonics, and the second has period 100 and 2 harmonics. The variances of their stochastic parts are 4 and 9, respectively." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# First we'll simulate the synthetic data\n", "def simulate_seasonal_term(periodicity, total_cycles, noise_std=1.,\n", "                           harmonics=None):\n", "    duration = periodicity * total_cycles\n", "    assert duration == int(duration)\n", "    duration = int(duration)\n", "    harmonics = harmonics if harmonics else int(np.floor(periodicity / 2))\n", "\n", "    lambda_p = 2 * np.pi / float(periodicity)\n", "\n", "    gamma_jt = noise_std * np.random.randn(harmonics)\n", "    gamma_star_jt = noise_std * np.random.randn(harmonics)\n", "\n", "    total_timesteps = 100 * duration  # Pad for burn in\n", "    series = np.zeros(total_timesteps)\n", "    for t in range(total_timesteps):\n", "        gamma_jtp1 = np.zeros_like(gamma_jt)\n", "        gamma_star_jtp1 = np.zeros_like(gamma_star_jt)\n", "        for j in range(1, harmonics + 1):\n", "            cos_j = np.cos(lambda_p * j)\n",
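The simulation function is cut off partway through its inner loop in this copy. As a rough illustration of how the recursion might be completed, here is a minimal, self-contained sketch of a stochastic trigonometric seasonal term, assuming the standard recursion $\gamma_{j,t+1} = \gamma_{j,t}\cos\lambda_j + \gamma^*_{j,t}\sin\lambda_j + \omega_{j,t}$ and $\gamma^*_{j,t+1} = -\gamma_{j,t}\sin\lambda_j + \gamma^*_{j,t}\cos\lambda_j + \omega^*_{j,t}$ with $\lambda_j = 2\pi j / s$. The function name `simulate_seasonal_sketch`, the `seed` parameter, the vectorized update, and the intercept value are assumptions made for this example and are not taken from the notebook.

```python
import numpy as np


def simulate_seasonal_sketch(periodicity, total_cycles, noise_std=1.0,
                             harmonics=None, seed=None):
    """Hypothetical helper: simulate one stochastic trigonometric seasonal term."""
    duration = periodicity * total_cycles
    assert duration == int(duration)
    duration = int(duration)
    harmonics = harmonics if harmonics else int(np.floor(periodicity / 2))

    rng = np.random.default_rng(seed)  # assumption: seeded generator for repeatability

    # Frequencies lambda_j = 2 * pi * j / periodicity for j = 1..harmonics
    j = np.arange(1, harmonics + 1)
    cos_j = np.cos(2 * np.pi * j / periodicity)
    sin_j = np.sin(2 * np.pi * j / periodicity)

    # Random initial states for each harmonic and its conjugate
    gamma = noise_std * rng.standard_normal(harmonics)
    gamma_star = noise_std * rng.standard_normal(harmonics)

    total_timesteps = 100 * duration  # pad for burn-in, as in the original
    series = np.zeros(total_timesteps)
    for t in range(total_timesteps):
        # Rotate each harmonic by lambda_j and add independent Gaussian shocks
        gamma_next = (gamma * cos_j + gamma_star * sin_j
                      + noise_std * rng.standard_normal(harmonics))
        gamma_star_next = (-gamma * sin_j + gamma_star * cos_j
                           + noise_std * rng.standard_normal(harmonics))
        series[t] = gamma_next.sum()
        gamma, gamma_star = gamma_next, gamma_star_next

    return series[-duration:]  # keep only the samples after burn-in


# Usage with the settings described in the text: periods 10 and 100,
# 3 and 2 harmonics, shock variances 4 and 9 (standard deviations 2 and 3).
duration = 300
term_10 = simulate_seasonal_sketch(10, duration / 10, noise_std=2.0,
                                   harmonics=3, seed=0)
term_100 = simulate_seasonal_sketch(100, duration / 100, noise_std=3.0,
                                    harmonics=2, seed=1)
intercept = 10.0  # hypothetical fixed level, chosen only for illustration
y = intercept + term_10 + term_100
```

Drawing both shocks for all harmonics at once keeps the loop over time steps as the only explicit Python loop. The update is otherwise intended to match the per-harmonic form that the truncated inner loop begins to compute.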