{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Seasonal-Trend decomposition using LOESS (STL)\n", "\n", "This notebook illustrates the use of STL to decompose a time series into three components: trend, seasonal, and residual. STL uses LOESS (locally estimated scatterplot smoothing) to extract smooth estimates of the three components. The key inputs into STL are:\n", "\n", "* seasonal - The length of the seasonal smoother. Must be odd.\n", "* trend - The length of the trend smoother, usually around 150% of seasonal. Must be odd and larger than seasonal.\n", "* low_pass - The length of the low-pass estimation window, usually the smallest odd number larger than the periodicity of the data.\n", "\n", "First we import the required packages, prepare the graphics environment, and prepare the data. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2022-06-29T20:41:34.637029Z", "iopub.status.busy": "2022-06-29T20:41:34.636450Z", "iopub.status.idle": "2022-06-29T20:41:36.174859Z", "shell.execute_reply": "2022-06-29T20:41:36.172271Z" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "from pandas.plotting import register_matplotlib_converters\n", "\n", "register_matplotlib_converters()\n", "sns.set_style(\"darkgrid\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-06-29T20:41:36.180701Z", "iopub.status.busy": "2022-06-29T20:41:36.179306Z", "iopub.status.idle": "2022-06-29T20:41:36.184972Z", "shell.execute_reply": "2022-06-29T20:41:36.184219Z" } }, "outputs": [], "source": [ "plt.rc(\"figure\", figsize=(16, 12))\n", "plt.rc(\"font\", size=13)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Atmospheric CO2\n", "\n", "The example in Cleveland, Cleveland, McRae, and Terpenning (1990) uses CO2 data, which is in the list below. 
This monthly data (January 1959 to December 1987) has a clear trend and seasonality across the sample.

co2 = [
 315.58,
 316.39,
 ...
 348.67,
]
co2 = pd.Series(
 co2, index=pd.date_range("1-1-1959", periods=len(co2), freq="M"), name="CO2"
)
co2.describe()

If the data series does not have a frequency, then you must also specify period. The default value for seasonal is 7, so it should also be changed in most applications." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2022-06-29T20:41:36.233665Z", "iopub.status.busy": "2022-06-29T20:41:36.232950Z", "iopub.status.idle": "2022-06-29T20:41:37.715254Z", "shell.execute_reply": "2022-06-29T20:41:37.714550Z" } }, "outputs": [ { "data": { "image/png": "text/plain": [ "