statsmodels.tsa.statespace.structural.UnobservedComponents

class statsmodels.tsa.statespace.structural.UnobservedComponents(endog, level=False, trend=False, seasonal=None, freq_seasonal=None, cycle=False, autoregressive=None, exog=None, irregular=False, stochastic_level=False, stochastic_trend=False, stochastic_seasonal=True, stochastic_freq_seasonal=None, stochastic_cycle=False, damped_cycle=False, cycle_period_bounds=None, mle_regression=True, use_exact_diffuse=False, **kwargs)

Univariate unobserved components time series model

These are also known as structural time series models, and decompose a (univariate) time series into trend, seasonal, cyclical, and irregular components.

Parameters

level ({bool, str}, optional): Whether or not to include a level component. Default is False. Can also be a string specification of the level / trend component; see Notes for available model specification strings.
trend (bool, optional): Whether or not to include a trend component. Default is False. If True, level must also be True.
seasonal ({int, None}, optional): The period of the seasonal component, if any. Default is None.
freq_seasonal ({list[dict], None}, optional): Whether (and how) to model seasonal component(s) with trigonometric functions. If specified, there is one dictionary for each frequency-domain seasonal component. Each dictionary must have a ‘period’ key with an integer value and may have a ‘harmonics’ key with an integer value. If ‘harmonics’ is not specified in a dictionary, it defaults to the floor of period/2.
cycle (bool, optional): Whether or not to include a cycle component. Default is False.
autoregressive ({int, None}, optional): The order of the autoregressive component. Default is None.
exog ({array_like, None}, optional): Exogenous variables.
irregular (bool, optional): Whether or not to include an irregular component. Default is False.
stochastic_level (bool, optional): Whether or not any level component is stochastic. Default is False.
stochastic_trend (bool, optional): Whether or not any trend component is stochastic. Default is False.
stochastic_seasonal (bool, optional): Whether or not any seasonal component is stochastic. Default is True.
stochastic_freq_seasonal (list[bool], optional): Whether or not each frequency-domain seasonal component is stochastic. Default is True for each component. The list should be of the same length as freq_seasonal.
stochastic_cycle (bool, optional): Whether or not any cycle component is stochastic. Default is False.
damped_cycle (bool, optional): Whether or not the cycle component is damped. Default is False.
cycle_period_bounds (tuple, optional): A tuple with lower and upper allowed bounds for the period of the cycle. If not provided, the following default bounds are used: (1) if no date / time information is provided, the frequency is constrained to be between zero and \(\pi\), so the period is constrained to be in [2, infinity). (2) If the date / time information is provided, the default bounds allow the cyclical component to be between 1.5 and 12 years; depending on the frequency of the endogenous variable, this will imply different specific bounds.
mle_regression (bool, optional): Whether or not to estimate the regression coefficients by maximum likelihood. Default is True. If False, the regression coefficients are included in the state vector, where they are essentially estimated via recursive OLS.
use_exact_diffuse (bool, optional): Whether or not to use exact diffuse initialization for non-stationary states. Default is False (in which case approximate diffuse initialization is used).

Notes

These models take the general form (see [1] Chapter 3.2 for all details)

\[y_t = \mu_t + \gamma_t + c_t + \varepsilon_t\]

where \(y_t\) refers to the observation vector at time \(t\), \(\mu_t\) refers to the trend component, \(\gamma_t\) refers to the seasonal component, \(c_t\) refers to the cycle, and \(\varepsilon_t\) is the irregular. The modeling details of these components are given below.

Trend

The trend component is a dynamic extension of a regression model that includes an intercept and linear time-trend. It can be written:

\[\begin{split}\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_{t-1} \\ \beta_t = \beta_{t-1} + \zeta_{t-1}\end{split}\]

where the level is a generalization of the intercept term that can dynamically vary across time, and the trend is a generalization of the time-trend such that the slope can dynamically vary across time.

Here \(\eta_t \sim N(0, \sigma_\eta^2)\) and \(\zeta_t \sim N(0, \sigma_\zeta^2)\).
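The recursion above can be sketched directly in NumPy. This is only an illustration of the equations, not how statsmodels implements the component; the function name `simulate_local_linear_trend` and its arguments are hypothetical.

```python
import numpy as np


def simulate_local_linear_trend(n, sigma_eta, sigma_zeta, mu0=0.0, beta0=0.0, seed=0):
    """Simulate the level mu_t and slope beta_t of the trend recursion.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    mu = np.empty(n)
    beta = np.empty(n)
    mu[0], beta[0] = mu0, beta0
    for t in range(1, n):
        eta = sigma_eta * rng.standard_normal()    # level disturbance
        zeta = sigma_zeta * rng.standard_normal()  # slope disturbance
        mu[t] = mu[t - 1] + beta[t - 1] + eta
        beta[t] = beta[t - 1] + zeta
    return mu, beta


# With both standard deviations set to zero the trend is a straight line
# with intercept mu0 and constant slope beta0.
mu_det, beta_det = simulate_local_linear_trend(5, 0.0, 0.0, mu0=1.0, beta0=0.5)

# With positive standard deviations the level and the slope both wander over time.
mu_sto, beta_sto = simulate_local_linear_trend(200, 0.3, 0.05)
```

Setting only `sigma_zeta` to zero keeps the slope fixed while the level still receives shocks, which corresponds to the deterministic vs stochastic distinction for each element.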

For both elements (level and trend), we can consider models in which:

The element is included vs excluded (if the trend is included, there must also be a level included).
The element is deterministic vs stochastic (i.e. whether or not the variance of the error term is constrained to be zero).

The only additional parameters to be estimated via MLE are the variances of any included stochastic components.

The level/trend components can be specified using the boolean keyword arguments level, stochastic_level, trend, etc., or all at once as a string argument to level. The following table shows the available model specifications:

Model name	Full string syntax	Abbreviated syntax	Model
No trend	‘irregular’	‘ntrend’	\[y_t = \varepsilon_t\]
Fixed intercept	‘fixed intercept’		\[y_t = \mu\]
Deterministic constant	‘deterministic constant’	‘dconstant’	\[y_t = \mu + \varepsilon_t\]
Local level	‘local level’	‘llevel’	\[\begin{split}y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \eta_t\end{split}\]
Random walk	‘random walk’	‘rwalk’	\[\begin{split}y_t &= \mu_t \\ \mu_t &= \mu_{t-1} + \eta_t\end{split}\]
Fixed slope	‘fixed slope’		\[\begin{split}y_t &= \mu_t \\ \mu_t &= \mu_{t-1} + \beta\end{split}\]
Deterministic trend	‘deterministic trend’	‘dtrend’	\[\begin{split}y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \beta\end{split}\]
Local linear deterministic trend	‘local linear deterministic trend’	‘lldtrend’	\[\begin{split}y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \beta + \eta_t\end{split}\]
Random walk with drift	‘random walk with drift’	‘rwdrift’	\[\begin{split}y_t &= \mu_t \\ \mu_t &= \mu_{t-1} + \beta + \eta_t\end{split}\]
Local linear trend	‘local linear trend’	‘lltrend’	\[\begin{split}y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t &= \beta_{t-1} + \zeta_t\end{split}\]
Smooth trend	‘smooth trend’	‘strend’	\[\begin{split}y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \beta_{t-1} \\ \beta_t &= \beta_{t-1} + \zeta_t\end{split}\]
Random trend	‘random trend’	‘rtrend’	\[\begin{split}y_t &= \mu_t \\ \mu_t &= \mu_{t-1} + \beta_{t-1} \\ \beta_t &= \beta_{t-1} + \zeta_t\end{split}\]

Following the fitting of the model, the unobserved level and trend component time series are available in the results class in the level and trend attributes, respectively.

Seasonal (Time-domain)

The seasonal component is modeled as:

\[\begin{split}\gamma_{t+1} & = - \sum_{j=1}^{s-1} \gamma_{t+1-j} + \omega_t \\ \omega_t & \sim N(0, \sigma_\omega^2)\end{split}\]

The periodicity (number of seasons) is s, and the defining characteristic is that (without the error term) the seasonal components sum to zero across one complete cycle. The inclusion of an error term allows the seasonal effects to vary over time (if this is not desired, \(\sigma_\omega^2\) can be set to zero using the stochastic_seasonal=False keyword argument).

This component results in one parameter to be selected via maximum likelihood: \(\sigma_\omega^2\), and one parameter to be chosen, the number of seasons s.

Following the fitting of the model, the unobserved seasonal component time series is available in the results class in the seasonal attribute.
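The sum-to-zero property can be checked with a short NumPy sketch in which each new seasonal effect is minus the sum of the previous s - 1 effects, plus an optional disturbance. The helper `simulate_seasonal` is hypothetical and only illustrates the recursion; it is not the statsmodels implementation.

```python
import numpy as np


def simulate_seasonal(initial, n, sigma_omega=0.0, seed=0):
    """Extend the s - 1 initial seasonal effects to a series of length n.

    Hypothetical helper: each new effect is minus the sum of the previous
    s - 1 effects plus a N(0, sigma_omega^2) disturbance.
    """
    rng = np.random.default_rng(seed)
    gamma = list(initial)
    k = len(initial)  # k = s - 1
    while len(gamma) < n:
        omega = sigma_omega * rng.standard_normal()
        gamma.append(-sum(gamma[-k:]) + omega)
    return np.array(gamma)


s = 4
gamma = simulate_seasonal([1.0, -2.0, 0.5], n=12)

# Sum of every window of s consecutive seasonal effects.
window_sums = np.convolve(gamma, np.ones(s), mode="valid")
print(gamma)
print(window_sums)

# With a positive disturbance variance the pattern drifts over time.
gamma_stochastic = simulate_seasonal([1.0, -2.0, 0.5], n=12, sigma_omega=0.1)
```

Without disturbances the pattern repeats exactly every s observations and every window of length s sums to zero; with disturbances the window sums equal the most recent disturbance instead.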

Frequency-domain Seasonal

Each frequency-domain seasonal component is modeled as:

\[\begin{split}\gamma_t & = \sum_{j=1}^h \gamma_{j, t} \\ \gamma_{j, t+1} & = \gamma_{j, t}\cos(\lambda_j) + \gamma^{*}_{j, t}\sin(\lambda_j) + \omega_{j,t} \\ \gamma^{*}_{j, t+1} & = -\gamma_{j, t}\sin(\lambda_j) + \gamma^{*}_{j, t}\cos(\lambda_j) + \omega^{*}_{j, t} \\ \omega^{*}_{j, t}, \omega_{j, t} & \sim N(0, \sigma_\omega^2) \\ \lambda_j & = \frac{2 \pi j}{s}\end{split}\]

where j ranges from 1 to h.

The periodicity (number of “seasons” in a “year”) is s and the number of harmonics is h. Note that h is configurable to be less than s/2, but s/2 harmonics is sufficient to fully model all seasonal variations of periodicity s. Like the time domain seasonal term (cf. Seasonal section, above), the inclusion of the error terms allows for the seasonal effects to vary over time. The argument stochastic_freq_seasonal can be used to set one or more of the seasonal components of this type to be non-random, meaning they will not vary over time.

This component results in one parameter to be fitted using maximum likelihood: \(\sigma_\omega^2\), and up to two parameters to be chosen, the number of seasons s and optionally the number of harmonics h, with \(1 \leq h \leq \lfloor s/2 \rfloor\).

After fitting the model, each unobserved seasonal component modeled in the frequency domain is available in the results class in the freq_seasonal attribute.

Cycle

The cyclical component is intended to capture cyclical effects at time frames much longer than captured by the seasonal component. For example, in economics the cyclical term is often intended to capture the business cycle, and is then expected to have a period between “1.5 and 12 years” (see Durbin and Koopman).

\[\begin{split}c_{t+1} & = \rho_c (c_t \cos \lambda_c + c_t^* \sin \lambda_c) + \tilde \omega_t \\ c_{t+1}^* & = \rho_c (- c_t \sin \lambda_c + c_t^* \cos \lambda_c) + \tilde \omega_t^*\end{split}\]

where \(\tilde \omega_t\) and \(\tilde \omega_t^*\) are iid \(N(0, \sigma_{\tilde \omega}^2)\).

The parameter \(\lambda_c\) (the frequency of the cycle) is an additional parameter to be estimated by MLE.

If the cyclical effect is stochastic (stochastic_cycle=True), then there is another parameter to estimate (the variance of the error term - note that both of the error terms here share the same variance, but are assumed to have independent draws).

If the cycle is damped (damped_cycle=True), then there is a third parameter to estimate, \(\rho_c\).

In order to achieve cycles with the appropriate frequencies, bounds are imposed on the parameter \(\lambda_c\) in estimation. These can be controlled via the keyword argument cycle_period_bounds, which, if specified, must be a tuple of bounds on the period (lower, upper). The bounds on the frequency are then calculated from those bounds.

The default bounds, if none are provided, are selected in the following way:

If no date / time information is provided, the frequency is constrained to be between zero and \(\pi\), so the period is constrained to be in \([2, \infty)\).
If the date / time information is provided, the default bounds allow the cyclical component to be between 1.5 and 12 years; depending on the frequency of the endogenous variable, this will imply different specific bounds.
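The relation between bounds on the period and bounds on the frequency is \(\lambda_c = 2\pi / \text{period}\), so the lower period bound gives the upper frequency bound and vice versa. A small sketch of that conversion follows; the helper `frequency_bounds` is hypothetical and is not part of statsmodels.

```python
import math


def frequency_bounds(period_bounds):
    """Convert (lower, upper) bounds on the period to bounds on the frequency.

    Hypothetical helper: frequency = 2 * pi / period, so the order flips.
    """
    lower_period, upper_period = period_bounds
    return (2 * math.pi / upper_period, 2 * math.pi / lower_period)


# Monthly data: 1.5 to 12 years corresponds to periods of 18 to 144 observations.
monthly = frequency_bounds((1.5 * 12, 12 * 12))

# Quarterly data: 1.5 to 12 years corresponds to periods of 6 to 48 observations.
quarterly = frequency_bounds((1.5 * 4, 12 * 4))

# A frequency between zero and pi corresponds to a period of at least 2 observations.
no_dates = frequency_bounds((2, math.inf))

print(monthly, quarterly, no_dates)
```

A tuple such as `(18, 144)` is what would be passed as cycle_period_bounds for monthly data if the default business-cycle range is to be stated explicitly.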

Following the fitting of the model, the unobserved cyclical component time series is available in the results class in the cycle attribute.

Irregular

The irregular components are independent and identically distributed (iid):

\[\varepsilon_t \sim N(0, \sigma_\varepsilon^2)\]

Autoregressive Irregular

An autoregressive component (often used as a replacement for the white noise irregular term) can be specified as:

\[\begin{split}\varepsilon_t = \rho(L) \varepsilon_{t-1} + \epsilon_t \\ \epsilon_t \sim N(0, \sigma_\epsilon^2)\end{split}\]

In this case, the AR order is specified via the autoregressive keyword, and the autoregressive coefficients are estimated.

Following the fitting of the model, the unobserved autoregressive component time series is available in the results class in the autoregressive attribute.

Regression effects

Exogenous regressors can be passed to the exog argument. The regression coefficients will be estimated by maximum likelihood unless mle_regression=False, in which case the regression coefficients will be included in the state vector where they are essentially estimated via recursive OLS.

If the regression coefficients are included in the state vector, the recursive estimates are available in the results class in the regression_coefficients attribute.

References

[1] Durbin, James, and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods: Second Edition. Oxford University Press.

Methods

`clone`(endog[, exog])	Clone state space model with new data and optionally new specification
`filter`(params[, transformed, …])	Kalman filtering
`fit`([start_params, transformed, …])	Fits the model by maximum likelihood via Kalman filter.
`fit_constrained`(constraints[, start_params])	Fit the model with some parameters subject to equality constraints.
`fix_params`(params)	Fix parameters to specific values (context manager)
`from_formula`(formula, data[, subset])	Not implemented for state space models
`handle_params`(params[, transformed, …])
`hessian`(params, *args, **kwargs)	Hessian matrix of the likelihood function, evaluated at the given parameters
`impulse_responses`(params[, steps, impulse, …])	Impulse response function
`information`(params)	Fisher information matrix of model.
`initialize`()	Initialize (possibly re-initialize) a Model instance.
`initialize_approximate_diffuse`([variance])	Initialize approximate diffuse
`initialize_default`([…])
`initialize_known`(initial_state, …)	Initialize known
`initialize_statespace`(**kwargs)	Initialize the state space representation
`initialize_stationary`()	Initialize stationary
`loglike`(params, *args, **kwargs)	Loglikelihood evaluation
`loglikeobs`(params[, transformed, …])	Loglikelihood evaluation
`observed_information_matrix`(params[, …])	Observed information matrix
`opg_information_matrix`(params[, …])	Outer product of gradients information matrix
`predict`(params[, exog])	After a model has been fit predict returns the fitted values.
`prepare_data`()	Prepare data for use in the state space representation
`score`(params, *args, **kwargs)	Compute the score function at params.
`score_obs`(params[, method, transformed, …])	Compute the score per observation, evaluated at params
`set_conserve_memory`([conserve_memory])	Set the memory conservation method
`set_filter_method`([filter_method])	Set the filtering method
`set_inversion_method`([inversion_method])	Set the inversion method
`set_smoother_output`([smoother_output])	Set the smoother output
`set_stability_method`([stability_method])	Set the numerical stability method
`setup`()	Setup the structural time series representation
`simulate`(params, nsimulations[, …])	Simulate a new time series following the state space model
`simulation_smoother`([simulation_output])	Retrieve a simulation smoother for the state space model.
`smooth`(params[, transformed, …])	Kalman smoothing
`transform_jacobian`(unconstrained[, …])	Jacobian matrix for the parameter transformation function
`transform_params`(unconstrained)	Transform unconstrained parameters used by the optimizer to constrained parameters used in likelihood evaluation
`untransform_params`(constrained)	Reverse the transformation
`update`(params[, transformed, …])	Update the parameters of the model

Properties

`endog_names`	Names of endogenous variables.
`exog_names`	The names of the exogenous variables.
`initial_variance`
`initialization`
`loglikelihood_burn`
`param_names`	(list of str) List of human readable parameter names (for parameters actually included in the model).
`start_params`	(array) Starting parameters for maximum likelihood estimation.
`state_names`	(list of str) List of human readable names for unobserved states.
`tolerance`