statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ¶
- class statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ(endog, k_endog_monthly=None, factors=1, factor_orders=1, factor_multiplicities=None, idiosyncratic_ar1=True, standardize=True, endog_quarterly=None, init_t0=False, obs_cov_diag=False, **kwargs)[source]¶
Dynamic factor model with EM algorithm; option for monthly/quarterly data.
Implementation of the dynamic factor model of Bańbura and Modugno (2014) ([1]) and Bańbura, Giannone, and Reichlin (2011) ([2]). Uses the EM algorithm for parameter fitting, and so can accommodate a large number of left-hand-side variables. Specifications can include any collection of blocks of factors, including different factor autoregression orders, and can include AR(1) processes for idiosyncratic disturbances. Can incorporate monthly/quarterly mixed frequency data along the lines of Mariano and Murasawa (2011) ([4]). A special case of this model is the Nowcasting model of Bok et al. (2017) ([3]). Moreover, this model can be used to compute the news associated with updated data releases.
- Parameters:
- endogarray_like
Observed time-series process \(y\). See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.
- k_endog_monthly
int
,optional
If specifying a monthly/quarterly mixed frequency model in which the provided endog dataset contains both the monthly and quarterly data, this variable should be used to indicate how many of the variables are monthly. Note that when using the k_endog_monthly argument, the columns with monthly variables in endog should be ordered first, and the columns with quarterly variables should come afterwards. See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.
- factors
int
,list
,or
dict
,optional
Integer giving the number of (global) factors, a list with the names of (global) factors, or a dictionary with:
keys : names of endogenous variables
values : lists of factor names.
If this is an integer, then the factor names will be 0, 1, …. The default is a single factor that loads on all variables. Note that there cannot be more factors specified than there are monthly variables.
- factor_orders
int
ordict
,optional
Integer describing the order of the vector autoregression (VAR) governing all factor block dynamics or dictionary with:
keys : factor name or tuples of factor names in a block
values : integer describing the VAR order for that factor block
If a dictionary, this defines the order of the factor blocks in the state vector. Otherwise, factors are ordered so that factors that load on more variables come first (and then alphabetically, to break ties).
- factor_multiplicities
int
ordict
,optional
This argument provides a convenient way to specify multiple factors that load identically on variables. For example, one may want two “global” factors (factors that load on all variables) that evolve jointly according to a VAR. One could specify two global factors in the factors argument and specify that they are in the same block in the factor_orders argument, but it is easier to specify a single global factor in the factors argument, and set the order in the factor_orders argument, and then set the factor multiplicity to 2.
This argument must be an integer describing the factor multiplicity for all factors or dictionary with:
keys : factor name
values : integer describing the factor multiplicity for the factors in the given block
- idiosyncratic_ar1bool
Whether or not to model the idiosyncratic component for each series as an AR(1) process. If False, the idiosyncratic component is instead modeled as white noise.
- standardizebool or
tuple
,optional
If a boolean, whether or not to standardize each endogenous variable to have mean zero and standard deviation 1 before fitting the model. See “Notes” for details about how this option works with postestimation output. If a tuple (usually only used internally), then the tuple must have length 2, with each element containing a Pandas series with index equal to the names of the endogenous variables. The first element should contain the mean values and the second element should contain the standard deviations. Default is True.
- endog_quarterly
pandas.Series
orpandas.DataFrame
Observed quarterly variables. If provided, must be a Pandas Series or DataFrame with a DatetimeIndex or PeriodIndex at the quarterly frequency. See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.
- init_t0bool,
optional
If True, this option initializes the Kalman filter with the distribution for \(\alpha_0\) rather than \(\alpha_1\). See the “Notes” section for more details. This option is rarely used except for testing. Default is False.
- obs_cov_diagbool,
optional
If True and if idiosyncratic_ar1 is True, then this option puts small positive values in the observation disturbance covariance matrix. This is not required for estimation and is rarely used except for testing. (It is sometimes used to prevent numerical errors, for example those associated with a positive semi-definite forecast error covariance matrix at the first time step when using EM initialization, but state space models in Statsmodels switch to the univariate approach in those cases, and so do not need to use this trick). Default is False.
Notes
The basic model is:
\[\begin{split}y_t & = \Lambda f_t + \epsilon_t \\ f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + u_t\end{split}\]where:
\(y_t\) is observed data at time t
\(\epsilon_t\) is idiosyncratic disturbance at time t (see below for details, including modeling serial correlation in this term)
\(f_t\) is the unobserved factor at time t
\(u_t \sim N(0, Q)\) is the factor disturbance at time t
and:
\(\Lambda\) is referred to as the matrix of factor loadings
\(A_i\) are matrices of autoregression coefficients
Furthermore, we allow the idiosyncratic disturbances to be serially correlated, so that, if idiosyncratic_ar1=True, \(\epsilon_{i,t} = \rho_i \epsilon_{i,t-1} + e_{i,t}\), where \(e_{i,t} \sim N(0, \sigma_i^2)\). If idiosyncratic_ar1=False, then we instead have \(\epsilon_{i,t} = e_{i,t}\).
This basic setup can be found in [1], [2], [3], and [4].
We allow for two generalizations of this model:
Following [2], we allow multiple “blocks” of factors, which are independent from the other blocks of factors. Different blocks can be set to load on different subsets of the observed variables, and can be specified with different lag orders.
Following [4] and [2], we allow mixed frequency models in which both monthly and quarterly data are used. See the section on “Mixed frequency models”, below, for more details.
Additional notes:
The observed data may contain arbitrary patterns of missing entries.
EM algorithm
This model contains a potentially very large number of parameters, and it can be difficult and take a prohibitively long time to numerically optimize the likelihood function using quasi-Newton methods. Instead, the default fitting method in this model uses the EM algorithm, as detailed in [1]. As a result, the model can accommodate datasets with hundreds of observed variables.
Mixed frequency data
This model can handle mixed frequency data in two ways. In this section, we only briefly describe this, and refer readers to [2] and [4] for all details.
First, because there can be arbitrary patterns of missing data in the observed vector, one can simply include lower frequency variables as observed in a particular higher frequency period, and missing otherwise. For example, in a monthly model, one could include quarterly data as occurring on the third month of each quarter. To use this method, one simply needs to combine the data into a single dataset at the higher frequency that can be passed to this model as the endog argument. However, depending on the type of variables used in the analysis and the assumptions about the data generating process, this approach may not be valid.
For example, suppose that we are interested in the growth rate of real GDP, which is measured at a quarterly frequency. If the basic factor model is specified at a monthly frequency, then the quarterly growth rate in the third month of each quarter – which is what we actually observe – is approximated by a particular weighted average of unobserved monthly growth rates. We need to take this particular weight moving average into account in constructing our model, and this is what the second approach does.
The second approach follows [2] and [4] in constructing a state space form to explicitly model the quarterly growth rates in terms of the unobserved monthly growth rates. To use this approach, there are two methods:
Combine the monthly and quarterly data into a single dataset at the monthly frequency, with the monthly data in the first columns and the quarterly data in the last columns. Pass this dataset to the model as the endog argument and give the number of the variables that are monthly as the k_endog_monthly argument.
Construct a monthly dataset as a Pandas DataFrame with a DatetimeIndex or PeriodIndex at the monthly frequency and separately construct a quarterly dataset as a Pandas DataFrame with a DatetimeIndex or PeriodIndex at the quarterly frequency. Pass the monthly DataFrame to the model as the endog argument and pass the quarterly DataFrame to the model as the endog_quarterly argument.
Note that this only incorporates one particular type of mixed frequency data. See also Banbura et al. (2013). “Now-Casting and the Real-Time Data Flow.” for discussion about other types of mixed frequency data that are not supported by this framework.
Nowcasting and the news
Through its support for monthly/quarterly mixed frequency data, this model can allow for the nowcasting of quarterly variables based on monthly observations. In particular, [2] and [3] use this model to construct nowcasts of real GDP and analyze the impacts of “the news”, derived from incoming data on a real-time basis. This latter functionality can be accessed through the news method of the results object.
Standardizing data
As is often the case in formulating a dynamic factor model, we do not explicitly account for the mean of each observed variable. Instead, the default behavior is to standardize each variable prior to estimation. Thus if \(y_t\) are the given observed data, the dynamic factor model is actually estimated on the standardized data defined by:
\[x_{i, t} = (y_{i, t} - \bar y_i) / s_i\]where \(\bar y_i\) is the sample mean and \(s_i\) is the sample standard deviation.
By default, if standardization is applied prior to estimation, results such as in-sample predictions, out-of-sample forecasts, and the computation of the “news” are reported in the scale of the original data (i.e. the model output has the reverse transformation applied before it is returned to the user).
Standardization can be disabled by passing standardization=False to the model constructor.
Identification of factors and loadings
The estimated factors and the factor loadings in this model are only identified up to an invertible transformation. As described in (the working paper version of) [2], while it is possible to impose normalizations to achieve identification, the EM algorithm does will converge regardless. Moreover, for nowcasting and forecasting purposes, identification is not required. This model does not impose any normalization to identify the factors and the factor loadings.
Miscellaneous
There are two arguments available in the model constructor that are rarely used but which deserve a brief mention: init_t0 and obs_cov_diag. These arguments are provided to allow exactly matching the output of other packages that have slight differences in how the underlying state space model is set up / applied.
init_t0: state space models in Statsmodels follow Durbin and Koopman in initializing the model with \(\alpha_1 \sim N(a_1, P_1)\). Other implementations sometimes initialize instead with \(\alpha_0 \sim N(a_0, P_0)\). We can accommodate this by prepending a row of NaNs to the observed dataset.
obs_cov_diag: the state space form in [1] incorporates non-zero (but very small) diagonal elements for the observation disturbance covariance matrix.
References
[1] (1,2,3,4)Bańbura, Marta, and Michele Modugno. “Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data.” Journal of Applied Econometrics 29, no. 1 (2014): 133-160.
[2] (1,2,3,4,5,6,7,8)Bańbura, Marta, Domenico Giannone, and Lucrezia Reichlin. “Nowcasting.” The Oxford Handbook of Economic Forecasting. July 8, 2011.
[3] (1,2,3)Bok, Brandyn, Daniele Caratelli, Domenico Giannone, Argia M. Sbordone, and Andrea Tambalotti. 2018. “Macroeconomic Nowcasting and Forecasting with Big Data.” Annual Review of Economics 10 (1): 615-43. https://doi.org/10.1146/annurev-economics-080217-053214.
Examples
Constructing and fitting a DynamicFactorMQ model.
>>> data = sm.datasets.macrodata.load_pandas().data.iloc[-100:] >>> data.index = pd.period_range(start='1984Q4', end='2009Q3', freq='Q') >>> endog = data[['infl', 'tbilrate']].resample('M').last() >>> endog_Q = np.log(data[['realgdp', 'realcons']]).diff().iloc[1:] * 400
Basic usage
In the simplest case, passing only the endog argument results in a model with a single factor that follows an AR(1) process. Note that because we are not also providing an endog_quarterly dataset, endog can be a numpy array or Pandas DataFrame with any index (it does not have to be monthly).
The summary method can be useful in checking the model specification.
>>> mod = sm.tsa.DynamicFactorMQ(endog) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 1 factors in 1 blocks # of factors: 1 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings ======================== Dep. variable 0 ------------------------ infl X tbilrate X Factor blocks: ===================== block order --------------------- 0 1 =====================
Factors
With factors=2, there will be two independent factors that will each evolve according to separate AR(1) processes.
>>> mod = sm.tsa.DynamicFactorMQ(endog, factors=2) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 2 factors in 2 blocks # of factors: 2 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings =================================== Dep. variable 0 1 ----------------------------------- infl X X tbilrate X X Factor blocks: ===================== block order --------------------- 0 1 1 1 =====================
Factor multiplicities
By instead specifying factor_multiplicities=2, we would still have two factors, but they would be dependent and would evolve jointly according to a VAR(1) process.
>>> mod = sm.tsa.DynamicFactorMQ(endog, factor_multiplicities=2) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 2 factors in 1 blocks # of factors: 2 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings =================================== Dep. variable 0.1 0.2 ----------------------------------- infl X X tbilrate X X Factor blocks: ===================== block order --------------------- 0.1, 0.2 1 =====================
Factor orders
In either of the above cases, we could extend the order of the (vector) autoregressions by using the factor_orders argument. For example, the below model would contain two independent factors that each evolve according to a separate AR(2) process:
>>> mod = sm.tsa.DynamicFactorMQ(endog, factors=2, factor_orders=2) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 2 factors in 2 blocks # of factors: 2 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings =================================== Dep. variable 0 1 ----------------------------------- infl X X tbilrate X X Factor blocks: ===================== block order --------------------- 0 2 1 2 =====================
Serial correlation in the idiosyncratic disturbances
By default, the model allows each idiosyncratic disturbance terms to evolve according to an AR(1) process. If preferred, they can instead be specified to be serially independent by passing ididosyncratic_ar1=False.
>>> mod = sm.tsa.DynamicFactorMQ(endog, idiosyncratic_ar1=False) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 1 factors in 1 blocks # of factors: 1 + iid idiosyncratic Idiosyncratic disturbances: iid Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings ======================== Dep. variable 0 ------------------------ infl X tbilrate X Factor blocks: ===================== block order --------------------- 0 1 =====================
Monthly / Quarterly mixed frequency
To specify a monthly / quarterly mixed frequency model see the (Notes section for more details about these models):
>>> mod = sm.tsa.DynamicFactorMQ(endog, endog_quarterly=endog_Q) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 1 factors in 1 blocks # of quarterly variables: 2 + Mixed frequency (M/Q) # of factors: 1 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings ======================== Dep. variable 0 ------------------------ infl X tbilrate X realgdp X realcons X Factor blocks: ===================== block order --------------------- 0 1 =====================
Customize observed variable / factor loadings
To specify that certain that certain observed variables only load on certain factors, it is possible to pass a dictionary to the factors argument.
>>> factors = {'infl': ['global'] ... 'tbilrate': ['global'] ... 'realgdp': ['global', 'real'] ... 'realcons': ['global', 'real']} >>> mod = sm.tsa.DynamicFactorMQ(endog, endog_quarterly=endog_Q) >>> print(mod.summary()) Model Specification: Dynamic Factor Model ========================================================================== Model: Dynamic Factor Model # of monthly variables: 2 + 2 factors in 2 blocks # of quarterly variables: 2 + Mixed frequency (M/Q) # of factor blocks: 2 + AR(1) idiosyncratic Idiosyncratic disturbances: AR(1) Sample: 1984-10 Standardize variables: True - 2009-09 Observed variables / factor loadings =================================== Dep. variable global real ----------------------------------- infl X tbilrate X realgdp X X realcons X X Factor blocks: ===================== block order --------------------- global 1 real 1 =====================
Fitting parameters
To fit the model, use the fit method. This method uses the EM algorithm by default.
>>> mod = sm.tsa.DynamicFactorMQ(endog) >>> res = mod.fit() >>> print(res.summary()) Dynamic Factor Results ========================================================================== Dep. Variable: ['infl', 'tbilrate'] No. Observations: 300 Model: Dynamic Factor Model Log Likelihood -127.909 + 1 factors in 1 blocks AIC 271.817 + AR(1) idiosyncratic BIC 301.447 Date: Tue, 04 Aug 2020 HQIC 283.675 Time: 15:59:11 EM Iterations 83 Sample: 10-31-1984 - 09-30-2009 Covariance Type: Not computed Observation equation: ============================================================== Factor loadings: 0 idiosyncratic: AR(1) var. -------------------------------------------------------------- infl -0.67 0.39 0.73 tbilrate -0.63 0.99 0.01 Transition: Factor block 0 ======================================= L1.0 error variance --------------------------------------- 0 0.98 0.01 ======================================= Warnings: [1] Covariance matrix not calculated.
Displaying iteration progress
To display information about the EM iterations, use the disp argument.
>>> mod = sm.tsa.DynamicFactorMQ(endog) >>> res = mod.fit(disp=10) EM start iterations, llf=-291.21 EM iteration 10, llf=-157.17, convergence criterion=0.053801 EM iteration 20, llf=-128.99, convergence criterion=0.0035545 EM iteration 30, llf=-127.97, convergence criterion=0.00010224 EM iteration 40, llf=-127.93, convergence criterion=1.3281e-05 EM iteration 50, llf=-127.92, convergence criterion=5.4725e-06 EM iteration 60, llf=-127.91, convergence criterion=2.8665e-06 EM iteration 70, llf=-127.91, convergence criterion=1.6999e-06 EM iteration 80, llf=-127.91, convergence criterion=1.1085e-06 EM converged at iteration 83, llf=-127.91, convergence criterion=9.9004e-07 < tolerance=1e-06
Results: forecasting, impulse responses, and more
One the model is fitted, there are a number of methods available from the results object. Some examples include:
Forecasting
>>> mod = sm.tsa.DynamicFactorMQ(endog) >>> res = mod.fit() >>> print(res.forecast(steps=5)) infl tbilrate 2009-10 1.784169 0.260401 2009-11 1.735848 0.305981 2009-12 1.730674 0.350968 2010-01 1.742110 0.395369 2010-02 1.759786 0.439194
Impulse responses
>>> mod = sm.tsa.DynamicFactorMQ(endog) >>> res = mod.fit() >>> print(res.impulse_responses(steps=5)) infl tbilrate 0 -1.511956 -1.341498 1 -1.483172 -1.315960 2 -1.454937 -1.290908 3 -1.427240 -1.266333 4 -1.400069 -1.242226 5 -1.373416 -1.218578
For other available methods (including in-sample prediction, simulation of time series, extending the results to incorporate new data, and the news), see the documentation for state space models.
- Attributes:
endog_names
Names of endogenous variables.
exog_names
The names of the exogenous variables.
- initial_variance
- initialization
loglike_constant
Constant term in the joint log-likelihood function.
- loglikelihood_burn
param_names
(list of str) List of human readable parameter names.
start_params
(array) Starting parameters for maximum likelihood estimation.
state_names
(list of str) List of human readable names for unobserved states.
- tolerance
Methods
clone
(endog[, k_endog_monthly, ...])Clone state space model with new data and optionally new specification.
construct_endog
(endog_monthly, endog_quarterly)Construct a combined dataset from separate monthly and quarterly data.
filter
(params[, transformed, ...])Kalman filtering.
fit
([start_params, transformed, ...])Fits the model by maximum likelihood via Kalman filter.
fit_constrained
(constraints[, start_params])Fit the model with some parameters subject to equality constraints.
fit_em
([start_params, transformed, ...])Fits the model by maximum likelihood via the EM algorithm.
fix_params
(params)Fix parameters to specific values (context manager)
from_formula
(formula, data[, subset])Not implemented for state space models
handle_params
(params[, transformed, ...])Ensure model parameters satisfy shape and other requirements
hessian
(params, *args, **kwargs)Hessian matrix of the likelihood function, evaluated at the given parameters
impulse_responses
(params[, steps, impulse, ...])Impulse response function.
information
(params)Fisher information matrix of model.
Initialize (possibly re-initialize) a Model instance.
initialize_approximate_diffuse
([variance])Initialize approximate diffuse
initialize_known
(initial_state, ...)Initialize known
initialize_statespace
(**kwargs)Initialize the state space representation
Initialize stationary
Matrix formulation of quarterly variables' factor loading constraints.
loglike
(params, *args, **kwargs)Loglikelihood evaluation
loglikeobs
(params[, transformed, ...])Loglikelihood evaluation
observed_information_matrix
(params[, ...])Observed information matrix
opg_information_matrix
(params[, ...])Outer product of gradients information matrix
predict
(params[, exog])After a model has been fit predict returns the fitted values.
Prepare data for use in the state space representation
score
(params, *args, **kwargs)Compute the score function at params.
score_obs
(params[, method, transformed, ...])Compute the score per observation, evaluated at params
set_conserve_memory
([conserve_memory])Set the memory conservation method
set_filter_method
([filter_method])Set the filtering method
set_inversion_method
([inversion_method])Set the inversion method
set_smoother_output
([smoother_output])Set the smoother output
set_stability_method
([stability_method])Set the numerical stability method
simulate
(params, nsimulations[, ...])Simulate a new time series following the state space model.
simulation_smoother
([simulation_output])Retrieve a simulation smoother for the state space model.
smooth
(params[, transformed, ...])Kalman smoothing.
summary
([truncate_endog_names])Create a summary table describing the model.
transform_jacobian
(unconstrained[, ...])Jacobian matrix for the parameter transformation function
transform_params
(unconstrained)Transform parameters from optimizer space to model space.
untransform_params
(constrained)Transform parameters from model space to optimizer space.
update
(params, **kwargs)Update the parameters of the model.
Properties
Names of endogenous variables.
The names of the exogenous variables.
Constant term in the joint log-likelihood function.
(list of str) List of human readable parameter names.
(array) Starting parameters for maximum likelihood estimation.
(list of str) List of human readable names for unobserved states.