statsmodels.tsa.ar_model.ar_select_order(endog, maxlag, ic='bic', glob=False, trend='c', seasonal=False, exog=None, hold_back=None, period=None, missing='none', old_names=False)

Autoregressive AR-X(p) model order selection.


Parameters:

endog : array_like

A 1-d endogenous response variable. The dependent variable.


maxlag : int

The maximum lag to consider.

ic : {‘aic’, ‘hqic’, ‘bic’}

The information criterion to use in the selection.


glob : bool

Flag indicating whether to use a global search across all combinations of lags. In practice, this option is not computationally feasible when maxlag is larger than 15 (or perhaps 20), since the global search requires fitting 2**maxlag models.
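The growth of the global search can be illustrated with a short standard-library sketch. It only counts candidate lag sets and does not call statsmodels. The helper names are hypothetical, and the count of maxlag + 1 models for the default search is an assumption (nested models using lags 1..p for p = 0, ..., maxlag).

```python
from itertools import combinations
from math import comb


def global_model_count(maxlag):
    # Every subset of {1, ..., maxlag} is a candidate lag set: 2**maxlag in total.
    return sum(comb(maxlag, k) for k in range(maxlag + 1))


def sequential_model_count(maxlag):
    # Assumption: the default search fits nested models with 0, 1, ..., maxlag lags.
    return maxlag + 1


# For a small maxlag the candidate lag sets can be listed explicitly.
small = 3
subsets = [
    lags
    for k in range(small + 1)
    for lags in combinations(range(1, small + 1), k)
]
print(subsets)

for maxlag in (5, 10, 15, 20):
    print(maxlag, sequential_model_count(maxlag), global_model_count(maxlag))
```

The count doubles with each additional lag, which is why a global search becomes impractical somewhere between maxlag=15 and maxlag=20.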

trend : {‘n’, ‘c’, ‘t’, ‘ct’}

The trend to include in the model:

  • ‘n’ - No trend.

  • ‘c’ - Constant only.

  • ‘t’ - Time trend only.

  • ‘ct’ - Constant and time trend.


seasonal : bool

Flag indicating whether to include seasonal dummies in the model. If seasonal is True and trend includes ‘c’, then the first period is excluded from the seasonal terms.

exog : array_like, optional

Exogenous variables to include in the model. Must have the same number of observations as endog and should be aligned so that endog[i] is regressed on exog[i].

hold_back : {None, int}

Initial observations to exclude from the estimation sample. If None, then hold_back is equal to the maximum lag in the model. Set to a non-zero value to produce comparable models with different lag lengths. For example, to compare the fit of a model with lags=3 and lags=1, set hold_back=3, which ensures that both models are estimated using observations 3,…,nobs. hold_back must be >= the maximum lag in the model.

period : {None, int}

The period of the data. Only used if seasonal is True. This parameter can be omitted if using a pandas object for endog that contains a recognized frequency.


missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no NaN checking is done. If ‘drop’, any observations with NaNs are dropped. If ‘raise’, an error is raised. Default is ‘none’.


old_names : bool

Flag indicating whether to use the v0.11 names or the v0.12+ names.

Deprecated since version 0.13.0: old_names is deprecated and will be removed after 0.14 is released. You must update any code reliant on the old variable names to use the new names.


Returns:

AROrderSelectionResults

A results holder containing the model and the complete set of information criteria for all models fit.


Examples

>>> import statsmodels.api as sm
>>> from statsmodels.tsa.ar_model import ar_select_order
>>> data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']

Determine the optimal lag structure

>>> mod = ar_select_order(data, maxlag=13)
>>> mod.ar_lags
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Determine the optimal lag structure with seasonal terms

>>> mod = ar_select_order(data, maxlag=13, seasonal=True, period=12)
>>> mod.ar_lags
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Globally determine the optimal lag structure

>>> mod = ar_select_order(data, maxlag=13, glob=True)
>>> mod.ar_lags
array([1, 2, 9])

Last update: Jul 16, 2024