API Reference

The main statsmodels API is split into three modules:

  • statsmodels.api: Cross-sectional models and methods. Canonically imported using import statsmodels.api as sm.

  • statsmodels.tsa.api: Time-series models and methods. Canonically imported using import statsmodels.tsa.api as tsa.

  • statsmodels.formula.api: A convenience interface for specifying models using formula strings and DataFrames. This API directly exposes the from_formula class method of models that support the formula API. Canonically imported using import statsmodels.formula.api as smf.

The API focuses on models and the most frequently used statistical tests and tools. Import Paths and Structure explains the design of the API modules and how importing from the API differs from directly importing from the module where the model is defined. See the detailed topic pages in the User Guide for a complete list of available models, statistics, and tools.

statsmodels.api

Regression

OLS(endog[, exog, missing, hasconst])

Ordinary Least Squares

GLS(endog, exog[, sigma, missing, hasconst])

Generalized Least Squares

GLSAR(endog[, exog, rho, missing, hasconst])

Generalized Least Squares with AR covariance structure

WLS(endog, exog[, weights, missing, hasconst])

Weighted Least Squares

RecursiveLS(endog, exog[, constraints])

Recursive least squares

RollingOLS(endog, exog[, window, min_nobs, …])

Rolling Ordinary Least Squares

RollingWLS(endog, exog[, window, weights, …])

Rolling Weighted Least Squares

Imputation

BayesGaussMI(data[, mean_prior, cov_prior, …])

Bayesian Imputation using a Gaussian model.

BinomialBayesMixedGLM(endog, exog, exog_vc, …)

Generalized Linear Mixed Model with Bayesian estimation

Factor([endog, n_factor, corr, method, smc, …])

Factor analysis

MI(imp, model[, model_args_fn, …])

MI performs multiple imputation using a provided imputer object.

MICE(model_formula, model_class, data[, …])

Multiple Imputation with Chained Equations.

MICEData(data[, perturbation_method, k_pmm, …])

Wrap a data set to allow missing data handling with MICE.

Generalized Estimating Equations

GEE(endog, exog, groups[, time, family, …])

Marginal Regression Model using Generalized Estimating Equations.

NominalGEE(endog, exog, groups[, time, …])

Nominal Response Marginal Regression Model using GEE.

OrdinalGEE(endog, exog, groups[, time, …])

Ordinal Response Marginal Regression Model using GEE

Generalized Linear Models

GLM(endog, exog[, family, offset, exposure, …])

Generalized Linear Models

GLMGam(endog[, exog, smoother, alpha, …])

Generalized Additive Models (GAM)

PoissonBayesMixedGLM(endog, exog, exog_vc, ident)

Generalized Linear Mixed Model with Bayesian estimation

Discrete and Count Models

GeneralizedPoisson(endog, exog[, p, offset, …])

Generalized Poisson Model

Logit(endog, exog, **kwargs)

Logit Model

MNLogit(endog, exog, **kwargs)

Multinomial Logit Model

Poisson(endog, exog[, offset, exposure, missing])

Poisson Model

Probit(endog, exog, **kwargs)

Probit Model

NegativeBinomial(endog, exog[, …])

Negative Binomial Model

NegativeBinomialP(endog, exog[, p, offset, …])

Generalized Negative Binomial (NB-P) Model

ZeroInflatedGeneralizedPoisson(endog, exog)

Zero Inflated Generalized Poisson Model

ZeroInflatedNegativeBinomialP(endog, exog[, …])

Zero Inflated Generalized Negative Binomial Model

ZeroInflatedPoisson(endog, exog[, …])

Poisson Zero Inflated Model

Multivariate Models

MANOVA(endog, exog[, missing, hasconst])

Multivariate Analysis of Variance

PCA(data[, ncomp, standardize, demean, …])

Principal Component Analysis

Misc Models

MixedLM(endog, exog, groups[, exog_re, …])

Linear Mixed Effects Model

PHReg(endog, exog[, status, entry, strata, …])

Cox Proportional Hazards Regression Model

QuantReg(endog, exog, **kwargs)

Quantile Regression

RLM(endog, exog[, M, missing])

Robust Linear Model

SurvfuncRight(time, status[, entry, title, …])

Estimation and inference for a survival function.

Graphics

ProbPlot(data[, dist, fit, distargs, a, …])

Q-Q and P-P Probability Plots

qqline(ax, line[, x, y, dist, fmt])

Plot a reference line for a qqplot.

qqplot(data[, dist, distargs, a, loc, …])

Q-Q plot of the quantiles of x versus the quantiles/ppf of a distribution.

qqplot_2samples(data1, data2[, xlabel, …])

Q-Q Plot of two samples’ quantiles.

Tools

test([extra_args, exit])

Run the test suite

add_constant(data[, prepend, has_constant])

Add a column of ones to an array.

categorical(data[, col, dictnames, drop])

Construct a dummy matrix from categorical variables

load_pickle(fname)

Load a previously saved object

show_versions([show_dirs])

List the versions of statsmodels and any installed dependencies

webdoc([func, stable])

Opens a browser and displays online documentation

statsmodels.tsa.api

Statistics and Tests

acf(x[, unbiased, nlags, qstat, fft, alpha, …])

Calculate the autocorrelation function.

acovf(x[, unbiased, demean, fft, missing, nlag])

Estimate autocovariances.

adfuller(x[, maxlag, regression, autolag, …])

Augmented Dickey-Fuller unit root test.

bds(x[, max_dim, epsilon, distance])

BDS Test Statistic for Independence of a Time Series

ccf(x, y[, unbiased])

The cross-correlation function.

ccovf(x, y[, unbiased, demean])

Calculate the crosscovariance between two series.

coint(y0, y1[, trend, method, maxlag, …])

Test for no-cointegration of a univariate equation.

kpss(x[, regression, nlags, store])

Kwiatkowski-Phillips-Schmidt-Shin test for stationarity.

pacf(x[, nlags, method, alpha])

Partial autocorrelation estimate.

pacf_ols(x[, nlags, efficient, unbiased])

Calculate partial autocorrelations via OLS.

pacf_yw(x[, nlags, method])

Partial autocorrelation estimated with non-recursive yule_walker.

periodogram(x)

Compute the periodogram for the natural frequency of x.

q_stat(x, nobs[, type])

Compute Ljung-Box Q Statistic.

Univariate Time-Series Analysis

AR(endog[, dates, freq, missing])

Autoregressive AR(p) model.

ARIMA(endog, order[, exog, dates, freq, missing])

Autoregressive Integrated Moving Average ARIMA(p,d,q) Model

ARMA(endog, order[, exog, dates, freq, missing])

Autoregressive Moving Average ARMA(p,q) Model

SARIMAX(endog[, exog, order, …])

Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors model

arma_order_select_ic(y[, max_ar, max_ma, …])

Compute information criteria for many ARMA models.

arma_generate_sample(ar, ma, nsample[, …])

Simulate data from an ARMA.

ArmaProcess([ar, ma, nobs])

Theoretical properties of an ARMA process for specified lag-polynomials.

Exponential Smoothing

ExponentialSmoothing(endog[, trend, damped, …])

Holt Winter’s Exponential Smoothing

Holt(endog[, exponential, damped])

Holt’s Exponential Smoothing

SimpleExpSmoothing(endog)

Simple Exponential Smoothing

Multivariate Models

DynamicFactor(endog, k_factors, factor_order)

Dynamic factor model

VAR(endog[, exog, dates, freq, missing])

Fit VAR(p) process and do lag order selection

VARMAX(endog[, exog, order, trend, …])

Vector Autoregressive Moving Average with eXogenous regressors model

SVAR(endog, svar_type[, dates, freq, A, B, …])

Fit VAR and then estimate structural components of A and B, defined:

VECM(endog[, exog, exog_coint, dates, freq, …])

Class representing a Vector Error Correction Model (VECM).

UnobservedComponents(endog[, level, trend, …])

Univariate unobserved components time series model

Filters and Decompositions

seasonal_decompose(x[, model, filt, period, …])

Seasonal decomposition using moving averages.

STL(endog[, period, seasonal, trend, …])

Season-Trend decomposition using LOESS.

bkfilter(x[, low, high, K])

Filter a time series using the Baxter-King bandpass filter.

cffilter(x[, low, high, drift])

Christiano Fitzgerald asymmetric, random walk filter.

hpfilter(x[, lamb])

Hodrick-Prescott filter.
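As an example of the filter functions, the sketch below applies hpfilter to a synthetic trending series. It imports the function from statsmodels.tsa.filters.hp_filter, the module where it is defined; the series and the choice lamb=1600 are assumptions made for illustration only.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

# Synthetic series: linear trend plus a cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 20) + rng.normal(scale=0.1, size=t.size)

# hpfilter returns the cyclical component first, then the trend.
cycle, trend = hpfilter(x, lamb=1600)

# The two components add back up to the original series.
print(np.allclose(cycle + trend, x))
```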

Markov Regime Switching Models

MarkovAutoregression(endog, k_regimes, order)

Markov switching regression model

MarkovRegression(endog, k_regimes[, trend, …])

First-order k-regime Markov switching regression model

Time-Series Tools

add_lag(x[, col, lags, drop, insert])

Returns an array with lags included given an array.

add_trend(x[, trend, prepend, has_constant])

Add a trend and/or constant to an array.

detrend(x[, order, axis])

Detrend an array with a trend of given order along axis 0 or 1.

lagmat(x, maxlag[, trim, original, use_pandas])

Create 2d array of lags.

lagmat2ds(x, maxlag0[, maxlagex, dropex, …])

Generate lagmatrix for 2d array, columns arranged by variables.

X12/X13 Interface

x13_arima_analysis(endog[, maxorder, …])

Perform x13-arima analysis for monthly or quarterly data.

x13_arima_select_order(endog[, maxorder, …])

Perform automatic seasonal ARIMA order identification using x12/x13 ARIMA.

statsmodels.formula.api

Models

The function descriptions of the methods exposed in the formula API are generic. See the documentation for the parent model for details.

gls(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

wls(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

ols(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

glsar(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

mixedlm(formula, data[, re_formula, …])

Create a Model from a formula and dataframe.

glm(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

rlm(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

mnlogit(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

logit(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

probit(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

poisson(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

negativebinomial(formula, data[, subset, …])

Create a Model from a formula and dataframe.

quantreg(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

phreg(formula, data[, status, entry, …])

Create a proportional hazards regression model from a formula and dataframe.

ordinal_gee(formula, groups, data[, subset, …])

Create a Model from a formula and dataframe.

nominal_gee(formula, groups, data[, subset, …])

Create a Model from a formula and dataframe.

gee(formula, groups, data[, subset, time, …])

Create a Model from a formula and dataframe.

glmgam(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

Import Paths and Structure

We offer two ways of importing functions and classes from statsmodels:

  1. API import for interactive use

    • Allows tab completion

  2. Direct import for programs

    • Avoids importing unnecessary modules and commands

API Import for interactive use

For interactive use the recommended import is:

import statsmodels.api as sm

Importing statsmodels.api will load most of the public parts of statsmodels. This makes most functions and classes conveniently available within one or two levels, without making the “sm” namespace too crowded.

To see what functions and classes are available, you can type the following (or use the namespace exploration features of IPython, Spyder, IDLE, etc.):

>>> dir(sm)
['GLM', 'GLS', 'GLSAR', 'Logit', 'MNLogit', 'OLS', 'Poisson', 'Probit', 'RLM',
'WLS', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
'add_constant', 'categorical', 'datasets', 'distributions', 'families',
'graphics', 'iolib', 'nonparametric', 'qqplot', 'regression', 'robust',
'stats', 'test', 'tools', 'tsa', 'version']

>>> dir(sm.graphics)
['__builtins__', '__doc__', '__file__', '__name__', '__package__',
'abline_plot', 'beanplot', 'fboxplot', 'interaction_plot', 'qqplot',
'rainbow', 'rainbowplot', 'violinplot']

>>> dir(sm.tsa)
['AR', 'ARMA', 'SVAR', 'VAR', '__builtins__', '__doc__',
'__file__', '__name__', '__package__', 'acf', 'acovf', 'add_lag',
'add_trend', 'adfuller', 'ccf', 'ccovf', 'datetools', 'detrend',
'filters', 'grangercausalitytests', 'interp', 'lagmat', 'lagmat2ds',
'pacf', 'pacf_ols', 'pacf_yw', 'periodogram', 'q_stat', 'stattools',
'tsatools', 'var']

Notes

The api modules may not include all the public functionality of statsmodels. If you find something that should be added to the api, please file an issue on GitHub or report it to the mailing list.

The subpackages of statsmodels include api.py modules that are mainly intended to collect the imports needed for those subpackages. The subpackage/api.py files are imported into statsmodels.api, for example:

from .nonparametric import api as nonparametric

Users do not need to load the subpackage/api.py modules directly.

Direct import for programs

statsmodels submodules are arranged by topic (e.g. discrete for discrete choice models, or tsa for time series analysis). Our directory tree (stripped down) looks something like this:

statsmodels/
    __init__.py
    api.py
    discrete/
        __init__.py
        discrete_model.py
        tests/
            results/
    tsa/
        __init__.py
        api.py
        tsatools.py
        stattools.py
        arima_model.py
        arima_process.py
        vector_ar/
            __init__.py
            var_model.py
            tests/
                results/
        tests/
            results/
    stats/
        __init__.py
        api.py
        stattools.py
        tests/
    tools/
        __init__.py
        tools.py
        decorators.py
        tests/

Submodules that can be expensive to import have an __init__.py that is empty apart from some code for running that submodule's tests. The intention is to change all directories to have an api.py and an empty __init__.py in the next release.

Import examples

Functions and classes:

from statsmodels.regression.linear_model import OLS, WLS
from statsmodels.tools.tools import rank, add_constant

Modules:

from statsmodels.datasets import macrodata
from statsmodels.stats import diagnostic

Modules with aliases:

import statsmodels.regression.linear_model as lm
import statsmodels.stats.diagnostic as smsdia
import statsmodels.stats.outliers_influence as oi

We do not currently have a convention for aliases of submodules.