See also changes in the unreleased 0.7
The main features of this release are several new time series models based on the statespace framework, multiple imputation using MICE, and many other enhancements. The codebase has also been updated to be compatible with recent numpy and pandas releases.
statsmodels now uses GitHub to host the updated documentation, which is available at https://www.statsmodels.org/stable for the latest release and at https://www.statsmodels.org/devel/ for the development version.
This is the last release that supports Python 2.6.
API stability is not guaranteed for new features, although even in this case changes will be made in a backwards compatible way if possible. The stability of a new feature depends on how long it has been in statsmodels master and how much usage it has seen. Specific known problems or limitations are mentioned in the docstrings.
The following major new features appear in this version.
This release adds further models built on the statespace framework and models introduced in 0.7. The work was authored by Chad Fulton, largely during GSOC 2015.
The Kalman smoother (introduced in #2434) allows making inference on the unobserved state vector at each point in time using data from the entire sample. In addition to this improved inference, the Kalman smoother is required for future improvements such as simulation smoothing and the expectation maximization (EM) algorithm.
As a result of this improvement, all state space models now inherit a smooth method for producing results with smoothed state estimates. In addition, the fit method will return results with smoothed estimates at the maximum likelihood estimates.
Improved post-estimation output is now available to all state space models (introduced in #2566). This includes the new methods get_prediction and get_forecast, providing standard errors and confidence intervals as well as point estimates, simulate, providing simulation of time series following the given state space process, and impulse_responses, allowing computation of impulse responses due to innovations to the state vector.
A number of general diagnostic tests on the residuals from state space estimation are now available to all state space models (introduced in #2431). These include:
test_normality implements the Jarque-Bera test for normality of residuals
test_heteroskedasticity implements a test for homoskedasticity of residuals similar to the Goldfeld-Quandt test
test_serial_correlation implements the Ljung-Box (or Box-Pierce) test for serial correlation of residuals
These test statistics are also now included in the summary method output. In addition, a plot_diagnostics method is available which provides four plots to visually assess model fit.
The class of univariate Unobserved Components models (also known as structural time series models) is now available (introduced in #2432). It includes the local level model and the local linear trend model as special cases. More generally, it allows decomposing a time series into trend, cycle, seasonal, and irregular components, optionally with exogenous regressors and / or autoregressive errors.
Two standard multivariate econometric models, the vector autoregressive moving-average model with exogenous regressors (VARMAX) and the dynamic factor model, are now available (introduced in #2563). The first is a popular reduced-form method for exploring the covariance among several time series, and the second is a popular reduced-form method for extracting a small number of common factors from a large dataset of observed series.
Recursive least squares
A model for recursive least squares, also known as expanding-window OLS, is now available in statsmodels.regression (introduced in #2830).
Other improvements to the state space framework include:
Improved missing data handling #2770, #2809
Ongoing refactoring and bug fixes in fringe and corner cases
Time Series Analysis
Markov Switching Models
Markov switching dynamic regression and autoregression models are now available (introduced in #2980 by Chad Fulton). These models allow regression effects and / or autoregressive dynamics to differ depending on an unobserved “regime”; in Markov switching models, the regimes are assumed to transition according to a Markov process.
KPSS stationarity, unit root test #2775 (N-Wouda)
The Brock Dechert Scheinkman (BDS) test for nonlinear dependence is now available (introduced in #934 by Chad Fulton)
Augmented Engle/Granger cointegration test (refactor hidden function) #3146 (Josef Perktold)
New functionality in statistics
Contingency Tables #2418 (Kerby Shedden)
Local FDR, multiple testing #2297 (Kerby Shedden)
Mediation Analysis #2352 (Kerby Shedden)
Confidence intervals for multinomial proportions #3162 (Sebastien Lerique, Josef Perktold)
weighted quantiles in DescrStatsW #2707 (Kerby Shedden)
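Two of the additions listed above, contingency tables and confidence intervals for multinomial proportions, can be sketched as follows. The counts are made up, and the Goodman method is just one of the available interval methods.

```python
import numpy as np
from statsmodels.stats.contingency_tables import Table2x2
from statsmodels.stats.proportion import multinomial_proportions_confint

# 2x2 contingency table with made-up counts
table = Table2x2(np.array([[10, 5], [3, 12]]))
odds_ratio = table.oddsratio  # (10 * 12) / (5 * 3)

# Simultaneous confidence intervals for three category proportions
counts = np.array([20, 30, 50])
ci = multinomial_proportions_confint(counts, alpha=0.05, method="goodman")
```

Each row of `ci` holds the lower and upper bound for one category.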
Kaplan Meier Survival Function #2614 (Kerby Shedden)
Cumulative incidence rate function #3016 (Kerby Shedden)
frequency weights in Kaplan-Meier #2992 (Kerby Shedden)
entry times for Kaplan-Meier #3126 (Kerby Shedden)
intercept handling for PHReg #3095 (Kerby Shedden)
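A small sketch of the Kaplan-Meier estimator on five made-up observations, two of which are censored. The data are chosen so that the product-limit calculation is easy to follow by hand.

```python
import numpy as np
from statsmodels.duration.survfunc import SurvfuncRight

# Five subjects; status 1 marks an observed event, 0 marks censoring
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
status = np.array([1, 1, 0, 1, 0])

sf = SurvfuncRight(time, status)

# Product-limit estimate at the event times:
#   t=1: 4/5 = 0.8,  t=2: 0.8 * 3/4 = 0.6,  t=4: 0.6 * 1/2 = 0.3
times = sf.surv_times
prob = sf.surv_prob
```

Frequency weights and entry times are passed through the `freq_weights` and `entry` arguments of the same class.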
new subpackage in statsmodels.imputation
MICE #2076 (Frank Cheng GSOC 2014 and Kerby Shedden)
Imputation by regression on Order Statistic #3019 (Paul Hobson)
Elastic net: fit_regularized with L1/L2 penalization has been added to OLS, GLM and PHReg (Kerby Shedden)
Tweedie is now available as new family #2872 (Peter Quackenbush, Josef Perktold)
frequency weights for GLM (currently without full support) #
more flexible convergence options #2803 (Peter Quackenbush)
new subpackage that currently contains PCA
PCA was added in 0.7 to statsmodels.tools and is now in statsmodels.multivariate
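The new import location can be sketched as follows, using simulated data driven by a single common factor; the dimensions and the choice of two components are illustrative.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Four series driven by one common factor plus small noise (illustrative only)
rng = np.random.default_rng(8)
factor = rng.standard_normal((100, 1))
X = factor @ np.ones((1, 4)) + 0.1 * rng.standard_normal((100, 4))

pc = PCA(X, ncomp=2)
factors = pc.factors      # principal component scores, one column per component
loadings = pc.loadings    # one row per original variable
eigenvals = pc.eigenvals  # sorted from largest to smallest
```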
New doc build with latest jupyter and Python 3 compatibility (Tom Augspurger)
Other important improvements
several existing functions have received improvements
seasonal_decompose: improved periodicity handling #2987 (ssktotoro ?)
tools add_constant, add_trend: refactoring and pandas compatibility #2240 (Kevin Sheppard)
acf, pacf, acovf: option for missing handling #3020 (joesnacks ?)
acf, pacf plots: allow array of lags #2989 (Kevin Sheppard)
pickling support for ARIMA #3412 (zaemyung)
io SimpleTable (summary): allow names with special characters #3015 (tvanessa ?)
tsa tools lagmat, lagmat2ds: pandas support #2310 #3042 (Kevin Sheppard)
CompareMeans: from_data, summary methods #2754 (Valery Tyumen)
API cleanup for robust, sandwich covariances #3162 (Josef Perktold)
influence plot used swapped arguments (bug) #3158
Major Bugs fixed
see github issues
While most bugs are usability problems, there is now a new label type-bug-wrong for bugs that cause incorrect numbers to be returned silently. https://github.com/statsmodels/statsmodels/issues?q=label%3Atype-bug-wrong+is%3Aclosed
Backwards incompatible changes and deprecations
predict now returns a pandas Series if the exog argument is a DataFrame; rows with missing/NaN values are included in the result
PCA moved to multivariate compared to 0.7
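The change to predict can be illustrated with a formula-based model. The sketch below assumes a small made-up data set and a new DataFrame with one missing value; the index labels are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data for a simple linear fit
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                   "y": [1.1, 2.9, 5.2, 6.8, 9.1]})
res = smf.ols("y ~ x", data=df).fit()

# New observations, one of them with a missing regressor
new = pd.DataFrame({"x": [1.5, np.nan, 2.5]}, index=["a", "b", "c"])

# The result is a Series aligned with new.index;
# the row with the missing value is NaN rather than dropped
pred = res.predict(new)
```

Code that relied on predict returning a plain array can convert the result with `np.asarray(pred)`.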
Development summary and credits
Besides receiving contributions for new and improved features and for bugfixes, important contributions to general maintenance came from
Pierre Barbier de Reuille
and the general maintainer and code reviewer
Additionally, many users contributed by participation in github issues and providing feedback.
Thanks to all of the contributors for the 0.8 release (based on git log):
Christoph T. Weidemann
Pierre Barbier de Reuille
Soren Fuglede Jorgensen
Thanks to all of the contributors for the 0.7 release:
Hans-Martin von Gaudecker
Louis-Philippe Lemieux Perreault
These lists of names are automatically generated based on git log, and may not be complete.