.. currentmodule:: statsmodels.nonparametric
.. _nonparametric:
Nonparametric Methods :mod:`nonparametric`
==========================================
This section collects various methods in nonparametric statistics. This
includes kernel density estimation for univariate and multivariate data,
kernel regression and locally weighted scatterplot smoothing (lowess).
sandbox.nonparametric contains additional functions that are work in progress
or don't have unit tests yet. We are planning to include here nonparametric
density estimators, especially based on kernel or orthogonal polynomials,
smoothers, and tools for nonparametric models and methods in other parts of
statsmodels.
Kernel density estimation
-------------------------
The kernel density estimation (KDE) functionality is split between univariate
and multivariate estimation, which are implemented in quite different ways.
Univariate estimation (as provided by `KDEUnivariate`) uses FFT transforms,
which makes it quite fast. Therefore it should be preferred for *continuous,
univariate* data if speed is important. It supports using different kernels;
bandwidth estimation is done only by a rule of thumb (Scott or Silverman).
Multivariate estimation (as provided by `KDEMultivariate`) uses product
kernels. It supports least squares and maximum likelihood cross-validation
for bandwidth estimation, as well as estimating mixed continuous, ordered and
unordered data. The default kernels (Gaussian, Wang-Ryzin and
Aitchison-Aitken) cannot be altered at the moment however. Direct estimation
of the conditional density (:math:`P(X | Y) = P(X, Y) / P(Y)`) is supported
by `KDEMultivariateConditional`.
`KDEMultivariate` can do univariate estimation as well, but is up to two orders
of magnitude slower than `KDEUnivariate`.
Kernel regression
-----------------
Kernel regression (as provided by `KernelReg`) is based on the same product
kernel approach as `KDEMultivariate`, and therefore has the same set of
features (mixed data, cross-validated bandwidth estimation, kernels) as
described above for `KDEMultivariate`. Censored regression is provided by
`KernelCensoredReg`.
Note that code for semi-parametric partial linear models and single index
models, based on `KernelReg`, can be found in the sandbox.
References
----------
* B.W. Silverman, "Density Estimation for Statistics and Data Analysis"
* J.S. Racine, "Nonparametric Econometrics: A Primer," Foundation and
Trends in Econometrics, Vol. 3, No. 1, pp. 1-88, 2008.
* Q. Li and J.S. Racine, "Nonparametric econometrics: theory and practice",
Princeton University Press, 2006.
* Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning:
Data Mining, Inference, and Prediction", Springer, 2009.
* Racine, J., Li, Q. "Nonparametric Estimation of Distributions
with Categorical and Continuous Data." Working Paper. (2000)
* Racine, J. Li, Q. "Kernel Estimation of Multivariate Conditional
Distributions Annals of Economics and Finance 5, 211-235 (2004)
* Liu, R., Yang, L. "Kernel estimation of multivariate
cumulative distribution function." Journal of Nonparametric Statistics
(2008)
* Li, R., Ju, G. "Nonparametric Estimation of Multivariate CDF
with Categorical and Continuous Data." Working Paper
* Li, Q., Racine, J. "Cross-validated local linear nonparametric
regression" Statistica Sinica 14(2004), pp. 485-512
* Racine, J.: "Consistent Significance Testing for Nonparametric
Regression" Journal of Business & Economics Statistics
* Racine, J., Hart, J., Li, Q., "Testing the Significance of
Categorical Predictor Variables in Nonparametric Regression
Models", 2006, Econometric Reviews 25, 523-544
Module Reference
----------------
.. module:: statsmodels.nonparametric
:synopsis: Nonparametric estimation of densities and curves
The public functions and classes are
.. autosummary::
:toctree: generated/
smoothers_lowess.lowess
kde.KDEUnivariate
kernel_density.KDEMultivariate
kernel_density.KDEMultivariateConditional
kernel_density.EstimatorSettings
kernel_regression.KernelReg
kernel_regression.KernelCensoredReg
helper functions for kernel bandwidths
.. autosummary::
:toctree: generated/
bandwidths.bw_scott
bandwidths.bw_silverman
bandwidths.select_bandwidth
There are some examples for nonlinear functions in
:mod:`statsmodels.nonparametric.dgp_examples`
The sandbox.nonparametric contains additional insufficiently tested classes
for testing functional form and for semi-linear and single index models.