# Nonparametric Methods (nonparametric)

This section collects various methods in nonparametric statistics. These include kernel density estimation for univariate and multivariate data, kernel regression, and locally weighted scatterplot smoothing (lowess).

sandbox.nonparametric contains additional functions that are still work in progress or do not yet have unit tests. The plan is to include nonparametric density estimators here, especially those based on kernels or orthogonal polynomials, along with smoothers and tools for nonparametric models and methods in other parts of statsmodels.

## Kernel density estimation

The kernel density estimation (KDE) functionality is split between univariate and multivariate estimation, which are implemented in quite different ways.

Univariate estimation (as provided by KDEUnivariate) uses the fast Fourier transform (FFT), which makes it quite fast. It should therefore be preferred for continuous, univariate data when speed is important. It supports different kernels; bandwidth estimation is done only by a rule of thumb (Scott or Silverman).

Multivariate estimation (as provided by KDEMultivariate) uses product kernels. It supports least squares and maximum likelihood cross-validation for bandwidth estimation, as well as estimation on mixed continuous, ordered and unordered data. The default kernels (Gaussian, Wang-Ryzin and Aitchison-Aitken) cannot currently be changed, however. Direct estimation of the conditional density ($$P(X | Y) = P(X, Y) / P(Y)$$) is supported by KDEMultivariateConditional.

KDEMultivariate can do univariate estimation as well, but is up to two orders of magnitude slower than KDEUnivariate.

## Kernel regression

Kernel regression (as provided by KernelReg) is based on the same product kernel approach as KDEMultivariate, and therefore has the same set of features (mixed data, cross-validated bandwidth estimation, kernels) as described above for KDEMultivariate. Censored regression is provided by KernelCensoredReg.

Note that code for semi-parametric partial linear models and single index models, based on KernelReg, can be found in the sandbox.

## Module Reference

The public functions and classes are

 lowess(endog, exog[, frac, it, delta, ...]) LOWESS (Locally Weighted Scatterplot Smoothing)
 KDEUnivariate(endog) Univariate Kernel Density Estimator.
 KDEMultivariate(data, var_type[, bw, defaults]) Multivariate kernel density estimator.
 KDEMultivariateConditional(endog, exog, ...) Conditional multivariate kernel density estimator.
 EstimatorSettings([efficient, randomize, ...]) Object to specify settings for density estimation or regression.
 KernelReg(endog, exog, var_type[, reg_type, ...]) Nonparametric kernel regression class.
 KernelCensoredReg(endog, exog, var_type, ...) Nonparametric censored regression.

Helper functions for kernel bandwidths

 bw_scott(x[, kernel]) Scott's Rule of Thumb
 bw_silverman(x[, kernel]) Silverman's Rule of Thumb
 select_bandwidth(x, bw, kernel) Selects bandwidth for a selection rule bw

Some example data generating processes for nonlinear functions are available in statsmodels.nonparametric.dgp_examples.

## Asymmetric Kernels

Asymmetric kernels, such as the beta kernel for the unit interval and the gamma kernel for positive-valued random variables, avoid problems at the boundary of the support of the distribution.

Statsmodels has preliminary support for estimating the density and the cumulative distribution function using kernels for the unit interval (the beta kernels) or for the positive real line (all other kernels).

Several of the kernels for the positive real line assume that the density at the zero boundary is zero. The gamma kernel also allows for a positive or unbounded density at the zero boundary.

There are currently no defaults and no support for choosing the bandwidth; the user has to provide it.

The functions to compute kernel density and kernel cdf are

 pdf_kernel_asym(x, sample, bw, kernel_type) Density estimate based on asymmetric kernel.
 cdf_kernel_asym(x, sample, bw, kernel_type) Estimate of cumulative distribution based on asymmetric kernel.

The available kernel functions for pdf and cdf are

 kernel_pdf_beta(x, sample, bw) Beta kernel for density, pdf, estimation.
 kernel_pdf_beta2(x, sample, bw) Beta kernel for density, pdf, estimation with boundary corrections.
 kernel_pdf_bs(x, sample, bw) Birnbaum Saunders (normal) kernel for density, pdf, estimation.
 kernel_pdf_gamma(x, sample, bw) Gamma kernel for density, pdf, estimation.
 kernel_pdf_gamma2(x, sample, bw) Gamma kernel for density, pdf, estimation with boundary correction.
 kernel_pdf_invgamma(x, sample, bw) Inverse gamma kernel for density, pdf, estimation.
 kernel_pdf_invgauss(x, sample, bw) Inverse gaussian kernel for density, pdf, estimation.
 kernel_pdf_lognorm(x, sample, bw) Log-normal kernel for density, pdf, estimation.
 kernel_pdf_recipinvgauss(x, sample, bw) Reciprocal inverse gaussian kernel for density, pdf, estimation.
 kernel_pdf_weibull(x, sample, bw) Weibull kernel for density, pdf, estimation.
 kernel_cdf_beta(x, sample, bw) Beta kernel for cumulative distribution, cdf, estimation.
 kernel_cdf_beta2(x, sample, bw) Beta kernel for cdf estimation with boundary correction.
 kernel_cdf_bs(x, sample, bw) Birnbaum Saunders (normal) kernel for cdf estimation.
 kernel_cdf_gamma(x, sample, bw) Gamma kernel for cumulative distribution, cdf, estimation.
 kernel_cdf_gamma2(x, sample, bw) Gamma kernel for cdf estimation with boundary correction.
 kernel_cdf_invgamma(x, sample, bw) Inverse gamma kernel for cumulative distribution, cdf, estimation.
 kernel_cdf_invgauss(x, sample, bw) Inverse gaussian kernel for cumulative distribution, cdf, estimation.
 kernel_cdf_lognorm(x, sample, bw) Log-normal kernel for cumulative distribution, cdf, estimation.
 kernel_cdf_recipinvgauss(x, sample, bw) Reciprocal inverse gaussian kernel for cdf estimation.
 kernel_cdf_weibull(x, sample, bw) Weibull kernel for cumulative distribution, cdf, estimation.

sandbox.nonparametric contains additional classes, not yet sufficiently tested, for testing functional form and for semi-linear and single index models.