class statsmodels.nonparametric.kernel_density.KDEMultivariate(data, var_type, bw=None, defaults=<statsmodels.nonparametric._kernel_base.EstimatorSettings object>)

Multivariate kernel density estimator.

This density estimator can handle univariate as well as multivariate data, including mixed continuous / ordered discrete / unordered discrete data. It also provides cross-validated bandwidth selection methods (least squares, maximum likelihood).


data: list of ndarrays or 2-D ndarray

The training data for the kernel density estimation, used to determine the bandwidth(s). If a 2-D array, it should have shape (num_observations, num_variables). If a list, each list element is a separate observation.

var_type: str

The type of the variables:

  • c : continuous
  • u : unordered (discrete)
  • o : ordered (discrete)

The string should contain a type specifier for each variable, so for example var_type='ccuo'.
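As a minimal sketch of how the type string lines up with the columns of the data, the following builds a three-column array holding one continuous, one unordered and one ordered variable. The synthetic data, the seed and the variable names are assumptions made purely for illustration.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
nobs = 200

x_cont = rng.normal(size=nobs)            # continuous variable  -> 'c'
x_unord = rng.integers(0, 3, size=nobs)   # category labels 0..2 -> 'u'
x_ord = rng.integers(0, 5, size=nobs)     # ordered levels 0..4  -> 'o'

# Columns are variables, rows are observations: shape (nobs, 3).
data = np.column_stack([x_cont, x_unord, x_ord])

# One character per column, in column order.
kde = KDEMultivariate(data=data, var_type='cuo')

print(kde.bw)  # one bandwidth per variable
```

The order of the characters in var_type has to follow the column order of the array, so swapping two columns means swapping the corresponding characters as well.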

bw: array_like or str, optional

If an array, it is used as a fixed, user-specified bandwidth (one value per variable). If a string, it should be one of:

  • normal_reference: normal reference rule of thumb (default)
  • cv_ml: cross validation maximum likelihood
  • cv_ls: cross validation least squares
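The options above can be compared side by side. The sketch below fits the same synthetic two-variable sample with the default rule of thumb, with cv_ml, and with a fixed bandwidth array. The sample, its small size (chosen only to keep the cross-validation quick) and the fixed values 0.5 are assumptions for illustration.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

rng = np.random.default_rng(42)   # arbitrary seed
data = rng.normal(size=(60, 2))   # small sample keeps cross-validation quick

# bw omitted: falls back to the normal reference rule of thumb.
kde_ref = KDEMultivariate(data, var_type='cc')

# Bandwidths chosen by maximum likelihood cross-validation.
kde_ml = KDEMultivariate(data, var_type='cc', bw='cv_ml')

# User-specified bandwidths, one per variable, used as given.
kde_fixed = KDEMultivariate(data, var_type='cc', bw=[0.5, 0.5])

for name, kde in [('normal_reference', kde_ref),
                  ('cv_ml', kde_ml),
                  ('fixed', kde_fixed)]:
    print(name, kde.bw)
```

The cross-validated options run a numerical optimization and so cost noticeably more than the rule of thumb, especially as the number of observations grows.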

defaults: EstimatorSettings instance, optional

The settings used for bandwidth estimation, including whether the efficient (subsample-based) estimation is enabled.


>>> import numpy as np
>>> import statsmodels.api as sm
>>> nobs = 300
>>> np.random.seed(1234)  # Seed random generator
>>> c1 = np.random.normal(size=(nobs,1))
>>> c2 = np.random.normal(2, 1, size=(nobs,1))

Estimate a bivariate distribution and display the bandwidth found:

>>> dens_u = sm.nonparametric.KDEMultivariate(data=[c1,c2],
...     var_type='cc', bw='normal_reference')
>>> dens_u.bw
array([ 0.39967419,  0.38423292])


bw: array_like

The bandwidth parameters.


  • cdf([data_predict]): Evaluate the cumulative distribution function.
  • imse(bw): Returns the Integrated Mean Square Error for the unconditional KDE.
  • loo_likelihood(bw[, func]): Returns the leave-one-out likelihood function.
  • pdf([data_predict]): Evaluate the probability density function.
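A short, hedged usage sketch of these methods on a univariate sample follows. The sample, the seed and the evaluation grid are assumptions for illustration; passing np.log as func is one possible choice for the optional argument of loo_likelihood.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

rng = np.random.default_rng(7)   # arbitrary seed
x = rng.normal(size=200)         # one continuous variable

kde = KDEMultivariate(data=x, var_type='c')

grid = np.array([-10.0, 0.0, 10.0])
pdf_vals = kde.pdf(grid)   # density estimate at each grid point
cdf_vals = kde.cdf(grid)   # estimated cumulative probability at each grid point

# Leave-one-out likelihood at the selected bandwidth, with log applied to
# each leave-one-out density value via the optional func argument.
loo = kde.loo_likelihood(kde.bw, func=np.log)

print(pdf_vals)
print(cdf_vals)
print(loo)
```

For a sample centred near zero, the cdf values at the two extreme grid points should be close to 0 and 1 respectively, which makes for a quick sanity check on the fit.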