class statsmodels.nonparametric.kernel_density.KDEMultivariate(data, var_type, bw=None, defaults=None)

Multivariate kernel density estimator.

This density estimator can handle univariate as well as multivariate data, including mixed continuous / ordered discrete / unordered discrete data. It also provides cross-validated bandwidth selection methods (least squares, maximum likelihood).

  • data (list of ndarrays or 2-D ndarray) – The training data for the Kernel Density Estimation, used to determine the bandwidth(s). If a 2-D array, should be of shape (num_observations, num_variables). If a list, each list element is a separate observation.
  • var_type (str) –

    The type of the variables:

    • c : continuous
    • u : unordered (discrete)
    • o : ordered (discrete)

    The string should contain a type specifier for each variable, so for example var_type='ccuo'.

  • bw (array_like or str, optional) –

    If an array, it is a fixed user-specified bandwidth. If a string, should be one of:

    • normal_reference: normal reference rule of thumb (default)
    • cv_ml: cross validation maximum likelihood
    • cv_ls: cross validation least squares
  • defaults (EstimatorSettings instance, optional) – The default values for (efficient) bandwidth estimation.
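
To make the var_type and bw options above concrete, here is a minimal sketch with one variable of each type. The variable names, the simulated data, and the fixed bandwidth values are illustrative assumptions, not recommendations; the rule-of-thumb bandwidth is applied to the discrete columns purely for demonstration.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

rng = np.random.RandomState(0)
nobs = 200

# Hypothetical data: one column per variable type.
x_cont = rng.normal(size=nobs)           # continuous          -> 'c'
x_ord = rng.poisson(2, size=nobs)        # ordered discrete    -> 'o'
x_unord = rng.randint(0, 3, size=nobs)   # unordered discrete  -> 'u'

# 2-D array of shape (num_observations, num_variables).
data = np.column_stack([x_cont, x_ord, x_unord])

# One type specifier per column, in column order.
kde_rule = KDEMultivariate(data=data, var_type='cou', bw='normal_reference')

# A fixed, user-specified bandwidth: one value per variable (values are arbitrary).
kde_fixed = KDEMultivariate(data=data, var_type='cou', bw=[0.5, 0.2, 0.2])

print(kde_rule.bw)
print(kde_fixed.bw)
```

In both cases the resulting bw holds one entry per character of var_type.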

Attributes:

  • bw (array_like) – The bandwidth parameters.



>>> import numpy as np
>>> import statsmodels.api as sm
>>> nobs = 300
>>> np.random.seed(1234)  # Seed random generator
>>> c1 = np.random.normal(size=(nobs,1))
>>> c2 = np.random.normal(2, 1, size=(nobs,1))

Estimate a bivariate distribution and display the bandwidth found:

>>> dens_u = sm.nonparametric.KDEMultivariate(data=[c1,c2],
...     var_type='cc', bw='normal_reference')
>>> dens_u.bw
array([ 0.39967419,  0.38423292])
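
The other bw options can be compared in the same way. The sketch below is an illustration under assumptions: it uses a smaller sample than the example above so that cross-validation runs quickly, and the fixed bandwidth values are arbitrary. No expected output is shown because the cross-validated result depends on the optimizer.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

np.random.seed(1234)
nobs = 100  # smaller than above, only to keep cross-validation fast

# Two continuous variables stacked as (num_observations, num_variables).
data = np.column_stack([
    np.random.normal(size=nobs),
    np.random.normal(2, 1, size=nobs),
])

# Rule-of-thumb bandwidth (the default).
kde_ref = KDEMultivariate(data, var_type='cc', bw='normal_reference')

# Bandwidth chosen by maximum likelihood cross-validation.
kde_ml = KDEMultivariate(data, var_type='cc', bw='cv_ml')

# Fixed, user-specified bandwidth (arbitrary illustrative values).
kde_fixed = KDEMultivariate(data, var_type='cc', bw=[0.4, 0.4])

for name, kde in [('normal_reference', kde_ref),
                  ('cv_ml', kde_ml),
                  ('fixed', kde_fixed)]:
    print(name, kde.bw)
```

The 'cv_ls' option is passed the same way; it is left out here only because it tends to take longer to run.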


Methods:

  • cdf([data_predict]) – Evaluate the cumulative distribution function.
  • imse(bw) – Returns the Integrated Mean Square Error for the unconditional KDE.
  • loo_likelihood(bw[, func]) – Returns the leave-one-out likelihood function.
  • pdf([data_predict]) – Evaluate the probability density function.
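
As a rough illustration of pdf and cdf, the sketch below fits an estimator and evaluates it at a few hand-picked points. The evaluation points and variable names are assumptions chosen for the example; data_predict is given one row per point and one column per variable.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariate

np.random.seed(1234)
nobs = 300
data = np.column_stack([
    np.random.normal(size=nobs),        # centred near 0
    np.random.normal(2, 1, size=nobs),  # centred near 2
])

kde = KDEMultivariate(data, var_type='cc', bw='normal_reference')

# Three evaluation points: near the mode, off-centre, and far in the tail.
points = np.array([
    [0.0, 2.0],
    [1.0, 1.0],
    [5.0, 5.0],
])

density = kde.pdf(points)     # estimated density at each point
cumulative = kde.cdf(points)  # estimated joint CDF at each point

# With no argument, pdf is evaluated at the training data.
density_train = kde.pdf()

print(density)
print(cumulative)
print(density_train.shape)
```

The density should be largest near the centre of the simulated data and close to zero in the tail, while the CDF values stay between 0 and 1.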