statsmodels.imputation.mice.MICEData

class statsmodels.imputation.mice.MICEData(data, perturbation_method='gaussian', k_pmm=20, history_callback=None)[source]

Wrap a data set to allow missing data handling with MICE.

Parameters
  • data (Pandas data frame) – The data set, which is copied internally.

  • perturbation_method (string) – The default perturbation method, either ‘gaussian’ or ‘boot’ (see Notes).

  • k_pmm (int) – The number of nearest neighbors to use during predictive mean matching. Can also be specified in fit.

  • history_callback (function) – A function that is called after each complete imputation cycle. The return value is appended to history. The MICEData object is passed as the sole argument to history_callback.
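As a minimal sketch of how these parameters fit together, the following constructs a MICEData object from a small synthetic data frame. The data, the column names, and the choice k_pmm=10 are assumptions made up for illustration; they are not part of the statsmodels documentation.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data (hypothetical): three correlated columns.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + 0.5 * x3 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Remove 20 values from x1 and 20 from x2, in non-overlapping rows.
df.loc[0:19, "x1"] = np.nan
df.loc[20:39, "x2"] = np.nan

# Wrap the data; k_pmm=10 is an arbitrary illustrative choice.
imp = mice.MICEData(df, perturbation_method="gaussian", k_pmm=10)
imp.update_all()  # one MICE cycle over the variables with missing values

# The wrapper works on its own copy, so df keeps its missing values.
n_missing_original = int(df.isna().sum().sum())
n_missing_imputed = int(imp.data.isna().sum().sum())
```

Because the data set is copied internally, the imputed values are read from imp.data rather than from the data frame that was passed in.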

Examples

Draw 20 imputations from a data set called data and save them in separate files with filename pattern dataXX.csv. The variables other than x1 are imputed using linear models fit with OLS, with mean structures containing main effects of all other variables in data. The variable named x1 has a conditional mean structure that includes an additional term for x2^2.

>>> from statsmodels.imputation import mice
>>> imp = mice.MICEData(data)
>>> imp.set_imputer('x1', formula='x2 + np.square(x2) + x3')
>>> for j in range(20):
...     imp.update_all()
...     imp.data.to_csv('data%02d.csv' % j)

Impute using default models, using the MICEData object as an iterator.

>>> imp = mice.MICEData(data)
>>> for j, _ in enumerate(imp):
...     imp.data.to_csv('data%02d.csv' % j)

Notes

Allowed perturbation methods are ‘gaussian’ (the model parameters are set to a draw from the Gaussian approximation to the posterior distribution), and ‘boot’ (the model parameters are set to the estimated values obtained when fitting a bootstrapped version of the data set).
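The two perturbation methods can be compared with a short sketch like the one below. The synthetic data and column names are assumptions for illustration only; the sketch just shows that either string can be passed as perturbation_method and that both yield a completed data set.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data (hypothetical), with 20 missing values in x1.
rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 - x3 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
df.loc[0:19, "x1"] = np.nan

imputed_x1 = {}
for method in ("gaussian", "boot"):
    imp = mice.MICEData(df, perturbation_method=method)
    imp.update_all(2)  # two MICE iterations
    # Keep the imputed values for the rows that were missing in x1.
    imputed_x1[method] = imp.data["x1"].iloc[:20].to_numpy().copy()
```

The imputed values differ from run to run because both methods draw random parameter values before imputing.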

history_callback can be implemented to have side effects such as saving the current imputed data set to disk.
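A minimal sketch of a history_callback follows. The callback name record_x1_mean and the synthetic data are hypothetical; the sketch records a summary statistic after each call to update_all instead of writing to disk, and reads the results back from the history attribute.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data (hypothetical), with 20 missing values in x1.
rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x3 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
df.loc[0:19, "x1"] = np.nan

def record_x1_mean(mice_data):
    # Receives the MICEData object; the return value is appended to history.
    return float(mice_data.data["x1"].mean())

imp = mice.MICEData(df, history_callback=record_x1_mean)
for _ in range(5):
    imp.update_all()  # one cycle per call

history = imp.history
```

A callback with side effects, such as one that calls mice_data.data.to_csv(...), would follow the same pattern.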

Methods

get_fitting_data(vname)

Return the data needed to fit a model for imputation.

get_split_data(vname)

Return endog and exog for imputation of a given variable.

impute(vname)

impute_pmm(vname)

Use predictive mean matching to impute missing values.

next_sample()

Return the next imputed dataset in the imputation process.

perturb_params(vname)

plot_bivariate(col1_name, col2_name[, …])

Plot observed and imputed values for two variables.

plot_fit_obs(col_name[, lowess_args, …])

Plot fitted versus imputed or observed values as a scatterplot.

plot_imputed_hist(col_name[, ax, …])

Display imputed values for one variable as a histogram.

plot_missing_pattern([ax, row_order, …])

Generate an image showing the missing data pattern.

set_imputer(endog_name[, formula, …])

Specify the imputation process for a single variable.

update(vname)

Impute missing values for a single variable.

update_all([n_iter])

Perform a specified number of MICE iterations.
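The sketch below strings several of these methods together: set_imputer to customize one variable, update_all for a few initial iterations, and next_sample to draw imputed data sets. The synthetic data, the number of iterations, and the decision to copy each sample are assumptions for illustration, not recommendations from the statsmodels documentation.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data (hypothetical), with missing values in x1 and x2.
rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x2**2 + x3 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
df.loc[0:19, "x1"] = np.nan
df.loc[20:39, "x2"] = np.nan

imp = mice.MICEData(df)

# Give x1 a custom conditional mean structure; x2 keeps the default model.
imp.set_imputer("x1", formula="x2 + np.square(x2) + x3")

# Run a few initial iterations before collecting samples.
imp.update_all(3)

# Copy each sample defensively, in case the same object is reused
# by later updates.
samples = [imp.next_sample().copy() for _ in range(2)]
```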