statsmodels.base.distributed_estimation.DistributedModel

class statsmodels.base.distributed_estimation.DistributedModel(partitions, model_class=None, init_kwds=None, estimation_method=None, estimation_kwds=None, join_method=None, join_kwds=None, results_class=None, results_kwds=None)

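Distributed model class. The data are split into partitions, a model is estimated on each partition, and the per-partition results are recombined into a single result.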

Parameters
partitions : scalar

The number of partitions that the data will be split into.

model_class : statsmodels model class

The model class which will be used for estimation. If None, this defaults to OLS.

init_kwds : dict-like or None

Keywords needed for initializing the model, in addition to endog and exog.

init_kwds_generator : generator or None

Generator that yields additional model init_kwds, which may vary by data partition. The current use case is WLS and GLS.

estimation_method : function or None

The method that performs the estimation for each partition. If None, this defaults to _est_regularized_debiased.

estimation_kwds : dict-like or None

Keywords to be passed to estimation_method.

join_method : function or None

The method used to recombine the results from each partition. If None, this defaults to _join_debiased.

join_kwds : dict-like or None

Keywords to be passed to join_method.

results_class : results class or None

The class of results that should be returned. If None, this defaults to RegularizedResults.

results_kwds : dict-like or None

Keywords to be passed to results_class.
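A minimal sketch of how these parameters fit together, assuming the default OLS model with the default debiased estimation and join methods. The helper partition_rows and the simulated data are hypothetical and exist only for illustration; the default estimation method is assumed to require an "alpha" penalty weight in fit_kwds.

```python
import numpy as np
from statsmodels.base.distributed_estimation import DistributedModel


def partition_rows(endog, exog, partitions):
    """Hypothetical helper: yield one (endog, exog) pair per partition."""
    endog_chunks = np.array_split(endog, partitions)
    exog_chunks = np.array_split(exog, partitions)
    return zip(endog_chunks, exog_chunks)


# Simulated data (illustrative only): a sparse linear model with noise.
rng = np.random.default_rng(0)
n_obs, n_features, partitions = 400, 4, 4
X = rng.normal(size=(n_obs, n_features))
beta = np.array([1.0, 0.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n_obs)

# All optional arguments left as None, so the documented defaults apply.
mod = DistributedModel(partitions)

# "alpha" is the regularization weight passed through to the estimation method.
res = mod.fit(partition_rows(y, X, partitions), fit_kwds={"alpha": 0.2})
print(res.params)
```

The number of chunks produced by the data generator should match the partitions argument, since the estimation method receives both the partition index and the total partition count.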

Attributes
partitions : scalar

See Parameters.

model_class : statsmodels model class

See Parameters.

init_kwds : dict-like

See Parameters.

init_kwds_generator : generator or None

See Parameters.

estimation_method : function

See Parameters.

estimation_kwds : dict-like

See Parameters.

join_method : function

See Parameters.

join_kwds : dict-like

See Parameters.

results_class : results class

See Parameters.

results_kwds : dict-like

See Parameters.
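As a quick check of how the None defaults resolve, the sketch below builds a model with only partitions set and inspects the resulting attributes. The partition count of 3 is arbitrary, and the comments restate the defaults documented under Parameters rather than guaranteed output.

```python
from statsmodels.base.distributed_estimation import DistributedModel
from statsmodels.regression.linear_model import OLS

# Only `partitions` is given; every other argument falls back to its default.
mod = DistributedModel(3)

print(mod.partitions)
print(mod.model_class is OLS)          # documented default model class is OLS
print(mod.estimation_method.__name__)  # documented default: _est_regularized_debiased
print(mod.join_method.__name__)        # documented default: _join_debiased
print(mod.results_class.__name__)      # documented default: RegularizedResults
```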

Methods

fit(data_generator[, fit_kwds, …])

Performs the distributed estimation using the corresponding DistributedModel

fit_joblib(data_generator, fit_kwds, …[, …])

Performs the distributed estimation in parallel using joblib

fit_sequential(data_generator, fit_kwds[, …])

Sequentially performs the distributed estimation using the corresponding DistributedModel
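The sketch below contrasts fit with fit_sequential, swapping in the naive estimation and join functions from the same module. It assumes that fit accepts a parallel_method keyword with "sequential" as one option, and that fit_sequential returns the list of per-partition estimates before they are joined; the helper chunks and the simulated data are hypothetical.

```python
import numpy as np
from statsmodels.base.distributed_estimation import (
    DistributedModel,
    _est_regularized_naive,
    _join_naive,
)

# Simulated data (illustrative only).
rng = np.random.default_rng(1)
n_obs, n_features, partitions = 300, 3, 3
X = rng.normal(size=(n_obs, n_features))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n_obs)


def chunks():
    """Hypothetical helper: build a fresh (endog, exog) generator per call."""
    return zip(np.array_split(y, partitions), np.array_split(X, partitions))


# Naive approach: regularized fit per partition, then average the estimates.
naive_mod = DistributedModel(
    partitions,
    estimation_method=_est_regularized_naive,
    join_method=_join_naive,
)

# fit runs the per-partition estimation and then applies join_method.
naive_res = naive_mod.fit(
    chunks(), fit_kwds={"alpha": 0.2}, parallel_method="sequential"
)

# fit_sequential is assumed to return the un-joined per-partition results.
per_partition = naive_mod.fit_sequential(chunks(), {"alpha": 0.2})

print(len(per_partition))
print(naive_res.params)
```

A new generator is built for each call because a generator is exhausted after one pass over the partitions.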