{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributed Estimation \n", "\n", "This notebook goes through a couple of examples to show how to use `distributed_estimation`. We import the `DistributedModel` class and make the exog and endog generators." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:05.852182Z", "iopub.status.busy": "2024-03-18T09:27:05.851873Z", "iopub.status.idle": "2024-03-18T09:27:07.125139Z", "shell.execute_reply": "2024-03-18T09:27:07.124225Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy.stats.distributions import norm\n", "from statsmodels.base.distributed_estimation import DistributedModel\n", "\n", "\n", "def _exog_gen(exog, partitions):\n", " \"\"\"partitions exog data\"\"\"\n", "\n", " n_exog = exog.shape[0]\n", " n_part = np.ceil(n_exog / partitions)\n", "\n", " ii = 0\n", " while ii < n_exog:\n", " jj = int(min(ii + n_part, n_exog))\n", " yield exog[ii:jj, :]\n", " ii += int(n_part)\n", "\n", "\n", "def _endog_gen(endog, partitions):\n", " \"\"\"partitions endog data\"\"\"\n", "\n", " n_endog = endog.shape[0]\n", " n_part = np.ceil(n_endog / partitions)\n", "\n", " ii = 0\n", " while ii < n_endog:\n", " jj = int(min(ii + n_part, n_endog))\n", " yield endog[ii:jj]\n", " ii += int(n_part)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we generate some random data to serve as an example." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:07.132889Z", "iopub.status.busy": "2024-03-18T09:27:07.130760Z", "iopub.status.idle": "2024-03-18T09:27:07.144196Z", "shell.execute_reply": "2024-03-18T09:27:07.143379Z" } }, "outputs": [], "source": [ "X = np.random.normal(size=(1000, 25))\n", "beta = np.random.normal(size=25)\n", "beta *= np.random.randint(0, 2, size=25)\n", "y = norm.rvs(loc=X.dot(beta))\n", "m = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the most basic fit, showing all of the defaults, which are to use OLS as the model class, and the debiasing procedure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:07.150693Z", "iopub.status.busy": "2024-03-18T09:27:07.148772Z", "iopub.status.idle": "2024-03-18T09:27:07.559201Z", "shell.execute_reply": "2024-03-18T09:27:07.558294Z" } }, "outputs": [], "source": [ "debiased_OLS_mod = DistributedModel(m)\n", "debiased_OLS_fit = debiased_OLS_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we run through a slightly more complicated example which uses the GLM model class." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:07.563729Z", "iopub.status.busy": "2024-03-18T09:27:07.563416Z", "iopub.status.idle": "2024-03-18T09:27:08.485303Z", "shell.execute_reply": "2024-03-18T09:27:08.473111Z" } }, "outputs": [], "source": [ "from statsmodels.genmod.generalized_linear_model import GLM\n", "from statsmodels.genmod.families import Gaussian\n", "\n", "debiased_GLM_mod = DistributedModel(\n", " m, model_class=GLM, init_kwds={\"family\": Gaussian()}\n", ")\n", "debiased_GLM_fit = debiased_GLM_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also change the `estimation_method` and the `join_method`. The below example show how this works for the standard OLS case. Here we using a naive averaging approach instead of the debiasing procedure." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:08.489760Z", "iopub.status.busy": "2024-03-18T09:27:08.489385Z", "iopub.status.idle": "2024-03-18T09:27:08.749789Z", "shell.execute_reply": "2024-03-18T09:27:08.749010Z" } }, "outputs": [], "source": [ "from statsmodels.base.distributed_estimation import _est_regularized_naive, _join_naive\n", "\n", "\n", "naive_OLS_reg_mod = DistributedModel(\n", " m, estimation_method=_est_regularized_naive, join_method=_join_naive\n", ")\n", "naive_OLS_reg_params = naive_OLS_reg_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can also change the `results_class` used. The following example shows how this work for a simple case with an unregularized model and naive averaging." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T09:27:08.754210Z", "iopub.status.busy": "2024-03-18T09:27:08.753776Z", "iopub.status.idle": "2024-03-18T09:27:08.768646Z", "shell.execute_reply": "2024-03-18T09:27:08.767953Z" } }, "outputs": [], "source": [ "from statsmodels.base.distributed_estimation import (\n", " _est_unregularized_naive,\n", " DistributedResults,\n", ")\n", "\n", "\n", "naive_OLS_unreg_mod = DistributedModel(\n", " m,\n", " estimation_method=_est_unregularized_naive,\n", " join_method=_join_naive,\n", " results_class=DistributedResults,\n", ")\n", "naive_OLS_unreg_params = naive_OLS_unreg_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }
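, { "cell_type": "markdown", "metadata": {}, "source": [ "Because `DistributedResults` wraps the joined parameters in a full results object, methods such as `predict` become available. The cell below is a minimal sketch, assuming `predict` accepts an exog array in the same way as other statsmodels results classes." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the ``DistributedResults`` object to generate in-sample fitted values\n", "# and check how closely they track the simulated response.\n", "fitted = naive_OLS_unreg_params.predict(X)\n", "print(\"correlation with y:\", np.corrcoef(fitted, y)[0, 1])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }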