{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributed Estimation \n", "\n", "This notebook goes through a couple of examples to show how to use `distributed_estimation`. We import the `DistributedModel` class and make the exog and endog generators." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:45.327061Z", "iopub.status.busy": "2024-04-19T16:16:45.326771Z", "iopub.status.idle": "2024-04-19T16:16:46.380619Z", "shell.execute_reply": "2024-04-19T16:16:46.379771Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy.stats.distributions import norm\n", "from statsmodels.base.distributed_estimation import DistributedModel\n", "\n", "\n", "def _exog_gen(exog, partitions):\n", " \"\"\"partitions exog data\"\"\"\n", "\n", " n_exog = exog.shape[0]\n", " n_part = np.ceil(n_exog / partitions)\n", "\n", " ii = 0\n", " while ii < n_exog:\n", " jj = int(min(ii + n_part, n_exog))\n", " yield exog[ii:jj, :]\n", " ii += int(n_part)\n", "\n", "\n", "def _endog_gen(endog, partitions):\n", " \"\"\"partitions endog data\"\"\"\n", "\n", " n_endog = endog.shape[0]\n", " n_part = np.ceil(n_endog / partitions)\n", "\n", " ii = 0\n", " while ii < n_endog:\n", " jj = int(min(ii + n_part, n_endog))\n", " yield endog[ii:jj]\n", " ii += int(n_part)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we generate some random data to serve as an example." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:46.386272Z", "iopub.status.busy": "2024-04-19T16:16:46.384878Z", "iopub.status.idle": "2024-04-19T16:16:46.406357Z", "shell.execute_reply": "2024-04-19T16:16:46.405609Z" } }, "outputs": [], "source": [ "X = np.random.normal(size=(1000, 25))\n", "beta = np.random.normal(size=25)\n", "beta *= np.random.randint(0, 2, size=25)\n", "y = norm.rvs(loc=X.dot(beta))\n", "m = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the most basic fit, showing all of the defaults, which are to use OLS as the model class, and the debiasing procedure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:46.411946Z", "iopub.status.busy": "2024-04-19T16:16:46.410281Z", "iopub.status.idle": "2024-04-19T16:16:46.740081Z", "shell.execute_reply": "2024-04-19T16:16:46.739196Z" } }, "outputs": [], "source": [ "debiased_OLS_mod = DistributedModel(m)\n", "debiased_OLS_fit = debiased_OLS_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we run through a slightly more complicated example which uses the GLM model class." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:46.744842Z", "iopub.status.busy": "2024-04-19T16:16:46.743687Z", "iopub.status.idle": "2024-04-19T16:16:47.267363Z", "shell.execute_reply": "2024-04-19T16:16:47.266476Z" } }, "outputs": [], "source": [ "from statsmodels.genmod.generalized_linear_model import GLM\n", "from statsmodels.genmod.families import Gaussian\n", "\n", "debiased_GLM_mod = DistributedModel(\n", " m, model_class=GLM, init_kwds={\"family\": Gaussian()}\n", ")\n", "debiased_GLM_fit = debiased_GLM_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also change the `estimation_method` and the `join_method`. The below example show how this works for the standard OLS case. Here we using a naive averaging approach instead of the debiasing procedure." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:47.271316Z", "iopub.status.busy": "2024-04-19T16:16:47.270808Z", "iopub.status.idle": "2024-04-19T16:16:47.406447Z", "shell.execute_reply": "2024-04-19T16:16:47.405671Z" } }, "outputs": [], "source": [ "from statsmodels.base.distributed_estimation import _est_regularized_naive, _join_naive\n", "\n", "\n", "naive_OLS_reg_mod = DistributedModel(\n", " m, estimation_method=_est_regularized_naive, join_method=_join_naive\n", ")\n", "naive_OLS_reg_params = naive_OLS_reg_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can also change the `results_class` used. The following example shows how this work for a simple case with an unregularized model and naive averaging." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-19T16:16:47.410813Z", "iopub.status.busy": "2024-04-19T16:16:47.410264Z", "iopub.status.idle": "2024-04-19T16:16:47.420597Z", "shell.execute_reply": "2024-04-19T16:16:47.419829Z" } }, "outputs": [], "source": [ "from statsmodels.base.distributed_estimation import (\n", " _est_unregularized_naive,\n", " DistributedResults,\n", ")\n", "\n", "\n", "naive_OLS_unreg_mod = DistributedModel(\n", " m,\n", " estimation_method=_est_unregularized_naive,\n", " join_method=_join_naive,\n", " results_class=DistributedResults,\n", ")\n", "naive_OLS_unreg_params = naive_OLS_unreg_mod.fit(\n", " zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={\"alpha\": 0.2}\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 4 }