{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## GEE nested covariance structure simulation study\n", "\n", "This notebook is a simulation study that illustrates and evaluates the performance of the GEE nested covariance structure.\n", "\n", "A nested covariance structure is based on a nested sequence of groups, or \"levels\". The top level in the hierarchy is defined by the `groups` argument to GEE. Subsequent levels are defined by the `dep_data` argument to GEE." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:15.623548Z", "iopub.status.busy": "2023-12-14T14:40:15.623300Z", "iopub.status.idle": "2023-12-14T14:40:18.120305Z", "shell.execute_reply": "2023-12-14T14:40:18.119032Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the number of covariates." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.125214Z", "iopub.status.busy": "2023-12-14T14:40:18.124482Z", "iopub.status.idle": "2023-12-14T14:40:18.131576Z", "shell.execute_reply": "2023-12-14T14:40:18.130744Z" } }, "outputs": [], "source": [ "p = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These parameters define the population variance for each level of grouping." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.135855Z", "iopub.status.busy": "2023-12-14T14:40:18.134846Z", "iopub.status.idle": "2023-12-14T14:40:18.142599Z", "shell.execute_reply": "2023-12-14T14:40:18.141654Z" } }, "outputs": [], "source": [ "groups_var = 1\n", "level1_var = 2\n", "level2_var = 3\n", "resid_var = 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the number of groups" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.146908Z", "iopub.status.busy": "2023-12-14T14:40:18.145894Z", "iopub.status.idle": "2023-12-14T14:40:18.152301Z", "shell.execute_reply": "2023-12-14T14:40:18.151554Z" } }, "outputs": [], "source": [ "n_groups = 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the number of observations at each level of grouping. Here, everything is balanced, i.e. within a level every group has the same size." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.156199Z", "iopub.status.busy": "2023-12-14T14:40:18.155339Z", "iopub.status.idle": "2023-12-14T14:40:18.161640Z", "shell.execute_reply": "2023-12-14T14:40:18.160970Z" } }, "outputs": [], "source": [ "group_size = 20\n", "level1_size = 10\n", "level2_size = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the total sample size." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.165238Z", "iopub.status.busy": "2023-12-14T14:40:18.164497Z", "iopub.status.idle": "2023-12-14T14:40:18.170461Z", "shell.execute_reply": "2023-12-14T14:40:18.169686Z" } }, "outputs": [], "source": [ "n = n_groups * group_size * level1_size * level2_size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construct the design matrix." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.173888Z", "iopub.status.busy": "2023-12-14T14:40:18.173398Z", "iopub.status.idle": "2023-12-14T14:40:18.232282Z", "shell.execute_reply": "2023-12-14T14:40:18.231180Z" } }, "outputs": [], "source": [ "xmat = np.random.normal(size=(n, p))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construct labels showing which group each observation belongs to at each level." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.236966Z", "iopub.status.busy": "2023-12-14T14:40:18.236122Z", "iopub.status.idle": "2023-12-14T14:40:18.257696Z", "shell.execute_reply": "2023-12-14T14:40:18.256750Z" } }, "outputs": [], "source": [ "groups_ix = np.kron(np.arange(n // group_size), np.ones(group_size)).astype(int)\n", "level1_ix = np.kron(np.arange(n // level1_size), np.ones(level1_size)).astype(int)\n", "level2_ix = np.kron(np.arange(n // level2_size), np.ones(level2_size)).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulate the random effects." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.261719Z", "iopub.status.busy": "2023-12-14T14:40:18.261287Z", "iopub.status.idle": "2023-12-14T14:40:18.273721Z", "shell.execute_reply": "2023-12-14T14:40:18.272952Z" } }, "outputs": [], "source": [ "groups_re = np.sqrt(groups_var) * np.random.normal(size=n // group_size)\n", "level1_re = np.sqrt(level1_var) * np.random.normal(size=n // level1_size)\n", "level2_re = np.sqrt(level2_var) * np.random.normal(size=n // level2_size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulate the response variable." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.277519Z", "iopub.status.busy": "2023-12-14T14:40:18.277049Z", "iopub.status.idle": "2023-12-14T14:40:18.289743Z", "shell.execute_reply": "2023-12-14T14:40:18.289097Z" } }, "outputs": [], "source": [ "y = groups_re[groups_ix] + level1_re[level1_ix] + level2_re[level2_ix]\n", "y += np.sqrt(resid_var) * np.random.normal(size=n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Put everything into a dataframe." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.293313Z", "iopub.status.busy": "2023-12-14T14:40:18.292860Z", "iopub.status.idle": "2023-12-14T14:40:18.307114Z", "shell.execute_reply": "2023-12-14T14:40:18.306423Z" } }, "outputs": [], "source": [ "df = pd.DataFrame(xmat, columns=[\"x%d\" % j for j in range(p)])\n", "df[\"y\"] = y + xmat[:, 0] - xmat[:, 3]\n", "df[\"groups_ix\"] = groups_ix\n", "df[\"level1_ix\"] = level1_ix\n", "df[\"level2_ix\"] = level2_ix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit the model." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:18.313165Z", "iopub.status.busy": "2023-12-14T14:40:18.311488Z", "iopub.status.idle": "2023-12-14T14:40:34.315020Z", "shell.execute_reply": "2023-12-14T14:40:34.313721Z" } }, "outputs": [], "source": [ "cs = sm.cov_struct.Nested()\n", "dep_fml = \"0 + level1_ix + level2_ix\"\n", "m = sm.GEE.from_formula(\n", " \"y ~ x0 + x1 + x2 + x3 + x4\",\n", " cov_struct=cs,\n", " dep_data=dep_fml,\n", " groups=\"groups_ix\",\n", " data=df,\n", ")\n", "r = m.fit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimated covariance parameters should be similar to `groups_var`, `level1_var`, etc. as defined above." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T14:40:34.319259Z", "iopub.status.busy": "2023-12-14T14:40:34.318555Z", "iopub.status.idle": "2023-12-14T14:40:34.351842Z", "shell.execute_reply": "2023-12-14T14:40:34.350766Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Variance
groups_ix0.985572
level1_ix2.085966
level2_ix3.042724
Residual3.994040
\n", "
" ], "text/plain": [ " Variance\n", "groups_ix 0.985572\n", "level1_ix 2.085966\n", "level2_ix 3.042724\n", "Residual 3.994040" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r.cov_struct.summary()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }