# Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster, or repeated-measures data in which observations may be correlated within a cluster but are uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).
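For orientation, the estimating equations in the form given by Liang and Zeger (1986) can be written as below. The notation is an assumption made here for illustration and is not taken from this page: $K$ is the number of clusters, $y_i$ and $\mu_i(\beta)$ are the response and mean vectors of cluster $i$, $A_i$ is the diagonal matrix of variance-function values, $R_i(\alpha)$ is the working correlation matrix, and $\phi$ is the scale parameter.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}
```

The working correlation $R_i(\alpha)$ is what the dependence structure supplies; the family supplies the mean and variance functions.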

See Module Reference for commands and arguments.

## Examples

The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
...:               cov_struct=ind, family=fam)
...:

In [7]: res = mod.fit()

In [8]: print(res.summary())
                              GEE Regression Results
===================================================================================
Dep. Variable:                           y   No. Observations:                  236
Model:                                 GEE   No. clusters:                       59
Method:                        Generalized   Min. cluster size:                   4
                      Estimating Equations   Max. cluster size:                   4
Family:                            Poisson   Mean cluster size:                 4.0
Dependence structure:         Exchangeable   Num. iterations:                     2
Date:                     Sun, 24 Nov 2019   Scale:                           1.000
Covariance type:                    robust   Time:                         07:51:44
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            0.5730      0.361      1.589      0.112      -0.134       1.280
trt[T.progabide]    -0.1519      0.171     -0.888      0.375      -0.487       0.183
age                  0.0223      0.011      1.960      0.050    2.11e-06       0.045
base                 0.0226      0.001     18.451      0.000       0.020       0.025
==============================================================================
Skew:                          3.7823   Kurtosis:                      28.6672
Centered skew:                 2.7597   Centered kurtosis:             21.9865
==============================================================================
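The coefficients in the table above are on the scale of the link function. Assuming the default log link of the Poisson family, exponentiating a coefficient gives the multiplicative change in the expected seizure count. The sketch below copies the printed estimates and confidence limits by hand (the dictionary name `coefs` is purely illustrative) rather than refitting the model:

```python
import numpy as np

# Estimate, lower and upper 95% limits, copied from the summary table above.
coefs = {
    "trt[T.progabide]": (-0.1519, -0.487, 0.183),
    "age": (0.0223, 2.11e-06, 0.045),
    "base": (0.0226, 0.020, 0.025),
}

# Under a log link, exp(coef) is a rate ratio.
rate_ratios = {name: tuple(np.exp(vals)) for name, vals in coefs.items()}

for name, (rr, lower, upper) in rate_ratios.items():
    print(f"{name:18s} rate ratio = {rr:.3f}  (95% CI {lower:.3f} to {upper:.3f})")
```

For the treatment term this gives a rate ratio of roughly 0.86 with an interval that spans 1, which matches the non-significant p-value in the table.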

Several notebook examples of GEE usage can be found on the statsmodels Wiki under “Wiki notebooks for GEE”.

### References

• KY Liang and S Zeger (1986). “Longitudinal data analysis using generalized linear models”. Biometrika, 73(1), 13-22.

• S Zeger and KY Liang (1986). “Longitudinal data analysis for discrete and continuous outcomes”. Biometrics, 42(1), 121-130.

• A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”. Biometrika, 77, 485-497.

• Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf

• LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57(1), 126-134.

## Module Reference

### Model Class

 GEE(endog, exog, groups[, time, family, …]) Estimation of marginal regression models using Generalized Estimating Equations (GEE).
 QIF(endog, exog, groups[, family, …]) Fit a regression model using quadratic inference functions (QIF).

### Results Classes

 GEEResults(model, params, cov_params, scale) This class summarizes the fit of a marginal regression model using GEE.
 GEEMargins(results, args[, kwargs]) Estimated marginal effects for a regression model fit with GEE.
 QIFResults(model, params, cov_params, scale) Results class for QIF Regression

### Dependence Structures

The dependence structures currently implemented are:

 CovStruct([cov_nearest_method]) Base class for correlation and covariance structures.
 Autoregressive([dist_func]) A first-order autoregressive working dependence structure.
 Exchangeable() An exchangeable working dependence structure.
 GlobalOddsRatio(endog_type) Estimate the global odds ratio for a GEE with ordinal or nominal data.
 Independence([cov_nearest_method]) An independence working dependence structure.
 Nested([cov_nearest_method]) A nested working dependence structure.

### Families

The distribution families are the same as for GLM. Those currently implemented are:

 Family(link, variance) The parent class for one-parameter exponential families.
 Binomial([link]) Binomial exponential family distribution.
 Gamma([link]) Gamma exponential family distribution.
 Gaussian([link]) Gaussian exponential family distribution.
 InverseGaussian([link]) InverseGaussian exponential family.
 NegativeBinomial([link, alpha]) Negative Binomial exponential family.
 Poisson([link]) Poisson exponential family.
 Tweedie([link, var_power, eql]) Tweedie family.