Generalized Estimating Equations

Generalized Estimating Equations estimate generalized linear models for panel, cluster or repeated measures data when the observations are possibly correlated withing a cluster but uncorrelated across clusters. It supports estimation of the same one-parameter exponential families as Generalized Linear models (GLM).

See Module Reference for commands and arguments.

Examples

The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=ind, family=fam)
   ...: 

In [7]: res = mod.fit()

In [8]: print(res.summary())
                               GEE Regression Results                              
===================================================================================
Dep. Variable:                           y   No. Observations:                  236
Model:                                 GEE   No. clusters:                       59
Method:                        Generalized   Min. cluster size:                   4
                      Estimating Equations   Max. cluster size:                   4
Family:                            Poisson   Mean cluster size:                 4.0
Dependence structure:         Exchangeable   Num. iterations:                     2
Date:                     Sun, 24 Nov 2019   Scale:                           1.000
Covariance type:                    robust   Time:                         07:51:44
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            0.5730      0.361      1.589      0.112      -0.134       1.280
trt[T.progabide]    -0.1519      0.171     -0.888      0.375      -0.487       0.183
age                  0.0223      0.011      1.960      0.050    2.11e-06       0.045
base                 0.0226      0.001     18.451      0.000       0.020       0.025
==============================================================================
Skew:                          3.7823   Kurtosis:                      28.6672
Centered skew:                 2.7597   Centered kurtosis:             21.9865
==============================================================================

Several notebook examples of the use of GEE can be found on the Wiki: Wiki notebooks for GEE

References

  • KY Liang and S Zeger. “Longitudinal data analysis using generalized linear models”. Biometrika (1986) 73 (1): 13-22.

  • S Zeger and KY Liang. “Longitudinal Data Analysis for Discrete and Continuous Outcomes”. Biometrics Vol. 42, No. 1 (Mar., 1986), pp. 121-130

  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”, Biometrika, 77, 485-497.

  • Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf

  • LA Mancl LA, TA DeRouen (2001). A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001 Mar;57(1):126-34.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …])

Estimation of marginal regression models using Generalized Estimating Equations (GEE).

QIF(endog, exog, groups[, family, …])

Fit a regression model using quadratic inference functions (QIF).

Results Classes

GEEResults(model, params, cov_params, scale)

This class summarizes the fit of a marginal regression model using GEE.

GEEMargins(results, args[, kwargs])

Estimated marginal effects for a regression model fit with GEE.

QIFResults(model, params, cov_params, scale)

Results class for QIF Regression

Dependence Structures

The dependence structures currently implemented are

CovStruct([cov_nearest_method])

Base class for correlation and covariance structures.

Autoregressive([dist_func])

A first-order autoregressive working dependence structure.

Exchangeable()

An exchangeable working dependence structure.

GlobalOddsRatio(endog_type)

Estimate the global odds ratio for a GEE with ordinal or nominal data.

Independence([cov_nearest_method])

An independence working dependence structure.

Nested([cov_nearest_method])

A nested working dependence structure.

Families

The distribution families are the same as for GLM, currently implemented are

Family(link, variance)

The parent class for one-parameter exponential families.

Binomial([link])

Binomial exponential family distribution.

Gamma([link])

Gamma exponential family distribution.

Gaussian([link])

Gaussian exponential family distribution.

InverseGaussian([link])

InverseGaussian exponential family.

NegativeBinomial([link, alpha])

Negative Binomial exponential family.

Poisson([link])

Poisson exponential family.

Tweedie([link, var_power, eql])

Tweedie family.