Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster or repeated measures data when the observations are possibly correlated within a cluster but uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).

See Module Reference for commands and arguments.


The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=ind, family=fam)

In [7]: res = mod.fit()

In [8]: print(res.summary())
                               GEE Regression Results                              
Dep. Variable:                           y   No. Observations:                  236
Model:                                 GEE   No. clusters:                       59
Method:                        Generalized   Min. cluster size:                   4
                      Estimating Equations   Max. cluster size:                   4
Family:                            Poisson   Mean cluster size:                 4.0
Dependence structure:         Exchangeable   Num. iterations:                     2
Date:                     Sun, 24 Nov 2019   Scale:                           1.000
Covariance type:                    robust   Time:                         07:51:44
                       coef    std err          z      P>|z|      [0.025      0.975]
Intercept            0.5730      0.361      1.589      0.112      -0.134       1.280
trt[T.progabide]    -0.1519      0.171     -0.888      0.375      -0.487       0.183
age                  0.0223      0.011      1.960      0.050    2.11e-06       0.045
base                 0.0226      0.001     18.451      0.000       0.020       0.025
Skew:                          3.7823   Kurtosis:                      28.6672
Centered skew:                 2.7597   Centered kurtosis:             21.9865

Several notebook examples of the use of GEE can be found on the Wiki, under “Wiki notebooks for GEE”.


References

  • KY Liang and S Zeger (1986). “Longitudinal data analysis using generalized linear models”. Biometrika, 73(1), 13-22.

  • S Zeger and KY Liang (1986). “Longitudinal data analysis for discrete and continuous outcomes”. Biometrics, 42(1), 121-130.

  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”. Biometrika, 77, 485-497.

  • Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”.

  • LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57(1), 126-134.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …])

Estimation of marginal regression models using Generalized Estimating Equations (GEE).

QIF(endog, exog, groups[, family, …])

Fit a regression model using quadratic inference functions (QIF).
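
A minimal sketch of a QIF fit follows, assuming the classes are imported from statsmodels.genmod.qif and that the default family is Gaussian. The data are simulated for illustration. The design deliberately has no intercept column: in this balanced layout with an exchangeable structure, the moment conditions for a constant column would be collinear.

```python
import numpy as np
from statsmodels.genmod.qif import QIF, QIFExchangeable

# Simulated data (an assumption for illustration): 250 clusters of size 4.
rng = np.random.default_rng(2)
n_clusters, cluster_size = 250, 4
groups = np.repeat(np.arange(n_clusters), cluster_size)
n = groups.size
exog = rng.normal(size=(n, 2))       # two covariates, no intercept column
true_params = np.array([1.0, -0.5])  # illustrative values
# Cluster-level noise induces exchangeable within-cluster correlation.
u = np.repeat(rng.normal(scale=0.7, size=n_clusters), cluster_size)
endog = exog @ true_params + u + rng.normal(size=n)

model = QIF(endog, exog, groups=groups, cov_struct=QIFExchangeable())
res = model.fit()

print(res.params)
print(res.bse)
```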

Results Classes

GEEResults(model, params, cov_params, scale)

This class summarizes the fit of a marginal regression model using GEE.

GEEMargins(results, args[, kwargs])

Estimated marginal effects for a regression model fit with GEE.

QIFResults(model, params, cov_params, scale)

Results class for QIF regression.

Dependence Structures

The dependence structures currently implemented are:

CovStruct

Base class for correlation and covariance structures.

Autoregressive

A first-order autoregressive working dependence structure.

Exchangeable

An exchangeable working dependence structure.

GlobalOddsRatio

Estimate the global odds ratio for a GEE with ordinal or nominal data.

Independence

An independence working dependence structure.

Nested

A nested working dependence structure.


Families

The distribution families are the same as for GLM. Those currently implemented are:

Family(link, variance)

The parent class for one-parameter exponential families.

Binomial

Binomial exponential family distribution.

Gamma

Gamma exponential family distribution.

Gaussian

Gaussian exponential family distribution.

InverseGaussian

InverseGaussian exponential family.

NegativeBinomial([link, alpha])

Negative Binomial exponential family.

Poisson

Poisson exponential family.

Tweedie([link, var_power, eql])

Tweedie family.