# Generalized Linear Models¶

Generalized linear models currently supports estimation using the one-parameter exponential families.

See Module Reference for commands and arguments.

## Examples¶

# Load modules and data
In [1]: import statsmodels.api as sm

# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

In [5]: gamma_results = gamma_model.fit()

In [6]: print(gamma_results.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Tue, 08 Feb 2022   Deviance:                     0.087389
Time:                        18:23:00   Pearson chi2:                   0.0860
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9800
Covariance Type:            nonrobust
======================================================================================
coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0178      0.011     -1.548      0.122      -0.040       0.005
COUTAX              4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
UNEMPF                 0.0020      0.001      3.824      0.000       0.001       0.003
MOR                -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
ACT                    0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
GDP                -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
AGE                   -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
COUTAX_FEMALEUNEMP -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
======================================================================================


Detailed examples can be found here:

## Technical Documentation¶

The statistical model for each observation $$i$$ is assumed to be

$$Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i)$$ and $$\mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta)$$.

where $$g$$ is the link function and $$F_{EDM}(\cdot|\theta,\phi,w)$$ is a distribution of the family of exponential dispersion models (EDM) with natural parameter $$\theta$$, scale parameter $$\phi$$ and weight $$w$$. Its density is given by

$$f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.$$

It follows that $$\mu = b'(\theta)$$ and $$Var[Y|x]=\frac{\phi}{w}b''(\theta)$$. The inverse of the first equation gives the natural parameter as a function of the expected value $$\theta(\mu)$$ such that

$$Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)$$

with $$v(\mu) = b''(\theta(\mu))$$. Therefore it is said that a GLM is determined by link function $$g$$ and variance function $$v(\mu)$$ alone (and $$x$$ of course).

Note that while $$\phi$$ is the same for every observation $$y_i$$ and therefore does not influence the estimation of $$\beta$$, the weights $$w_i$$ might be different for every $$y_i$$ such that the estimation of $$\beta$$ depends on them.

Distribution

Domain

$$\mu=E[Y|x]$$

$$v(\mu)$$

$$\theta(\mu)$$

$$b(\theta)$$

$$\phi$$

Binomial $$B(n,p)$$

$$0,1,\ldots,n$$

$$np$$

$$\mu-\frac{\mu^2}{n}$$

$$\log\frac{p}{1-p}$$

$$n\log(1+e^\theta)$$

1

Poisson $$P(\mu)$$

$$0,1,\ldots,\infty$$

$$\mu$$

$$\mu$$

$$\log(\mu)$$

$$e^\theta$$

1

Neg. Binom. $$NB(\mu,\alpha)$$

$$0,1,\ldots,\infty$$

$$\mu$$

$$\mu+\alpha\mu^2$$

$$\log(\frac{\alpha\mu}{1+\alpha\mu})$$

$$-\frac{1}{\alpha}\log(1-\alpha e^\theta)$$

1

Gaussian/Normal $$N(\mu,\sigma^2)$$

$$(-\infty,\infty)$$

$$\mu$$

$$1$$

$$\mu$$

$$\frac{1}{2}\theta^2$$

$$\sigma^2$$

Gamma $$N(\mu,\nu)$$

$$(0,\infty)$$

$$\mu$$

$$\mu^2$$

$$-\frac{1}{\mu}$$

$$-\log(-\theta)$$

$$\frac{1}{\nu}$$

Inv. Gauss. $$IG(\mu,\sigma^2)$$

$$(0,\infty)$$

$$\mu$$

$$\mu^3$$

$$-\frac{1}{2\mu^2}$$

$$-\sqrt{-2\theta}$$

$$\sigma^2$$

Tweedie $$p\geq 1$$

depends on $$p$$

$$\mu$$

$$\mu^p$$

$$\frac{\mu^{1-p}}{1-p}$$

$$\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}$$

$$\phi$$

The Tweedie distribution has special cases for $$p=0,1,2$$ not listed in the table and uses $$\alpha=\frac{p-2}{p-1}$$.

Correspondence of mathematical variables to code:

• $$Y$$ and $$y$$ are coded as endog, the variable one wants to model

• $$x$$ is coded as exog, the covariates alias explanatory variables

• $$\beta$$ is coded as params, the parameters one wants to estimate

• $$\mu$$ is coded as mu, the expectation (conditional on $$x$$) of $$Y$$

• $$g$$ is coded as link argument to the class Family

• $$\phi$$ is coded as scale, the dispersion parameter of the EDM

• $$w$$ is not yet supported (i.e. $$w=1$$), in the future it might be var_weights

• $$p$$ is coded as var_power for the power of the variance function $$v(\mu)$$ of the Tweedie distribution, see table

• $$\alpha$$ is either

• Negative Binomial: the ancillary parameter alpha, see table

• Tweedie: an abbreviation for $$\frac{p-2}{p-1}$$ of the power $$p$$ of the variance function, see table

### References¶

• Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series.

• Green, PJ. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192.

• Hardin, J.W. and Hilbe, J.M. 2007. “Generalized Linear Models and Extensions.” 2nd ed. Stata Press, College Station, TX.

• McCullagh, P. and Nelder, J.A. 1989. “Generalized Linear Models.” 2nd ed. Chapman & Hall, Boca Rotan.

## Module Reference¶

### Model Class¶

 GLM(endog, exog[, family, offset, exposure, ...]) Generalized Linear Models

### Results Class¶

 GLMResults(model, params, ...[, cov_type, ...]) Class to contain GLM results. PredictionResults(predicted_mean, var_pred_mean) Attributes

### Families¶

The distribution families currently implemented are

 Family(link, variance) The parent class for one-parameter exponential families. Binomial([link]) Binomial exponential family distribution. Gamma([link]) Gamma exponential family distribution. Gaussian([link]) Gaussian exponential family distribution. InverseGaussian([link]) InverseGaussian exponential family. NegativeBinomial([link, alpha]) Negative Binomial exponential family (corresponds to NB2). Poisson([link]) Poisson exponential family. Tweedie([link, var_power, eql]) Tweedie family.

### Variance Functions¶

Each of the families has an associated variance function. You can access the variance functions here:

>>> sm.families.<familyname>.variance

 Relates the variance of a random variable to its mean. constant The call method of constant returns a constant variance, i.e., a vector of ones. Power([power]) Power variance function mu Returns np.fabs(mu) mu_squared Returns np.fabs(mu)**2 mu_cubed Returns np.fabs(mu)**3 Binomial([n]) Binomial variance function binary The binomial variance function for n = 1 NegativeBinomial([alpha]) Negative binomial variance function nbinom Negative Binomial variance function.