.. currentmodule:: statsmodels.genmod.generalized_linear_model .. _glm: Generalized Linear Models ========================= Generalized linear models currently supports estimation using the one-parameter exponential families. See Module Reference_ for commands and arguments. Examples -------- .. ipython:: python :okwarning: # Load modules and data import statsmodels.api as sm data = sm.datasets.scotland.load() data.exog = sm.add_constant(data.exog) # Instantiate a gamma family model with the default link function. gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma()) gamma_results = gamma_model.fit() print(gamma_results.summary()) Detailed examples can be found here: * GLM __ * Formula __ Technical Documentation ----------------------- .. ..glm_techn1 .. ..glm_techn2 The statistical model for each observation :math:i is assumed to be :math:Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i) and :math:\mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta). where :math:g is the link function and :math:F_{EDM}(\cdot|\theta,\phi,w) is a distribution of the family of exponential dispersion models (EDM) with natural parameter :math:\theta, scale parameter :math:\phi and weight :math:w. Its density is given by :math:f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,. It follows that :math:\mu = b'(\theta) and :math:Var[Y|x]=\frac{\phi}{w}b''(\theta). The inverse of the first equation gives the natural parameter as a function of the expected value :math:\theta(\mu) such that :math:Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i) with :math:v(\mu) = b''(\theta(\mu)). Therefore it is said that a GLM is determined by link function :math:g and variance function :math:v(\mu) alone (and :math:x of course). Note that while :math:\phi is the same for every observation :math:y_i and therefore does not influence the estimation of :math:\beta, the weights :math:w_i might be different for every :math:y_i such that the estimation of :math:\beta depends on them. ================================================= ============================== ============================== ======================================== =========================================== ============================================================================ ===================== Distribution Domain :math:\mu=E[Y|x] :math:v(\mu) :math:\theta(\mu) :math:b(\theta) :math:\phi ================================================= ============================== ============================== ======================================== =========================================== ============================================================================ ===================== Binomial :math:B(n,p) :math:0,1,\ldots,n :math:np :math:\mu-\frac{\mu^2}{n} :math:\log\frac{p}{1-p} :math:n\log(1+e^\theta) 1 Poisson :math:P(\mu) :math:0,1,\ldots,\infty :math:\mu :math:\mu :math:\log(\mu) :math:e^\theta 1 Neg. Binom. :math:NB(\mu,\alpha) :math:0,1,\ldots,\infty :math:\mu :math:\mu+\alpha\mu^2 :math:\log(\frac{\alpha\mu}{1+\alpha\mu}) :math:-\frac{1}{\alpha}\log(1-\alpha e^\theta) 1 Gaussian/Normal :math:N(\mu,\sigma^2) :math:(-\infty,\infty) :math:\mu :math:1 :math:\mu :math:\frac{1}{2}\theta^2 :math:\sigma^2 Gamma :math:N(\mu,\nu) :math:(0,\infty) :math:\mu :math:\mu^2 :math:-\frac{1}{\mu} :math:-\log(-\theta) :math:\frac{1}{\nu} Inv. Gauss. :math:IG(\mu,\sigma^2) :math:(0,\infty) :math:\mu :math:\mu^3 :math:-\frac{1}{2\mu^2} :math:-\sqrt{-2\theta} :math:\sigma^2 Tweedie :math:p\geq 1 depends on :math:p :math:\mu :math:\mu^p :math:\frac{\mu^{1-p}}{1-p} :math:\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha} :math:\phi ================================================= ============================== ============================== ======================================== =========================================== ============================================================================ ===================== The Tweedie distribution has special cases for :math:p=0,1,2 not listed in the table and uses :math:\alpha=\frac{p-2}{p-1}. Correspondence of mathematical variables to code: * :math:Y and :math:y are coded as endog, the variable one wants to model * :math:x is coded as exog, the covariates alias explanatory variables * :math:\beta is coded as params, the parameters one wants to estimate * :math:\mu is coded as mu, the expectation (conditional on :math:x) of :math:Y * :math:g is coded as link argument to the class Family * :math:\phi is coded as scale, the dispersion parameter of the EDM * :math:w is not yet supported (i.e. :math:w=1), in the future it might be var_weights * :math:p is coded as var_power for the power of the variance function :math:v(\mu) of the Tweedie distribution, see table * :math:\alpha is either * Negative Binomial: the ancillary parameter alpha, see table * Tweedie: an abbreviation for :math:\frac{p-2}{p-1} of the power :math:p of the variance function, see table References ^^^^^^^^^^ * Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series. * Green, PJ. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192. * Hardin, J.W. and Hilbe, J.M. 2007. “Generalized Linear Models and Extensions.” 2nd ed. Stata Press, College Station, TX. * McCullagh, P. and Nelder, J.A. 1989. “Generalized Linear Models.” 2nd ed. Chapman & Hall, Boca Rotan. Module Reference ---------------- .. module:: statsmodels.genmod.generalized_linear_model :synopsis: Generalized Linear Models (GLM) Model Class ^^^^^^^^^^^ .. autosummary:: :toctree: generated/ GLM Results Class ^^^^^^^^^^^^^ .. autosummary:: :toctree: generated/ GLMResults PredictionResults .. _families: Families ^^^^^^^^ The distribution families currently implemented are .. module:: statsmodels.genmod.families.family .. currentmodule:: statsmodels.genmod.families.family .. autosummary:: :toctree: generated/ Family Binomial Gamma Gaussian InverseGaussian NegativeBinomial Poisson Tweedie .. _links: Link Functions ^^^^^^^^^^^^^^ The link functions currently implemented are the following. Not all link functions are available for each distribution family. The list of available link functions can be obtained by :: >>> sm.families.family..links .. module:: statsmodels.genmod.families.links .. currentmodule:: statsmodels.genmod.families.links .. autosummary:: :toctree: generated/ Link CDFLink CLogLog LogLog Log Logit NegativeBinomial Power cauchy cloglog loglog identity inverse_power inverse_squared log logit nbinom probit .. _varfuncs: Variance Functions ^^^^^^^^^^^^^^^^^^ Each of the families has an associated variance function. You can access the variance functions here: :: >>> sm.families..variance .. module:: statsmodels.genmod.families.varfuncs .. currentmodule:: statsmodels.genmod.families.varfuncs .. autosummary:: :toctree: generated/ VarianceFunction constant Power mu mu_squared mu_cubed Binomial binary NegativeBinomial nbinom