"""
Linear mixed effects models for Statsmodels

The data are partitioned into disjoint groups.  The probability model
for group i is:

Y = X*beta + Z*gamma + epsilon

where

* n_i is the number of observations in group i
* Y is a n_i dimensional response vector
* X is a n_i x k_fe design matrix for the fixed effects
* beta is a k_fe-dimensional vector of fixed effects slopes
* Z is a n_i x k_re design matrix for the random effects
* gamma is a k_re-dimensional random vector with mean 0
  and covariance matrix Psi; note that each group
  gets its own independent realization of gamma.
* epsilon is a n_i dimensional vector of iid normal
  errors with mean 0 and variance sigma^2; the epsilon
  values are independent both within and between groups

Y, X and Z must be entirely observed.  beta, Psi, and sigma^2 are
estimated using ML or REML estimation; gamma and epsilon are random,
so they are part of the probability model rather than quantities to
be estimated.

The mean structure is E[Y|X,Z] = X*beta.  If only the mean structure
is of interest, GEE is a good alternative to mixed models.

The primary reference for the implementation details is:

MJ Lindstrom, DM Bates (1988).  "Newton Raphson and EM algorithms for
linear mixed effects models for repeated measures data".  Journal of
the American Statistical Association. Volume 83, Issue 404, pages
1014-1022.

See also this more recent document:

http://econ.ucsb.edu/~doug/245a/Papers/Mixed%20Effects%20Implement.pdf

All the likelihood, gradient, and Hessian calculations closely follow
Lindstrom and Bates.

The following two documents are written more from the perspective of
users:

http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf

http://lme4.r-forge.r-project.org/slides/2009-07-07-Rennes/3Longitudinal-4.pdf

Notation:

* `cov_re` is the random effects covariance matrix (referred to above
  as Psi) and `scale` is the (scalar) error variance.  For a single
  group, the marginal covariance matrix of endog given exog is scale*I
  + Z * cov_re * Z', where Z is the design matrix for the random
  effects in one group.

Notes:

1. Three different parameterizations are used here in different
places.  The regression slopes (usually called `fe_params`) are
identical in all three parameterizations, but the variance parameters
differ.  The parameterizations are:

* The "natural parameterization" in which cov(endog) = scale*I + Z *
  cov_re * Z', as described above.  This is the main parameterization
  visible to the user.

* The "profile parameterization" in which cov(endog) = I +
  Z * cov_re1 * Z'.  This is the parameterization of the profile
  likelihood that is maximized to produce parameter estimates.
  (see Lindstrom and Bates for details).  The "natural" cov_re is
  equal to the "profile" cov_re1 times scale.

* The "square root parameterization" in which we work with the
  Cholesky factor of cov_re1 instead of cov_re1 directly.

All three parameterizations can be "packed" by concatenating fe_params
together with the lower triangle of the dependence structure.  Note
that when unpacking, it is important to either square or reflect the
dependence structure depending on which parameterization is being
used.
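
As an illustration (a sketch, not part of the API): with one fixed
effects slope b and a 2x2 dependence structure, the packed vector is

    [b, a, c, d]

where a, c, d fill the lower triangle row-wise.  Unpacking reflects
this to

    [[a, c],
     [c, d]]

in the natural and profile parameterizations, while in the square
root parameterization the unpacked matrix is the Cholesky factor

    L = [[a, 0],
         [c, d]],    with cov_re1 = L * L'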

2. The situation where the random effects covariance matrix is
singular is numerically challenging.  Small changes in the covariance
parameters may lead to large changes in the likelihood and
derivatives.

3. The optimization strategy is to first use OLS to get starting
values for the mean structure.  Then we optionally perform a few EM
steps, followed by optionally performing a few steepest ascent steps.
This is followed by conjugate gradient optimization using one of the
scipy gradient optimizers.  The EM and steepest ascent steps are used
to get adequate starting values for the conjugate gradient
optimization, which is much faster.
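
For example (an illustrative sketch; `endog`, `exog` and `groups` are
assumed to be arrays defined elsewhere):

    model = MixedLM(endog, exog, groups)
    # 3 EM steps, then 2 steepest ascent steps, then gradient fitting
    result = model.fit(niter_em=3, niter_sa=2)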
"""

import numpy as np
import statsmodels.base.model as base
from scipy.optimize import fmin_ncg, fmin_cg, fmin_bfgs, fmin
from statsmodels.tools.decorators import cache_readonly
from statsmodels.tools import data as data_tools
from scipy.stats.distributions import norm
import pandas as pd
import patsy
from statsmodels.compat.collections import OrderedDict
from statsmodels.compat import range
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
from statsmodels.base._penalties import Penalty
from statsmodels.compat.numpy import np_matrix_rank

from pandas import DataFrame


def _get_exog_re_names(exog_re):
    """
    Passes through if given a list of names. Otherwise, gets pandas names
    or creates some generic variable names as needed.
    """
    if isinstance(exog_re, pd.DataFrame):
        return exog_re.columns.tolist()
    elif isinstance(exog_re, pd.Series) and exog_re.name is not None:
        return [exog_re.name]
    elif isinstance(exog_re, list):
        return exog_re
    return ["Z{0}".format(k + 1) for k in range(exog_re.shape[1])]


class MixedLMParams(object):
    """
    This class represents a parameter state for a mixed linear model.

    Parameters
    ----------
    k_fe : integer
        The number of covariates with fixed effects.
    k_re : integer
        The number of covariates with random effects.
    use_sqrt : boolean
        If True, the covariance matrix is stored as the lower
        triangle of its Cholesky square root, otherwise it is stored
        as the lower triangle of the covariance matrix itself.

    Notes
    -----
    This object represents the parameter state for the model in which
    the scale parameter has been profiled out.
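
    Examples
    --------
    A minimal sketch (illustrative only) with two fixed effects and
    two random effects covariates:

    >>> import numpy as np
    >>> pa = MixedLMParams(k_fe=2, k_re=2, use_sqrt=False)
    >>> pa.set_fe_params(np.r_[1., 2.])
    >>> pa.set_cov_re(np.eye(2))
    >>> pa.get_packed().tolist()
    [1.0, 2.0, 1.0, 0.0, 1.0]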
    """

    def __init__(self, k_fe, k_re, use_sqrt=True):

        self.k_fe = k_fe
        self.k_re = k_re
        self.k_re2 = k_re * (k_re + 1) // 2
        self.k_tot = self.k_fe + self.k_re2
        self.use_sqrt = use_sqrt
        self._ix = np.tril_indices(self.k_re)
        self._params = np.zeros(self.k_tot)

    def from_packed(params, k_fe, use_sqrt):
        """
        Create a MixedLMParams object from packed parameter vector.

        Parameters
        ----------
        params : array-like
            The model parameters packed into a single vector.
        k_fe : integer
            The number of covariates with fixed effects
        use_sqrt : boolean
            If True, the random effects covariance matrix is stored as
            its Cholesky factor, otherwise the lower triangle of the
            covariance matrix is stored.

        Returns
        -------
        A MixedLMParams object.
        """

        k_re2 = len(params) - k_fe
        k_re = (-1 + np.sqrt(1 + 8*k_re2)) / 2
        if k_re != int(k_re):
            raise ValueError("Length of `params` is not compatible "
                             "with the value of `k_fe`.")
        k_re = int(k_re)

        pa = MixedLMParams(k_fe, k_re, use_sqrt)
        pa.set_packed(params)
        return pa

    from_packed = staticmethod(from_packed)

    def from_components(fe_params, cov_re=None, cov_re_sqrt=None,
                        use_sqrt=True):
        """
        Create a MixedLMParams object from each parameter component.

        Parameters
        ----------
        fe_params : array-like
            The fixed effects parameter (a 1-dimensional array).
        cov_re : array-like
            The random effects covariance matrix (a square, symmetric
            2-dimensional array).
        cov_re_sqrt : array-like
            The Cholesky (lower triangular) square root of the random
            effects covariance matrix.
        use_sqrt : boolean
            If True, the random effects covariance matrix is stored as
            the lower triangle of its Cholesky factor, otherwise the
            lower triangle of the covariance matrix is stored.

        Returns
        -------
        A MixedLMParams object.
        """

        k_fe = len(fe_params)
        if cov_re is not None:
            k_re = cov_re.shape[0]
        else:
            k_re = cov_re_sqrt.shape[0]
        pa = MixedLMParams(k_fe, k_re, use_sqrt)
        pa.set_fe_params(fe_params)
        pa.set_cov_re(cov_re=cov_re, cov_re_sqrt=cov_re_sqrt)

        return pa

    from_components = staticmethod(from_components)

    def copy(self):
        """
        Returns a copy of the object.
        """

        obj = MixedLMParams(self.k_fe, self.k_re, self.use_sqrt)
        obj.set_packed(self.get_packed().copy())
        return obj

    def get_packed(self, use_sqrt=None):
        """
        Returns the model parameters packed into a single vector.

        Parameters
        ----------
        use_sqrt : None or bool
            If None, `use_sqrt` has the value of this instance's
            `use_sqrt`.  Otherwise it is set to the given value.
        """

        if (use_sqrt is None) or (use_sqrt == self.use_sqrt):
            return self._params

        pa = self._params.copy()
        cov_re = self.get_cov_re()

        if use_sqrt:
            L = np.linalg.cholesky(cov_re)
            pa[self.k_fe:] = L[self._ix]
        else:
            pa[self.k_fe:] = cov_re[self._ix]

        return pa

    def set_packed(self, params):
        """
        Sets the packed parameter vector to the given vector, without
        any validity checking.
        """
        self._params = params

    def get_fe_params(self):
        """
        Returns the fixed effects parameters as an ndarray.
        """
        return self._params[0:self.k_fe]

    def set_fe_params(self, fe_params):
        """
        Set the fixed effect parameters to the given vector.
        """
        self._params[0:self.k_fe] = fe_params

    def set_cov_re(self, cov_re=None, cov_re_sqrt=None):
        """
        Set the random effects covariance matrix to the given value.

        Parameters
        ----------
        cov_re : array-like
            The random effects covariance matrix.
        cov_re_sqrt : array-like
            The Cholesky square root of the random effects covariance
            matrix.  Only the lower triangle is read.

        Notes
        -----
        The first of `cov_re` and `cov_re_sqrt` that is not None is
        used.
        """

        if cov_re is not None:
            if self.use_sqrt:
                cov_re_sqrt = np.linalg.cholesky(cov_re)
                self._params[self.k_fe:] = cov_re_sqrt[self._ix]
            else:
                self._params[self.k_fe:] = cov_re[self._ix]

        elif cov_re_sqrt is not None:
            if self.use_sqrt:
                self._params[self.k_fe:] = cov_re_sqrt[self._ix]
            else:
                cov_re = np.dot(cov_re_sqrt, cov_re_sqrt.T)
                self._params[self.k_fe:] = cov_re[self._ix]

    def get_cov_re(self):
        """
        Returns the random effects covariance matrix.
        """
        pa = self._params[self.k_fe:]

        cov_re = np.zeros((self.k_re, self.k_re))
        cov_re[self._ix] = pa
        if self.use_sqrt:
            cov_re = np.dot(cov_re, cov_re.T)
        else:
            cov_re = (cov_re + cov_re.T) - np.diag(np.diag(cov_re))

        return cov_re


# This is a global switch to use direct linear algebra calculations
# for solving factor-structured linear systems and calculating
# factor-structured determinants.  If False, use the
# Sherman-Morrison-Woodbury update which is more efficient for
# factor-structured matrices.  Should be False except when testing.
_no_smw = False

def _smw_solve(s, A, AtA, B, BI, rhs):
    """
    Solves the system (s*I + A*B*A') * x = rhs for x and returns x.

    Parameters
    ----------
    s : scalar
        See above for usage
    A : ndarray
        p x q matrix, where p is the dimension of `rhs`; see above
        for usage
    AtA : square ndarray
        A.T * A
    B : square symmetric ndarray
        See above for usage
    BI : square symmetric ndarray
        The inverse of `B`.  Can be None if B is singular
    rhs : ndarray
        See above for usage

    Returns
    -------
    x : ndarray
        See above

    If the global variable `_no_smw` is True, this routine uses direct
    linear algebra calculations.  Otherwise it uses the
    Sherman-Morrison-Woodbury identity to speed up the calculation.
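
    Examples
    --------
    A small numerical check of the identity (an illustrative sketch,
    not part of the API):

    >>> import numpy as np
    >>> np.random.seed(0)
    >>> A = np.random.normal(size=(10, 3))
    >>> B = np.eye(3)
    >>> rhs = np.random.normal(size=10)
    >>> x1 = _smw_solve(2., A, np.dot(A.T, A), B, np.linalg.inv(B), rhs)
    >>> x2 = np.linalg.solve(2.*np.eye(10) + A.dot(B).dot(A.T), rhs)
    >>> np.allclose(x1, x2)
    True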
    """

    # Direct calculation
    if _no_smw or BI is None:
        mat = np.dot(A, np.dot(B, A.T))
        # Add constant to diagonal
        mat.flat[::mat.shape[0]+1] += s
        return np.linalg.solve(mat, rhs)

    # Use SMW identity
    qmat = BI + AtA / s
    u = np.dot(A.T, rhs)
    qmat = np.linalg.solve(qmat, u)
    qmat = np.dot(A, qmat)
    rslt = rhs / s - qmat / s**2
    return rslt


def _smw_logdet(s, A, AtA, B, BI, B_logdet):
    """
    Use the matrix determinant lemma to accelerate the calculation of
    the log determinant of s*I + A*B*A'.

    Parameters
    ----------
    s : scalar
        See above for usage
    A : ndarray
        p x q matrix; see above for usage
    AtA : square ndarray
        A.T * A
    B : square symmetric ndarray
        See above for usage
    BI : square symmetric ndarray
        The inverse of `B`; can be None if B is singular.
    B_logdet : real
        The log determinant of B

    Returns
    -------
    The log determinant of s*I + A*B*A'.
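
    Examples
    --------
    A small numerical check (an illustrative sketch), comparing the
    lemma-based value to a direct log determinant:

    >>> import numpy as np
    >>> np.random.seed(1)
    >>> A = np.random.normal(size=(10, 3))
    >>> B = np.eye(3)
    >>> ld1 = _smw_logdet(2., A, np.dot(A.T, A), B, np.linalg.inv(B), 0.)
    >>> _, ld2 = np.linalg.slogdet(2.*np.eye(10) + A.dot(B).dot(A.T))
    >>> np.allclose(ld1, ld2)
    True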
    """

    p = A.shape[0]

    if _no_smw or BI is None:
        mat = np.dot(A, np.dot(B, A.T))
        # Add constant to diagonal
        mat.flat[::p+1] += s
        _, ld = np.linalg.slogdet(mat)
        return ld

    ld = p * np.log(s)

    qmat = BI + AtA / s
    _, ld1 = np.linalg.slogdet(qmat)

    return B_logdet + ld + ld1


class MixedLM(base.LikelihoodModel):
    """
    An object specifying a linear mixed effects model.  Use the `fit`
    method to fit the model and obtain a results object.

    Parameters
    ----------
    endog : 1d array-like
        The dependent variable
    exog : 2d array-like
        A matrix of covariates used to determine the mean structure
        (the "fixed effects" covariates).
    groups : 1d array-like
        A vector of labels determining the groups -- data from
        different groups are independent
    exog_re : 2d array-like
        A matrix of covariates used to determine the variance and
        covariance structure (the "random effects" covariates).  If
        None, defaults to a random intercept for each group.
    use_sqrt : bool
        If True, optimization is carried out using the lower triangle
        of the square root of the random effects covariance matrix,
        otherwise it is carried out using the lower triangle of the
        random effects covariance matrix.
    missing : string
        The approach to missing data handling

    Notes
    -----
    The covariates in `exog` and `exog_re` may (but need not)
    partially or wholly overlap.

    `use_sqrt` should almost always be set to True.  The main use case
    for use_sqrt=False is when complicated patterns of fixed values in
    the covariance structure are set (using the `free` argument to
    `fit`) that cannot be expressed in terms of the Cholesky factor L.
    """

    def __init__(self, endog, exog, groups, exog_re=None,
                 use_sqrt=True, missing='none', **kwargs):

        self.use_sqrt = use_sqrt

        # Some defaults
        self.reml = True
        self.fe_pen = None
        self.re_pen = None

        # If there is one covariate, it may be passed in as a column
        # vector, convert these to 2d arrays.
        # TODO: Can this be moved up in the class hierarchy?
        #       yes, it should be done up the hierarchy
        if (exog is not None and
                data_tools._is_using_ndarray_type(exog, None) and
                exog.ndim == 1):
            exog = exog[:, None]
        if (exog_re is not None and
                data_tools._is_using_ndarray_type(exog_re, None) and
                exog_re.ndim == 1):
            exog_re = exog_re[:, None]

        # Calling super creates self.endog, etc. as ndarrays and the
        # original exog, endog, etc. are self.data.endog, etc.
        super(MixedLM, self).__init__(endog, exog, groups=groups,
                                      exog_re=exog_re, missing=missing,
                                      **kwargs)

        self._init_keys.extend(["use_sqrt"])

        self.k_fe = exog.shape[1]  # Number of fixed effects parameters

        if exog_re is None:
            # Default random effects structure (random intercepts).
            self.k_re = 1
            self.k_re2 = 1
            self.exog_re = np.ones((len(endog), 1), dtype=np.float64)
            self.data.exog_re = self.exog_re
            self.data.param_names = self.exog_names + ['Intercept RE']
        else:
            # Process exog_re the same way that exog is handled
            # upstream
            # TODO: this is wrong and should be handled upstream wholly
            self.data.exog_re = exog_re
            self.exog_re = np.asarray(exog_re)
            if self.exog_re.ndim == 1:
                self.exog_re = self.exog_re[:, None]
            if not self.data._param_names:
                # HACK: could've been set in from_formula already
                # needs refactor
                (param_names, exog_re_names,
                 exog_re_names_full) = self._make_param_names(exog_re)
                self.data.param_names = param_names
                self.data.exog_re_names = exog_re_names
                self.data.exog_re_names_full = exog_re_names_full

            # Model dimensions
            # Number of random effect covariates
            self.k_re = self.exog_re.shape[1]
            # Number of covariance parameters
            self.k_re2 = self.k_re * (self.k_re + 1) // 2

        self.k_params = self.k_fe + self.k_re2

        # Convert the data to the internal representation, which is a
        # list of arrays, corresponding to the groups.
        group_labels = list(set(groups))
        group_labels.sort()
        row_indices = dict((s, []) for s in group_labels)
        for i, g in enumerate(groups):
            row_indices[g].append(i)
        self.row_indices = row_indices
        self.group_labels = group_labels
        self.n_groups = len(self.group_labels)

        # Split the data by groups
        self.endog_li = self.group_list(self.endog)
        self.exog_li = self.group_list(self.exog)
        self.exog_re_li = self.group_list(self.exog_re)

        # Precompute this.
        self.exog_re2_li = [np.dot(x.T, x) for x in self.exog_re_li]

        # The total number of observations, summed over all groups
        self.n_totobs = sum([len(y) for y in self.endog_li])
        # why do it like the above?
        self.nobs = len(self.endog)

        # Set the fixed effects parameter names
        if self.exog_names is None:
            self.exog_names = ["FE%d" % (k + 1) for k in
                               range(self.exog.shape[1])]

    def _make_param_names(self, exog_re):
        """
        Returns the full parameter names list, just the exogenous
        random effects variables, and the exogenous random effects
        variables with the interaction terms.
        """
        exog_names = list(self.exog_names)
        exog_re_names = _get_exog_re_names(exog_re)
        param_names = []

        jj = self.k_fe
        for i in range(len(exog_re_names)):
            for j in range(i + 1):
                if i == j:
                    param_names.append(exog_re_names[i] + " RE")
                else:
                    param_names.append(exog_re_names[j] + " RE x " +
                                       exog_re_names[i] + " RE")
                jj += 1

        return exog_names + param_names, exog_re_names, param_names

    @classmethod
    def from_formula(cls, formula, data, re_formula=None, subset=None,
                     *args, **kwargs):
        """
        Create a Model from a formula and dataframe.

        Parameters
        ----------
        formula : str or generic Formula object
            The formula specifying the model
        data : array-like
            The data for the model.  See Notes.
        re_formula : string
            A one-sided formula defining the variance structure of the
            model.  The default gives a random intercept for each
            group.
        subset : array-like
            An array-like object of booleans, integers, or index
            values that indicate the subset of df to use in the
            model.  Assumes df is a `pandas.DataFrame`
        args : extra arguments
            These are passed to the model
        kwargs : extra keyword arguments
            These are passed to the model with one exception.  The
            ``eval_env`` keyword is passed to patsy.  It can be either
            a :class:`patsy:patsy.EvalEnvironment` object or an
            integer indicating the depth of the namespace to use.  For
            example, the default ``eval_env=0`` uses the calling
            namespace.  If you wish to use a "clean" environment set
            ``eval_env=-1``.

        Returns
        -------
        model : Model instance

        Notes
        -----
        `data` must define __getitem__ with the keys in the formula
        terms; e.g., it can be a numpy structured or rec array, a
        dictionary, or a pandas DataFrame.  args and kwargs are passed
        on to the model instantiation.

        If `re_formula` is not provided, the default is a random
        intercept for each group.

        This method currently does not correctly handle missing
        values, so missing values should be explicitly dropped from
        the DataFrame before calling this method.
        """

        if "groups" not in kwargs.keys():
            raise AttributeError("'groups' is a required keyword "
                                 "argument in MixedLM.from_formula")

        # If `groups` is a variable name, retrieve the data for the
        # groups variable.
        if type(kwargs["groups"]) == str:
            kwargs["groups"] = np.asarray(data[kwargs["groups"]])

        if re_formula is not None:
            eval_env = kwargs.get('eval_env', None)
            if eval_env is None:
                eval_env = 1
            elif eval_env == -1:
                from patsy import EvalEnvironment
                eval_env = EvalEnvironment({})
            exog_re = patsy.dmatrix(re_formula, data, eval_env=eval_env)
            exog_re_names = exog_re.design_info.column_names
            exog_re = np.asarray(exog_re)
        else:
            exog_re = np.ones((data.shape[0], 1), dtype=np.float64)
            exog_re_names = ["Intercept"]

        mod = super(MixedLM, cls).from_formula(formula, data,
                                               subset=None,
                                               exog_re=exog_re,
                                               *args, **kwargs)

        # expand re names to account for pairs of RE
        (param_names, exog_re_names,
         exog_re_names_full) = mod._make_param_names(exog_re_names)
        mod.data.param_names = param_names
        mod.data.exog_re_names = exog_re_names
        mod.data.exog_re_names_full = exog_re_names_full

        return mod
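    # A minimal usage sketch for `from_formula` (illustrative only;
    # the column names "y", "age" and "subject" are hypothetical):
    #
    #     model = MixedLM.from_formula("y ~ age", data,
    #                                  groups="subject",
    #                                  re_formula="~age")
    #     result = model.fit()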
    def group_list(self, array):
        """
        Returns `array` split into subarrays corresponding to the
        grouping structure.
        """

        if array.ndim == 1:
            return [np.array(array[self.row_indices[k]])
                    for k in self.group_labels]
        else:
            return [np.array(array[self.row_indices[k], :])
                    for k in self.group_labels]
    def fit_regularized(self, start_params=None, method='l1', alpha=0,
                        ceps=1e-4, ptol=1e-6, maxit=200, **fit_kwargs):
        """
        Fit a model in which the fixed effects parameters are
        penalized.  The dependence parameters are held fixed at their
        estimated values in the unpenalized model.

        Parameters
        ----------
        method : string or Penalty object
            Method for regularization.  If a string, must be 'l1'.
        alpha : array-like
            Scalar or vector of penalty weights.  If a scalar, the
            same weight is applied to all coefficients; if a vector,
            it contains a weight for each coefficient.  If method is a
            Penalty object, the weights are scaled by alpha.  For L1
            regularization, the weights are used directly.
        ceps : positive real scalar
            Fixed effects parameters smaller than this value in
            magnitude are treated as being zero.
        ptol : positive real scalar
            Convergence occurs when the sup norm difference between
            successive values of `fe_params` is less than `ptol`.
        maxit : integer
            The maximum number of iterations.
        fit_kwargs : keywords
            Additional keyword arguments passed to fit.

        Returns
        -------
        A MixedLMResults instance containing the results.

        Notes
        -----
        The covariance structure is not updated as the fixed effects
        parameters are varied.

        The algorithm used here for L1 regularization is a "shooting"
        or cyclic coordinate descent algorithm.

        If method is 'l1', then `fe_pen` and `cov_pen` are used to
        obtain the covariance structure, but are ignored during the
        L1-penalized fitting.

        References
        ----------
        Friedman, J. H., Hastie, T. and Tibshirani, R.  Regularized
        Paths for Generalized Linear Models via Coordinate Descent.
        Journal of Statistical Software, 33(1) (2008)
        http://www.jstatsoft.org/v33/i01/paper

        http://statweb.stanford.edu/~tibs/stat315a/Supplements/fuse.pdf
        """

        if type(method) == str and (method.lower() != 'l1'):
            raise ValueError("Invalid regularization method")

        # If method is a smooth penalty just optimize directly.
        if isinstance(method, Penalty):
            # Scale the penalty weights by alpha
            method.alpha = alpha
            fit_kwargs.update({"fe_pen": method})
            return self.fit(**fit_kwargs)

        if np.isscalar(alpha):
            alpha = alpha * np.ones(self.k_fe, dtype=np.float64)

        # Fit the unpenalized model to get the dependence structure.
        mdf = self.fit(**fit_kwargs)
        fe_params = mdf.fe_params
        cov_re = mdf.cov_re
        scale = mdf.scale
        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        for itr in range(maxit):

            fe_params_s = fe_params.copy()
            for j in range(self.k_fe):

                if abs(fe_params[j]) < ceps:
                    continue

                # The residuals
                fe_params[j] = 0.
                expval = np.dot(self.exog, fe_params)
                resid_all = self.endog - expval

                # The loss function has the form
                # a*x^2 + b*x + pwt*|x|
                a, b = 0., 0.
                for k, lab in enumerate(self.group_labels):

                    exog = self.exog_li[k]
                    ex_r = self.exog_re_li[k]
                    ex2_r = self.exog_re2_li[k]
                    resid = resid_all[self.row_indices[lab]]

                    x = exog[:, j]
                    u = _smw_solve(scale, ex_r, ex2_r, cov_re,
                                   cov_re_inv, x)
                    a += np.dot(u, x)
                    b -= 2 * np.dot(u, resid)

                pwt1 = alpha[j]
                if b > pwt1:
                    fe_params[j] = -(b - pwt1) / (2 * a)
                elif b < -pwt1:
                    fe_params[j] = -(b + pwt1) / (2 * a)

            if np.abs(fe_params_s - fe_params).max() < ptol:
                break

        # Replace the fixed effects estimates with their penalized
        # values, leave the dependence parameters in their unpenalized
        # state.
        params_prof = mdf.params.copy()
        params_prof[0:self.k_fe] = fe_params

        scale = self.get_scale(fe_params, mdf.cov_re_unscaled)

        # Get the Hessian including only the nonzero fixed effects,
        # then blow back up to the full size after inverting.
        hess = self.hessian_full(params_prof)
        pcov = np.nan * np.ones_like(hess)
        ii = np.abs(params_prof) > ceps
        ii[self.k_fe:] = True
        ii = np.flatnonzero(ii)
        hess1 = hess[ii, :][:, ii]
        pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)

        results = MixedLMResults(self, params_prof, pcov / scale)
        results.fe_params = fe_params
        results.cov_re = cov_re
        results.scale = scale
        results.cov_re_unscaled = mdf.cov_re_unscaled
        results.method = mdf.method
        results.converged = True
        results.cov_pen = self.cov_pen
        results.k_fe = self.k_fe
        results.k_re = self.k_re
        results.k_re2 = self.k_re2

        return MixedLMResultsWrapper(results)
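    # A usage sketch for `fit_regularized` (illustrative only): apply
    # an L1 penalty with weight 2 to all fixed effects coefficients,
    # holding the dependence structure at its unpenalized estimate.
    #
    #     result = model.fit_regularized(method='l1', alpha=2.)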
    def _reparam(self):
        """
        Returns parameters of the map converting parameters from the
        form used in optimization to the form returned to the user.

        Returns
        -------
        lin : list-like
            Linear terms of the map
        quad : list-like
            Quadratic terms of the map

        Notes
        -----
        If P are the standard form parameters and R are the modified
        parameters (i.e. with square root covariance), then

        P[i] = lin[i] * R + R' * quad[i] * R
        """

        k_fe, k_re, k_re2 = self.k_fe, self.k_re, self.k_re2
        k_tot = k_fe + k_re2
        ix = np.tril_indices(self.k_re)

        lin = []
        for k in range(k_fe):
            e = np.zeros(k_tot)
            e[k] = 1
            lin.append(e)
        for k in range(k_re2):
            lin.append(np.zeros(k_tot))

        quad = []
        for k in range(k_tot):
            quad.append(np.zeros((k_tot, k_tot)))
        ii = np.tril_indices(k_re)
        ix = [(a, b) for a, b in zip(ii[0], ii[1])]
        for i1 in range(k_re2):
            for i2 in range(k_re2):
                ix1 = ix[i1]
                ix2 = ix[i2]
                if (ix1[1] == ix2[1]) and (ix1[0] <= ix2[0]):
                    ii = (ix2[0], ix1[0])
                    k = ix.index(ii)
                    quad[k_fe+k][k_fe+i2, k_fe+i1] += 1
        for k in range(k_tot):
            quad[k] = 0.5*(quad[k] + quad[k].T)

        return lin, quad
    def hessian_sqrt(self, params):
        """
        Returns the Hessian matrix of the log-likelihood evaluated at
        a given point, calculated with respect to the parameterization
        in which the random effects covariance matrix is represented
        through its Cholesky square root.

        Parameters
        ----------
        params : MixedLMParams or array-like
            The model parameters.  If array-like, must contain packed
            parameters that are compatible with this model.

        Returns
        -------
        The Hessian matrix of the profile log likelihood function,
        evaluated at `params`.

        Notes
        -----
        If `params` is provided as a MixedLMParams object it may be of
        any parameterization.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.use_sqrt)

        score0 = self.score_full(params)
        hess0 = self.hessian_full(params)

        params_vec = params.get_packed(use_sqrt=True)

        lin, quad = self._reparam()
        k_tot = self.k_fe + self.k_re2

        # Convert Hessian to new coordinates
        hess = 0.
        for i in range(k_tot):
            hess += 2 * score0[i] * quad[i]
        for i in range(k_tot):
            vi = lin[i] + 2*np.dot(quad[i], params_vec)
            for j in range(k_tot):
                vj = lin[j] + 2*np.dot(quad[j], params_vec)
                hess += hess0[i, j] * np.outer(vi, vj)

        return hess
    def loglike(self, params):
        """
        Evaluate the (profile) log-likelihood of the linear mixed
        effects model.

        Parameters
        ----------
        params : MixedLMParams, or array-like.
            The parameter value.  If array-like, must be a packed
            parameter vector compatible with this model.

        Returns
        -------
        The log-likelihood value at `params`.

        Notes
        -----
        This is the profile likelihood in which the scale parameter
        `scale` has been profiled out.

        The input parameter state, if provided as a MixedLMParams
        object, can be with respect to any parameterization.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.use_sqrt)

        fe_params = params.get_fe_params()
        cov_re = params.get_cov_re()
        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None
        _, cov_re_logdet = np.linalg.slogdet(cov_re)

        # The residuals
        expval = np.dot(self.exog, fe_params)
        resid_all = self.endog - expval

        likeval = 0.

        # Handle the covariance penalty
        if self.cov_pen is not None:
            likeval -= self.cov_pen.func(cov_re, cov_re_inv)

        # Handle the fixed effects penalty
        if self.fe_pen is not None:
            likeval -= self.fe_pen.func(fe_params)

        xvx, qf = 0., 0.
        for k, lab in enumerate(self.group_labels):

            exog = self.exog_li[k]
            ex_r = self.exog_re_li[k]
            ex2_r = self.exog_re2_li[k]
            resid = resid_all[self.row_indices[lab]]

            # Part 1 of the log likelihood (for both ML and REML)
            ld = _smw_logdet(1., ex_r, ex2_r, cov_re, cov_re_inv,
                             cov_re_logdet)
            likeval -= ld / 2.

            # Part 2 of the log likelihood (for both ML and REML)
            u = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv, resid)
            qf += np.dot(resid, u)

            # Adjustment for REML
            if self.reml:
                mat = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                                 exog)
                xvx += np.dot(exog.T, mat)

        if self.reml:
            likeval -= (self.n_totobs - self.k_fe) * np.log(qf) / 2.
            _, ld = np.linalg.slogdet(xvx)
            likeval -= ld / 2.
            likeval -= (self.n_totobs - self.k_fe) * np.log(2 * np.pi) / 2.
            likeval += ((self.n_totobs - self.k_fe) *
                        np.log(self.n_totobs - self.k_fe) / 2.)
            likeval -= (self.n_totobs - self.k_fe) / 2.
        else:
            likeval -= self.n_totobs * np.log(qf) / 2.
            likeval -= self.n_totobs * np.log(2 * np.pi) / 2.
            likeval += self.n_totobs * np.log(self.n_totobs) / 2.
            likeval -= self.n_totobs / 2.

        return likeval
    def _gen_dV_dPsi(self, ex_r, max_ix=None):
        """
        A generator that yields the derivative of the covariance
        matrix V (=I + Z*Psi*Z') with respect to the free elements of
        Psi.  Each call to the generator yields the index of Psi with
        respect to which the derivative is taken, and the derivative
        matrix with respect to that element of Psi.  Psi is a
        symmetric matrix, so the free elements are the lower triangle.
        If max_ix is not None, the iterations terminate after max_ix
        values are yielded.
        """

        jj = 0
        for j1 in range(self.k_re):
            for j2 in range(j1 + 1):
                if max_ix is not None and jj > max_ix:
                    return
                mat = np.outer(ex_r[:, j1], ex_r[:, j2])
                if j1 != j2:
                    mat += mat.T
                yield jj, mat
                jj += 1
    def score(self, params):
        """
        Returns the score vector of the profile log-likelihood.

        Notes
        -----
        The score vector that is returned is computed with respect to
        the parameterization defined by this model instance's
        `use_sqrt` attribute.  The input value `params` can be with
        respect to any parameterization.
        """
        if self.use_sqrt:
            return self.score_sqrt(params)
        else:
            return self.score_full(params)
    def hessian(self, params):
        """
        Returns the Hessian matrix of the profile log-likelihood.

        Notes
        -----
        The Hessian matrix that is returned is computed with respect
        to the parameterization defined by this model's `use_sqrt`
        attribute.  The input value `params` can be with respect to
        any parameterization.
        """
        if self.use_sqrt:
            return self.hessian_sqrt(params)
        else:
            return self.hessian_full(params)
    def score_full(self, params):
        """
        Calculates the score vector for the profiled log-likelihood of
        the mixed effects model with respect to the parameterization
        in which the random effects covariance matrix is represented
        in its full form (not using the Cholesky factor).

        Parameters
        ----------
        params : MixedLMParams or array-like
            The parameter at which the score function is evaluated.
            If array-like, must contain packed parameter values that
            are compatible with the current model.

        Returns
        -------
        The score vector, calculated at `params`.

        Notes
        -----
        The score vector that is returned is taken with respect to the
        parameterization in which `cov_re` is represented through its
        lower triangle (without taking the Cholesky square root).

        The input, if provided as a MixedLMParams object, can be of
        any parameterization.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.use_sqrt)

        fe_params = params.get_fe_params()
        cov_re = params.get_cov_re()
        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        score_fe = np.zeros(self.k_fe, dtype=np.float64)
        score_re = np.zeros(self.k_re2, dtype=np.float64)

        # Handle the covariance penalty.
        if self.cov_pen is not None:
            score_re -= self.cov_pen.grad(cov_re, cov_re_inv)

        # Handle the fixed effects penalty.
        if self.fe_pen is not None:
            score_fe -= self.fe_pen.grad(fe_params)

        # resid' V^{-1} resid, summed over the groups (a scalar)
        rvir = 0.

        # exog' V^{-1} resid, summed over the groups (a k_fe
        # dimensional vector)
        xtvir = 0.

        # exog' V^{-1} exog, summed over the groups (a k_fe x k_fe
        # matrix)
        xtvix = 0.

        # V^{-1} exog' dV/dQ_jj exog V^{-1}, where Q_jj is the jj^th
        # covariance parameter.
        xtax = [0., ] * self.k_re2

        # Temporary related to the gradient of log |V|
        dlv = np.zeros(self.k_re2, dtype=np.float64)

        # resid' V^{-1} dV/dQ_jj V^{-1} resid (a scalar)
        rvavr = np.zeros(self.k_re2, dtype=np.float64)

        for k in range(self.n_groups):

            exog = self.exog_li[k]
            ex_r = self.exog_re_li[k]
            ex2_r = self.exog_re2_li[k]

            # The residuals
            expval = np.dot(exog, fe_params)
            resid = self.endog_li[k] - expval

            if self.reml:
                viexog = _smw_solve(1., ex_r, ex2_r, cov_re,
                                    cov_re_inv, exog)
                xtvix += np.dot(exog.T, viexog)

            # Contributions to the covariance parameter gradient
            jj = 0
            vex = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                             ex_r)
            vir = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                             resid)
            for jj, mat in self._gen_dV_dPsi(ex_r):
                dlv[jj] = np.trace(_smw_solve(1., ex_r, ex2_r, cov_re,
                                              cov_re_inv, mat))
                rvavr[jj] += np.dot(vir, np.dot(mat, vir))
                if self.reml:
                    xtax[jj] += np.dot(viexog.T, np.dot(mat, viexog))

            # Contribution of log|V| to the covariance parameter
            # gradient.
            score_re -= 0.5 * dlv

            # Needed for the fixed effects params gradient
            rvir += np.dot(resid, vir)
            xtvir += np.dot(exog.T, vir)

        fac = self.n_totobs
        if self.reml:
            fac -= self.exog.shape[1]

        score_fe += fac * xtvir / rvir
        score_re += 0.5 * fac * rvavr / rvir

        if self.reml:
            for j in range(self.k_re2):
                score_re[j] += 0.5 * np.trace(np.linalg.solve(
                    xtvix, xtax[j]))

        score_vec = np.concatenate((score_fe, score_re))

        if self._freepat is not None:
            return self._freepat.get_packed() * score_vec
        else:
            return score_vec
    def score_sqrt(self, params):
        """
        Returns the score vector with respect to the parameterization
        in which the random effects covariance matrix is represented
        through its Cholesky square root.

        Parameters
        ----------
        params : MixedLMParams or array-like
            The model parameters.  If array-like must contain packed
            parameters that are compatible with this model instance.

        Returns
        -------
        The score vector.

        Notes
        -----
        The input, if provided as a MixedLMParams object, can be of
        any parameterization.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.use_sqrt)

        score_full = self.score_full(params)
        params_vec = params.get_packed(use_sqrt=True)

        lin, quad = self._reparam()

        scr = 0.
        for i in range(len(params_vec)):
            v = lin[i] + 2 * np.dot(quad[i], params_vec)
            scr += score_full[i] * v

        if self._freepat is not None:
            return self._freepat.get_packed() * scr
        else:
            return scr
    def hessian_full(self, params):
        """
        Calculates the Hessian matrix for the mixed effects model with
        respect to the parameterization in which the covariance matrix
        is represented directly (without square-root transformation).

        Parameters
        ----------
        params : MixedLMParams or array-like
            The model parameters at which the Hessian is calculated.
            If array-like, must contain the packed parameters in a
            form that is compatible with this model instance.

        Returns
        -------
        hess : 2d ndarray
            The Hessian matrix, evaluated at `params`.

        Notes
        -----
        If provided as a MixedLMParams object, the input may be of any
        parameterization.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.use_sqrt)

        fe_params = params.get_fe_params()
        cov_re = params.get_cov_re()
        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        # Blocks for the fixed and random effects parameters.
        hess_fe = 0.
        hess_re = np.zeros((self.k_re2, self.k_re2), dtype=np.float64)
        hess_fere = np.zeros((self.k_re2, self.k_fe),
                             dtype=np.float64)

        fac = self.n_totobs
        if self.reml:
            fac -= self.exog.shape[1]

        rvir = 0.
        xtvix = 0.
        xtax = [0., ] * self.k_re2
        B = np.zeros(self.k_re2, dtype=np.float64)
        D = np.zeros((self.k_re2, self.k_re2), dtype=np.float64)
        F = [[0., ] * self.k_re2 for k in range(self.k_re2)]
        for k in range(self.n_groups):

            exog = self.exog_li[k]
            ex_r = self.exog_re_li[k]
            ex2_r = self.exog_re2_li[k]

            # The residuals
            expval = np.dot(exog, fe_params)
            resid = self.endog_li[k] - expval

            viexog = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                                exog)
            xtvix += np.dot(exog.T, viexog)
            vir = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                             resid)
            rvir += np.dot(resid, vir)

            for jj1, mat1 in self._gen_dV_dPsi(ex_r):

                hess_fere[jj1, :] += np.dot(viexog.T,
                                            np.dot(mat1, vir))
                if self.reml:
                    xtax[jj1] += np.dot(viexog.T, np.dot(mat1, viexog))

                B[jj1] += np.dot(vir, np.dot(mat1, vir))
                E = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                               mat1)

                for jj2, mat2 in self._gen_dV_dPsi(ex_r, jj1):
                    Q = np.dot(mat2, E)
                    Q1 = Q + Q.T
                    vt = np.dot(vir, np.dot(Q1, vir))
                    D[jj1, jj2] += vt
                    if jj1 != jj2:
                        D[jj2, jj1] += vt
                    R = _smw_solve(1., ex_r, ex2_r, cov_re,
                                   cov_re_inv, Q)
                    rt = np.trace(R) / 2
                    hess_re[jj1, jj2] += rt
                    if jj1 != jj2:
                        hess_re[jj2, jj1] += rt
                    if self.reml:
                        F[jj1][jj2] += np.dot(viexog.T,
                                              np.dot(Q, viexog))

        hess_fe -= fac * xtvix / rvir

        hess_re = hess_re - 0.5 * fac * (D/rvir -
                                         np.outer(B, B) / rvir**2)

        hess_fere = -fac * hess_fere / rvir

        if self.reml:
            for j1 in range(self.k_re2):
                Q1 = np.linalg.solve(xtvix, xtax[j1])
                for j2 in range(j1 + 1):
                    Q2 = np.linalg.solve(xtvix, xtax[j2])
                    a = np.trace(np.dot(Q1, Q2))
                    a -= np.trace(np.linalg.solve(xtvix, F[j1][j2]))
                    a *= 0.5
                    hess_re[j1, j2] += a
                    if j1 > j2:
                        hess_re[j2, j1] += a

        # Put the blocks together to get the Hessian.
        m = self.k_fe + self.k_re2
        hess = np.zeros((m, m), dtype=np.float64)
        hess[0:self.k_fe, 0:self.k_fe] = hess_fe
        hess[0:self.k_fe, self.k_fe:] = hess_fere.T
        hess[self.k_fe:, 0:self.k_fe] = hess_fere
        hess[self.k_fe:, self.k_fe:] = hess_re

        return hess
    def steepest_ascent(self, params, n_iter):
        """
        Take steepest ascent steps to increase the log-likelihood
        function.

        Parameters
        ----------
        params : MixedLMParams
            The starting point of the optimization.
        n_iter : non-negative integer
            Return once this number of iterations have occurred.

        Returns
        -------
        A MixedLMParams object containing the final value of the
        optimization.
        """

        fval = self.loglike(params)

        params = params.copy()
        params1 = params.copy()

        for itr in range(n_iter):

            gro = self.score(params)
            gr = gro / np.max(np.abs(gro))

            sl = 0.5
            while sl > 1e-20:
                params1._params = params._params + sl*gr
                fval1 = self.loglike(params1)
                if fval1 > fval:
                    cov_re = params1.get_cov_re()
                    if np.min(np.diag(cov_re)) > 1e-2:
                        params, params1 = params1, params
                        fval = fval1
                        break
                sl /= 2

        return params
    def Estep(self, fe_params, cov_re, scale):
        """
        The E-step of the EM algorithm.

        This is for ML (not REML), but it seems to be good enough to
        use for REML starting values.

        Parameters
        ----------
        fe_params : 1d ndarray
            The current value of the fixed effect coefficients
        cov_re : 2d ndarray
            The current value of the covariance matrix of random
            effects
        scale : positive scalar
            The current value of the error variance

        Returns
        -------
        m1x : 1d ndarray
            sum_groups :math:`X'*Z*E[gamma | Y]`, where X and Z are
            the fixed and random effects covariates, gamma is the
            random effects, and Y is the observed data
        m1y : scalar
            sum_groups :math:`Y'*E[gamma | Y]`
        m2 : 2d ndarray
            sum_groups :math:`E[gamma * gamma' | Y]`
        m2xx : 2d ndarray
            sum_groups :math:`Z'*Z * E[gamma * gamma' | Y]`
        """

        m1x, m1y, m2, m2xx = 0., 0., 0., 0.

        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        for k in range(self.n_groups):

            # Get the residuals
            expval = np.dot(self.exog_li[k], fe_params)
            resid = self.endog_li[k] - expval

            # Construct the marginal covariance matrix for this group
            ex_r = self.exog_re_li[k]
            ex2_r = self.exog_re2_li[k]

            vr1 = _smw_solve(scale, ex_r, ex2_r, cov_re, cov_re_inv,
                             resid)
            vr1 = np.dot(ex_r.T, vr1)
            vr1 = np.dot(cov_re, vr1)

            vr2 = _smw_solve(scale, ex_r, ex2_r, cov_re, cov_re_inv,
                             self.exog_re_li[k])
            vr2 = np.dot(vr2, cov_re)
            vr2 = np.dot(ex_r.T, vr2)
            vr2 = np.dot(cov_re, vr2)

            rg = np.dot(ex_r, vr1)
            m1x += np.dot(self.exog_li[k].T, rg)
            m1y += np.dot(self.endog_li[k].T, rg)
            egg = cov_re - vr2 + np.outer(vr1, vr1)
            m2 += egg
            m2xx += np.dot(np.dot(ex_r.T, ex_r), egg)

        return m1x, m1y, m2, m2xx
    def EM(self, fe_params, cov_re, scale, niter_em=10, hist=None):
        """
        Run the EM algorithm from a given starting point.

        This is for ML (not REML), but it seems to be good enough to
        use for REML starting values.

        Returns
        -------
        fe_params : 1d ndarray
            The final value of the fixed effects coefficients
        cov_re : 2d ndarray
            The final value of the random effects covariance matrix
        scale : float
            The final value of the error variance

        Notes
        -----
        This uses the natural parameterization of the likelihood,
        :math:`scale*I + Z*Psi*Z'`; note that this differs from the
        profile likelihood used in the gradient calculations.
        """

        xxtot = 0.
        for x in self.exog_li:
            xxtot += np.dot(x.T, x)

        xytot = 0.
        for x, y in zip(self.exog_li, self.endog_li):
            xytot += np.dot(x.T, y)

        pp = []
        for itr in range(niter_em):

            m1x, m1y, m2, m2xx = self.Estep(fe_params, cov_re, scale)

            fe_params = np.linalg.solve(xxtot, xytot - m1x)
            cov_re = m2 / self.n_groups

            scale = 0.
            for x, y in zip(self.exog_li, self.endog_li):
                scale += np.sum((y - np.dot(x, fe_params))**2)
            scale -= 2 * m1y
            scale += 2 * np.dot(fe_params, m1x)
            scale += np.trace(m2xx)
            scale /= self.n_totobs

            if hist is not None:
                hist.append(["EM", fe_params, cov_re, scale])

        return fe_params, cov_re, scale
    def get_scale(self, fe_params, cov_re):
        """
        Returns the estimated error variance based on given estimates
        of the slopes and random effects covariance matrix.

        Parameters
        ----------
        fe_params : array-like
            The regression slope estimates
        cov_re : 2d array
            Estimate of the random effects covariance matrix (Psi).

        Returns
        -------
        scale : float
            The estimated error variance.
        """

        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        qf = 0.
        for k in range(self.n_groups):

            exog = self.exog_li[k]
            ex_r = self.exog_re_li[k]
            ex2_r = self.exog_re2_li[k]

            # The residuals
            expval = np.dot(exog, fe_params)
            resid = self.endog_li[k] - expval

            mat = _smw_solve(1., ex_r, ex2_r, cov_re, cov_re_inv,
                             resid)
            qf += np.dot(resid, mat)

        if self.reml:
            qf /= (self.n_totobs - self.k_fe)
        else:
            qf /= self.n_totobs

        return qf
    def starting_values(self, start_params):
        """
        Returns a MixedLMParams object containing starting values,
        based on `start_params` if provided, otherwise on OLS.
        """

        # It's already in the desired form.
        if type(start_params) is MixedLMParams:
            return start_params

        # Given a packed parameter vector, assume it has the same
        # dimensions as the current model.
        if type(start_params) in [np.ndarray, pd.Series]:
            return MixedLMParams.from_packed(start_params, self.k_fe,
                                             self.use_sqrt)

        # Very crude starting values.
        from statsmodels.regression.linear_model import OLS
        fe_params = OLS(self.endog, self.exog).fit().params
        cov_re = np.eye(self.k_re)
        pa = MixedLMParams.from_components(fe_params, cov_re=cov_re,
                                           use_sqrt=self.use_sqrt)

        return pa
    def fit(self, start_params=None, reml=True, niter_em=0,
            niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None,
            free=None, full_output=False, **kwargs):
        """
        Fit a linear mixed model to the data.

        Parameters
        ----------
        start_params : array-like or MixedLMParams
            If a `MixedLMParams` the state provides the starting
            value.  If array-like, this is the packed parameter
            vector, assumed to be in the same state as this model.
        reml : bool
            If true, fit according to the REML likelihood, else fit
            the standard likelihood using ML.
        niter_sa : integer
            The number of steepest ascent iterations
        niter_em : non-negative integer
            The number of EM steps.  The EM steps always precede the
            steepest ascent and conjugate gradient optimization.  The
            EM algorithm implemented here is for ML estimation.
        do_cg : bool
            If True, a conjugate gradient algorithm is used for
            optimization (following any steepest ascent or EM steps).
        cov_pen : CovariancePenalty object
            A penalty for the random effects covariance matrix
        fe_pen : Penalty object
            A penalty on the fixed effects
        free : MixedLMParams object
            If not `None`, this is a mask that allows parameters to be
            held fixed at specified values.  A 1 indicates that the
            corresponding parameter is estimated, a 0 indicates that
            it is fixed at its starting value.  Setting the `cov_re`
            component to the identity matrix fits a model with
            independent random effects.  The state of `use_sqrt` for
            `free` must agree with that of the parent model.
        full_output : bool
            If true, attach iteration history to results

        Returns
        -------
        A MixedLMResults instance.

        Notes
        -----
        If `start_params` is provided as an array, it must have the
        same `use_sqrt` state as the parent model.

        The value of `free` must have the same `use_sqrt` state as the
        parent model.
        """

        self.reml = reml
        self.cov_pen = cov_pen
        self.fe_pen = fe_pen
        self._freepat = free

        if full_output:
            hist = []
        else:
            hist = None

        params = self.starting_values(start_params)

        success = False

        # EM iterations
        if niter_em > 0:
            fe_params = params.get_fe_params()
            cov_re = params.get_cov_re()
            scale = 1.
            fe_params, cov_re, scale = self.EM(fe_params, cov_re,
                                               scale, niter_em, hist)
            params.set_fe_params(fe_params)
            # Switch to profile parameterization.
            params.set_cov_re(cov_re / scale)

        # Try up to 10 times to make the optimization work.  Usually
        # only one cycle is used.
        if do_cg:
            for cycle in range(10):

                params = self.steepest_ascent(params, niter_sa)

                try:
                    kwargs["retall"] = hist is not None
                    if "disp" not in kwargs:
                        kwargs["disp"] = False
                    # Only bfgs and lbfgs seem to work
                    kwargs["method"] = "bfgs"
                    pa = params.get_packed()
                    rslt = super(MixedLM, self).fit(start_params=pa,
                                                    skip_hessian=True,
                                                    **kwargs)
                except np.linalg.LinAlgError:
                    continue

                # The optimization succeeded
                params.set_packed(rslt.params)
                success = True
                if hist is not None:
                    hist.append(rslt.mle_retvals)
                break

        if not success:
            msg = "Gradient optimization failed."
            warnings.warn(msg, ConvergenceWarning)

        # Convert to the final parameterization (i.e. undo the square
        # root transform of the covariance matrix, and the profiling
        # over the error variance).
        fe_params = params.get_fe_params()
        cov_re_unscaled = params.get_cov_re()
        scale = self.get_scale(fe_params, cov_re_unscaled)
        cov_re = scale * cov_re_unscaled

        if np.min(np.abs(np.diag(cov_re))) < 0.01:
            msg = "The MLE may be on the boundary of the parameter space."
            warnings.warn(msg, ConvergenceWarning)

        # Compute the Hessian at the MLE.  Note that this is the
        # Hessian with respect to the random effects covariance matrix
        # (not its square root).
        # It is used for obtaining standard errors, not for
        # optimization.
        hess = self.hessian_full(params)
        if free is not None:
            pcov = np.zeros_like(hess)
            pat = self._freepat.get_packed()
            ii = np.flatnonzero(pat)
            if len(ii) > 0:
                hess1 = hess[np.ix_(ii, ii)]
                pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)
        else:
            pcov = np.linalg.inv(-hess)

        # Prepare a results class instance
        params_packed = params.get_packed()
        results = MixedLMResults(self, params_packed, pcov / scale)
        results.params_object = params
        results.fe_params = fe_params
        results.cov_re = cov_re
        results.scale = scale
        results.cov_re_unscaled = cov_re_unscaled
        results.method = "REML" if self.reml else "ML"
        results.converged = success
        results.hist = hist
        results.reml = self.reml
        results.cov_pen = self.cov_pen
        results.k_fe = self.k_fe
        results.k_re = self.k_re
        results.k_re2 = self.k_re2
        results.use_sqrt = self.use_sqrt
        results.freepat = self._freepat

        return MixedLMResultsWrapper(results)
class MixedLMResults(base.LikelihoodModelResults):
    '''
    Class to contain results of fitting a linear mixed effects model.

    MixedLMResults inherits from statsmodels.LikelihoodModelResults

    Parameters
    ----------
    See statsmodels.LikelihoodModelResults

    Returns
    -------
    **Attributes**

    model : class instance
        Pointer to MixedLM model instance that called fit.
    normalized_cov_params : array
        The sampling covariance matrix of the estimates
    fe_params : array
        The fitted fixed-effects coefficients
    re_params : array
        The fitted random-effects covariance matrix
    bse_fe : array
        The standard errors of the fitted fixed effects coefficients
    bse_re : array
        The standard errors of the fitted random effects covariance
        matrix

    See Also
    --------
    statsmodels.LikelihoodModelResults
    '''

    def __init__(self, model, params, cov_params):

        super(MixedLMResults, self).__init__(model, params,
                                             normalized_cov_params=cov_params)
        self.nobs = self.model.nobs
        self.df_resid = self.nobs - np_matrix_rank(self.model.exog)

    @cache_readonly
    def bse_fe(self):
        """
        Returns the standard errors of the fixed effect regression
        coefficients.
        """
        p = self.model.exog.shape[1]
        return np.sqrt(np.diag(self.cov_params())[0:p])
    @cache_readonly
    def bse_re(self):
        """
        Returns the standard errors of the variance parameters.  Note
        that the sampling distribution of variance parameters is
        strongly skewed unless the sample size is large, so these
        standard errors may not give meaningful confidence intervals
        or p-values if used in the usual way.
        """
        p = self.model.exog.shape[1]
        return np.sqrt(self.scale * np.diag(self.cov_params())[p:])
    @cache_readonly
    def random_effects(self):
        """
        Returns the conditional means of all random effects given the
        data.

        Returns
        -------
        random_effects : DataFrame
            A DataFrame with the distinct `group` values as the index
            and the conditional means of the random effects in the
            columns.
        """
        try:
            cov_re_inv = np.linalg.inv(self.cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        ranef_dict = {}
        for k in range(self.model.n_groups):

            endog = self.model.endog_li[k]
            exog = self.model.exog_li[k]
            ex_r = self.model.exog_re_li[k]
            ex2_r = self.model.exog_re2_li[k]
            label = self.model.group_labels[k]

            # Get the residuals
            expval = np.dot(exog, self.fe_params)
            resid = endog - expval

            vresid = _smw_solve(self.scale, ex_r, ex2_r, self.cov_re,
                                cov_re_inv, resid)

            ranef_dict[label] = np.dot(self.cov_re,
                                       np.dot(ex_r.T, vresid))

        column_names = dict(zip(range(self.k_re),
                                self.model.data.exog_re_names))
        df = DataFrame.from_dict(ranef_dict, orient='index')
        return df.rename(columns=column_names).loc[self.model.group_labels]
    @cache_readonly
    def random_effects_cov(self):
        """
        Returns the conditional covariance matrix of the random
        effects for each group given the data.

        Returns
        -------
        random_effects_cov : dict
            A dictionary mapping the distinct values of the `group`
            variable to the conditional covariance matrix of the
            random effects given the data.
        """

        try:
            cov_re_inv = np.linalg.inv(self.cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        ranef_dict = {}
        #columns = self.model.data.exog_re_names
        for k in range(self.model.n_groups):

            ex_r = self.model.exog_re_li[k]
            ex2_r = self.model.exog_re2_li[k]
            label = self.model.group_labels[k]

            mat1 = np.dot(ex_r, self.cov_re)
            mat2 = _smw_solve(self.scale, ex_r, ex2_r, self.cov_re,
                              cov_re_inv, mat1)
            mat2 = np.dot(mat1.T, mat2)

            ranef_dict[label] = self.cov_re - mat2
            #ranef_dict[label] = DataFrame(self.cov_re - mat2,
            #                              index=columns,
            #                              columns=columns)

        return ranef_dict
    def summary(self, yname=None, xname_fe=None, xname_re=None,
                title=None, alpha=.05):
        """
        Summarize the mixed model regression results.

        Parameters
        ----------
        yname : string, optional
            Default is `y`
        xname_fe : list of strings, optional
            Fixed effects covariate names
        xname_re : list of strings, optional
            Random effects covariate names
        title : string, optional
            Title for the top table.  If not None, then this replaces
            the default title
        alpha : float
            significance level for the confidence intervals

        Returns
        -------
        smry : Summary instance
            this holds the summary tables and text, which can be
            printed or converted to various output formats.

        See Also
        --------
        statsmodels.iolib.summary.Summary : class to hold summary
            results
        """

        from statsmodels.iolib import summary2
        smry = summary2.Summary()

        info = OrderedDict()
        info["Model:"] = "MixedLM"
        if yname is None:
            yname = self.model.endog_names
        info["No. Observations:"] = str(self.model.n_totobs)
        info["No. Groups:"] = str(self.model.n_groups)

        gs = np.array([len(x) for x in self.model.endog_li])
        info["Min. group size:"] = "%.0f" % min(gs)
        info["Max. group size:"] = "%.0f" % max(gs)
        info["Mean group size:"] = "%.1f" % np.mean(gs)

        info["Dependent Variable:"] = yname
        info["Method:"] = self.method
        info["Scale:"] = self.scale
        info["Likelihood:"] = self.llf
        info["Converged:"] = "Yes" if self.converged else "No"
        smry.add_dict(info)
        smry.add_title("Mixed Linear Model Regression Results")

        float_fmt = "%.3f"

        sdf = np.nan * np.ones((self.k_fe + self.k_re2, 6),
                               dtype=np.float64)

        # Coefficient estimates
        sdf[0:self.k_fe, 0] = self.fe_params

        # Standard errors
        sdf[0:self.k_fe, 1] =\
            np.sqrt(np.diag(self.cov_params()[0:self.k_fe]))

        # Z-scores
        sdf[0:self.k_fe, 2] = sdf[0:self.k_fe, 0] / sdf[0:self.k_fe, 1]

        # p-values
        sdf[0:self.k_fe, 3] = 2 * norm.cdf(-np.abs(sdf[0:self.k_fe, 2]))

        # Confidence intervals
        qm = -norm.ppf(alpha / 2)
        sdf[0:self.k_fe, 4] = sdf[0:self.k_fe, 0] - qm * sdf[0:self.k_fe, 1]
        sdf[0:self.k_fe, 5] = sdf[0:self.k_fe, 0] + qm * sdf[0:self.k_fe, 1]

        # Names for all pairs of random effects
        jj = self.k_fe
        for i in range(self.k_re):
            for j in range(i + 1):
                sdf[jj, 0] = self.cov_re[i, j]
                sdf[jj, 1] = np.sqrt(self.scale) * self.bse[jj]
                jj += 1

        sdf = pd.DataFrame(index=self.model.data.param_names, data=sdf)
        sdf.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
                       '[' + str(alpha/2), str(1-alpha/2) + ']']
        for col in sdf.columns:
            sdf[col] = [float_fmt % x if np.isfinite(x) else ""
                        for x in sdf[col]]

        smry.add_df(sdf, align='r')

        return smry
    def profile_re(self, re_ix, num_low=5, dist_low=1., num_high=5,
                   dist_high=1.):
        """
        Calculate a series of values along a 1-dimensional profile
        likelihood.

        Parameters
        ----------
        re_ix : integer
            The index of the variance parameter for which to construct
            a profile likelihood.
        num_low : integer
            The number of points at which to calculate the likelihood
            below the MLE of the parameter of interest.
        dist_low : float
            The distance below the MLE of the parameter of interest to
            begin calculating points on the profile likelihood.
        num_high : integer
            The number of points at which to calculate the likelihood
            above the MLE of the parameter of interest.
        dist_high : float
            The distance above the MLE of the parameter of interest to
            begin calculating points on the profile likelihood.

        Returns
        -------
        An array with two columns.  The first column contains the
        values to which the parameter of interest is constrained.  The
        second column contains the corresponding likelihood values.

        Notes
        -----
        Only variance parameters can be profiled.  `re_ix` is the
        index of the random effect that is profiled.
        """

        pmodel = self.model
        k_fe = pmodel.exog.shape[1]
        k_re = pmodel.exog_re.shape[1]
        endog, exog, groups = pmodel.endog, pmodel.exog, pmodel.groups

        # Need to permute the columns of the random effects design
        # matrix so that the profiled variable is in the first column.
        ix = np.arange(k_re)
        ix[0] = re_ix
        ix[re_ix] = 0
        exog_re = pmodel.exog_re.copy()[:, ix]

        # Permute the covariance structure to match the permuted
        # design matrix.
        params = self.params_object.copy()
        cov_re_unscaled = params.get_cov_re()
        cov_re_unscaled = cov_re_unscaled[np.ix_(ix, ix)]
        params.set_cov_re(cov_re_unscaled)

        # Convert dist_low and dist_high to the profile
        # parameterization
        cov_re = self.scale * cov_re_unscaled
        low = (cov_re[0, 0] - dist_low) / self.scale
        high = (cov_re[0, 0] + dist_high) / self.scale

        # Define the sequence of values to which the parameter of
        # interest will be constrained.
        ru0 = cov_re_unscaled[0, 0]
        if low <= 0:
            raise ValueError("dist_low is too large and would result "
                             "in a negative variance.  Try a smaller "
                             "value.")
        left = np.linspace(low, ru0, num_low + 1)
        right = np.linspace(ru0, high, num_high+1)[1:]
        rvalues = np.concatenate((left, right))

        # Indicators of which parameters are free and fixed.
        free = MixedLMParams(k_fe, k_re, self.use_sqrt)
        if self.freepat is None:
            free.set_fe_params(np.ones(k_fe))
            mat = np.ones((k_re, k_re))
        else:
            free.set_fe_params(self.freepat.get_fe_params())
            mat = self.freepat.get_cov_re()
            mat = mat[np.ix_(ix, ix)]
        mat[0, 0] = 0
        if self.use_sqrt:
            free.set_cov_re(cov_re_sqrt=mat)
        else:
            free.set_cov_re(cov_re=mat)

        klass = self.model.__class__
        init_kwargs = pmodel._get_init_kwds()
        init_kwargs['exog_re'] = exog_re

        likev = []
        for x in rvalues:

            model = klass(endog, exog, **init_kwargs)

            cov_re = params.get_cov_re()
            cov_re[0, 0] = x

            # Shrink the covariance parameters until a PSD covariance
            # matrix is obtained.
            dg = np.diag(cov_re)
            success = False
            for ks in range(50):
                try:
                    np.linalg.cholesky(cov_re)
                    success = True
                    break
                except np.linalg.LinAlgError:
                    cov_re /= 2
                    np.fill_diagonal(cov_re, dg)
            if not success:
                raise ValueError("unable to find PSD covariance "
                                 "matrix along likelihood profile")

            params.set_cov_re(cov_re)

            # TODO should use fit_kwargs
            rslt = model.fit(start_params=params, free=free,
                             reml=self.reml,
                             cov_pen=self.cov_pen)._results
            likev.append([rslt.cov_re[0, 0], rslt.llf])

        likev = np.asarray(likev)

        return likev
class MixedLMResultsWrapper(base.LikelihoodResultsWrapper):
    _attrs = {'bse_re': ('generic_columns', 'exog_re_names_full'),
              'fe_params': ('generic_columns', 'xnames'),
              'bse_fe': ('generic_columns', 'xnames'),
              'cov_re': ('generic_columns_2d', 'exog_re_names'),
              'cov_re_unscaled': ('generic_columns_2d', 'exog_re_names'),
              }
    _upstream_attrs = base.LikelihoodResultsWrapper._wrap_attrs
    _wrap_attrs = base.wrap.union_dicts(_attrs, _upstream_attrs)

    _methods = {}
    _upstream_methods = base.LikelihoodResultsWrapper._wrap_methods
    _wrap_methods = base.wrap.union_dicts(_methods, _upstream_methods)
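

if __name__ == "__main__":
    # A minimal end-to-end sketch (illustrative, not part of the
    # library): simulate data from the model
    # Y = X*beta + Z*gamma + epsilon described in the module docstring
    # (random intercepts per group), then fit by ML.
    np.random.seed(42)
    n_groups, group_size = 100, 5
    groups = np.repeat(np.arange(n_groups), group_size)
    x = np.random.normal(size=n_groups * group_size)
    exog = np.column_stack((np.ones_like(x), x))  # intercept and slope
    gamma = np.random.normal(size=n_groups)       # random intercepts
    endog = (np.dot(exog, [1., 2.]) + gamma[groups] +
             np.random.normal(size=n_groups * group_size))

    model = MixedLM(endog, exog, groups)
    result = model.fit(reml=False)
    print(result.summary())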