statsmodels.miscmodels.ordinal_model.OrderedModel

class statsmodels.miscmodels.ordinal_model.OrderedModel(endog, exog, offset=None, distr='probit', **kwds)

Ordinal Model based on logistic or normal distribution

The parameterization corresponds to the proportional odds model in the logistic case. The model assumes that the endogenous variable is ordered but that the labels have no numeric interpretation besides the ordering.

The model is based on a latent linear variable, where we observe only a discretization.

y_latent = X beta + u

The observed variable is defined by the interval

y = 0 if y_latent <= cut_0
y = 1 if cut_0 < y_latent <= cut_1
...
y = K if cut_{K-1} < y_latent

The probability of observing y=k conditional on the explanatory variables X is given by

Prob(y = k | x) = Prob(cut_{k-1} < y_latent <= cut_k)
                = Prob(cut_{k-1} - x beta < u <= cut_k - x beta)
                = F(cut_k - x beta) - F(cut_{k-1} - x beta)

where F is the cumulative distribution function of u, which is either the normal or the logistic distribution but can be set to any other continuous distribution. For the lowest and highest levels, the missing lower or upper cutpoint is replaced by -inf or +inf. We use standardized distributions to avoid identifiability problems.
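The interval formula above can be checked numerically with a few lines of plain Python. This is a minimal sketch, not the library's implementation: the function names, the cutpoints, and the value of the linear predictor are made up for illustration.

```python
import math
from statistics import NormalDist


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(xb, cuts, cdf):
    """Return Prob(y = k | x) for every level k.

    xb   : value of the linear predictor x beta (no constant)
    cuts : strictly increasing cutpoints
    cdf  : CDF F of the standardized error u
    """
    # F is 0 at -inf and 1 at +inf, which covers the open-ended
    # first and last intervals.
    upper = [cdf(c - xb) for c in cuts] + [1.0]
    lower = [0.0] + [cdf(c - xb) for c in cuts]
    return [up - lo for up, lo in zip(upper, lower)]


cuts = [-1.0, 0.5, 2.0]  # hypothetical cutpoints, giving four levels
xb = 0.3                 # hypothetical linear predictor

probit_probs = level_probabilities(xb, cuts, NormalDist().cdf)
logit_probs = level_probabilities(xb, cuts, logistic_cdf)

print(probit_probs, sum(probit_probs))
print(logit_probs, sum(logit_probs))
```

Each list has one entry per level and sums to one, because consecutive differences of F telescope.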

Parameters:

endog (array_like): Endogenous or dependent ordered categorical variable with k levels. Labels or values of endog will be internally transformed to consecutive integers 0, 1, 2, … A pd.Series with an ordered Categorical dtype is preferred because it carries the order relation between the levels. If endog is not a pandas Categorical, the categories are sorted in lexicographic order (by numpy.unique).
exog (array_like): Exogenous, explanatory variables. This should not include an intercept. A pd.DataFrame is also accepted. See Notes about the constant when using formulas.
offset (array_like, optional): Offset is added to the linear prediction with coefficient equal to 1.
distr (str ‘probit’ or ‘logit’, or a distribution instance): The default is currently ‘probit’, which uses the normal distribution and corresponds to an ordered Probit model. A distribution instance is assumed to have the main methods of scipy.stats distributions, mainly cdf, pdf and ppf. The inverse cdf, ppf, is only used to calculate starting values.
**kwds: Extra keyword arguments passed to the model, for example missing.

Attributes:

endog_names: Names of endogenous variables
exog_names: Names of exogenous variables
start_params: Start parameters for the optimization corresponding to null model

Methods

`cdf`(x)	Cdf evaluated at x
`expandparams`(params)	Expand to full parameter array when some parameters are fixed
`fit`([start_params, method, maxiter, ...])	Fit method for likelihood based models
`from_formula`(formula, data[, subset, drop_cols])	Create a Model from a formula and dataframe
`hessian`(params)	Hessian of log-likelihood evaluated at params
`hessian_factor`(params[, scale, observed])	Weights for calculating Hessian
`information`(params)	Fisher information matrix of model
`initialize`()	Initialize (possibly re-initialize) a Model instance.
`loglike`(params)	Log-likelihood of model at params
`loglikeobs`(params)	Log-likelihood of OrderedModel for all observations
`nloglike`(params)	Negative log-likelihood of model at params
`pdf`(x)	Pdf evaluated at x
`predict`(params[, exog, offset, which])	Predicted probabilities for each level of the ordinal endog
`prob`(low, upp)	Interval probability
`reduceparams`(params)	Reduce parameters
`score`(params)	Gradient of log-likelihood evaluated at params
`score_obs`(params, **kwds)	Jacobian/Gradient of log-likelihood evaluated at params for each observation
`score_obs_`(params)	Score, first derivative of loglike for each observation
`transform_reverse_threshold_params`(params)	Obtain transformed thresholds from original thresholds or cutoffs
`transform_threshold_params`(params)	Transformation of the parameters in the optimization
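The two threshold transformations are described only briefly in the table. The sketch below shows one way such a reparameterization can keep the cutpoints strictly increasing during unconstrained optimization: the first cutpoint is free, and each later gap is the exponential of a free parameter. The function names are hypothetical and the code is an illustration of the idea, not the library's implementation.

```python
import math


def thresholds_from_params(th_params):
    """Map unconstrained values to strictly increasing cutpoints."""
    # First value is the first cutpoint; the rest are log-increments.
    increments = [th_params[0]] + [math.exp(p) for p in th_params[1:]]
    cuts, total = [], 0.0
    for inc in increments:
        total += inc
        cuts.append(total)
    return cuts


def params_from_thresholds(cuts):
    """Inverse map: cutpoints back to unconstrained values."""
    gaps = [b - a for a, b in zip(cuts[:-1], cuts[1:])]
    return [cuts[0]] + [math.log(g) for g in gaps]


raw = [-1.0, 0.0, -2.0]             # arbitrary unconstrained values
cuts = thresholds_from_params(raw)  # increasing for any choice of raw
roundtrip = params_from_thresholds(cuts)
print(cuts, roundtrip)
```

Because exp is always positive, any real-valued parameter vector maps to a valid ordered set of cutpoints.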

Notes

Status: experimental. Core results are verified. The class still subclasses GenericLikelihoodModel, which will change in future versions.

The parameterization of OrderedModel requires that there is no constant in the model, neither explicit nor implicit. The constant is equivalent to shifting all thresholds and is therefore not separately identified.
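The identification problem can be seen directly: adding a constant c to the linear predictor and the same c to every threshold leaves all level probabilities unchanged. A minimal sketch with made-up numbers and a hypothetical helper function:

```python
import math


def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(xb, cuts):
    """Level probabilities F(cut_k - xb) - F(cut_{k-1} - xb)."""
    upper = [logistic_cdf(c - xb) for c in cuts] + [1.0]
    lower = [0.0] + [logistic_cdf(c - xb) for c in cuts]
    return [up - lo for up, lo in zip(upper, lower)]


cuts = [-1.0, 0.5, 2.0]  # hypothetical thresholds
xb = 0.3                 # hypothetical linear predictor without constant
c = 1.7                  # hypothetical constant

base = level_probabilities(xb, cuts)
shifted = level_probabilities(xb + c, [cut + c for cut in cuts])

# Same probabilities, so the data cannot distinguish the two parameter sets.
print(base)
print(shifted)
```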

Patsy’s formula specification does not allow a design matrix without explicit or implicit constant if there are categorical variables (or maybe splines) among explanatory variables. As a workaround, statsmodels removes an explicit intercept.

Consequently, there are two valid cases to get a design matrix without intercept when using formulas:

- Specify a model without explicit and implicit intercept, which is possible if there are only numerical variables in the model.
- Specify a model with an explicit intercept, which statsmodels will remove.

Models with an implicit intercept will be overparameterized: the parameter estimates will not be fully identified, cov_params will not be invertible, and standard errors might contain nans. The computed results will be dominated by numerical imprecision coming mainly from convergence tolerance and numerical derivatives.

The model will raise a ValueError if a remaining constant is detected.
