statsmodels.miscmodels.ordinal_model.OrderedModel

class statsmodels.miscmodels.ordinal_model.OrderedModel(endog, exog, offset=None, distr='probit', **kwds)

Ordinal Model based on logistic or normal distribution

The parameterization corresponds to the proportional odds model in the logistic case. The model assumes that the endogenous variable is ordered but that the labels have no numeric interpretation besides the ordering.

The model is based on a latent linear variable, where we observe only a discretization.

y_latent = X beta + u

The observed variable is defined by the interval that contains y_latent:

    y = 0   if y_latent <= cut_0
    y = 1   if cut_0 < y_latent <= cut_1
    ...
    y = K   if cut_{K-1} < y_latent

With the conventions cut_{-1} = -inf and cut_K = +inf, the probability of observing y = k conditional on the explanatory variables x is given by

    Prob(y = k | x) = Prob(cut_{k-1} < y_latent <= cut_k)
                    = Prob(cut_{k-1} - x beta < u <= cut_k - x beta)
                    = F(cut_k - x beta) - F(cut_{k-1} - x beta)

where F is the cumulative distribution function of u, which is either the normal or the logistic distribution but can be set to any other continuous distribution. We use standardized distributions to avoid identifiability problems.
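The interval probabilities above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the statsmodels implementation: the helper names `logistic_cdf` and `level_probs` and the numeric values are assumptions made for the example, and the finite thresholds are padded with -inf and +inf so that every level is a difference of two CDF values.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, handling +/- infinity explicitly."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # stable branch for negative z
    return ez / (1.0 + ez)


def level_probs(xb, cuts, cdf=logistic_cdf):
    """Probability of each ordinal level for one linear prediction xb.

    cuts: increasing finite thresholds (hypothetical values in this sketch).
    """
    padded = [-math.inf] + list(cuts) + [math.inf]
    # Each level is the probability mass of u between two shifted thresholds.
    return [cdf(upp - xb) - cdf(low - xb)
            for low, upp in zip(padded[:-1], padded[1:])]


# Three thresholds give four levels; the probabilities telescope to one.
probs = level_probs(xb=0.3, cuts=[-1.0, 0.5, 2.0])
print(probs, sum(probs))
```

Replacing `logistic_cdf` with a standard normal CDF would give the ordered probit version of the same calculation.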

Parameters:
endog : array_like

Endogenous or dependent ordered categorical variable with k levels. Labels or values of endog will be internally transformed to consecutive integers 0, 1, 2, …. A pd.Series with an ordered Categorical dtype is preferred because it gives the order relation between the levels. If endog is not a pandas Categorical, then categories are sorted in lexicographic order (by numpy.unique).

exog : array_like

Exogenous, explanatory variables. This should not include an intercept. A pd.DataFrame is also accepted. See Notes about the constant when using formulas.

offset : array_like

Offset is added to the linear prediction with coefficient equal to 1.

distr : str, ‘probit’ or ‘logit’, or a distribution instance

The default is currently ‘probit’, which uses the normal distribution and corresponds to an ordered Probit model. The distribution is assumed to have the main methods of scipy.stats distributions, mainly cdf, pdf and ppf. The inverse cdf, ppf, is only used to calculate starting values.

Notes

Status: experimental. Core results are verified, but the model still subclasses GenericLikelihoodModel, which will change in future versions.

The parameterization of OrderedModel requires that there is no constant in the model, neither explicit nor implicit. The constant is equivalent to shifting all thresholds and is therefore not separately identified.
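The identification argument can be checked numerically. The sketch below uses made-up values for the linear prediction, the thresholds and the constant, and a hypothetical helper `logistic_cdf`. It shows that adding a constant to the linear prediction while shifting every threshold by the same amount leaves the cumulative probabilities unchanged.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF for moderate finite z."""
    return 1.0 / (1.0 + math.exp(-z))


xb = 0.7                   # linear prediction without a constant (made up)
cuts = [-1.0, 0.5, 2.0]    # thresholds (made up)
const = 3.0                # candidate intercept (made up)

# Cumulative probabilities F(cut - xb) without a constant.
base = [logistic_cdf(c - xb) for c in cuts]

# Same model with the constant added to the linear prediction and all
# thresholds shifted by the same amount: the constant cancels.
shifted = [logistic_cdf((c + const) - (xb + const)) for c in cuts]

max_diff = max(abs(a - b) for a, b in zip(base, shifted))
print(max_diff)
```

Because the two parameterizations produce the same probabilities, the data cannot distinguish them, which is why the constant is dropped.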

Patsy’s formula specification does not allow a design matrix without explicit or implicit constant if there are categorical variables (or maybe splines) among explanatory variables. As a workaround, statsmodels removes an explicit intercept.

Consequently, there are two valid cases to get a design matrix without intercept when using formulas:

  • specify a model without explicit or implicit intercept, which is possible if there are only numerical variables in the model.

  • specify a model with an explicit intercept, which statsmodels will remove.

Models with an implicit intercept will be overparameterized: the parameter estimates will not be fully identified, cov_params will not be invertible, and standard errors might contain nans. The computed results will be dominated by numerical imprecision coming mainly from convergence tolerance and numerical derivatives.

The model will raise a ValueError if a remaining constant is detected.

Attributes:
endog_names

Names of endogenous variables.

exog_names

Names of exogenous variables.

start_params

Start parameters for the optimization corresponding to null model.

Methods

cdf(x)

Cdf evaluated at x.

expandparams(params)

Expand to full parameter array when some parameters are fixed.

fit([start_params, method, maxiter, ...])

Fit method for likelihood based models

from_formula(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

hessian(params)

Hessian of log-likelihood evaluated at params

hessian_factor(params[, scale, observed])

Weights for calculating Hessian

information(params)

Fisher information matrix of model.

initialize()

Initialize (possibly re-initialize) a Model instance.

loglike(params)

Log-likelihood of model at params

loglikeobs(params)

Log-likelihood of OrderedModel for all observations.

nloglike(params)

Negative log-likelihood of model at params

pdf(x)

Pdf evaluated at x

predict(params[, exog, offset, which])

Predicted probabilities for each level of the ordinal endog.

prob(low, upp)

Interval probability.

reduceparams(params)

Reduce parameters

score(params)

Gradient of log-likelihood evaluated at params

score_obs(params, **kwds)

Jacobian/Gradient of log-likelihood evaluated at params for each observation.

score_obs_(params)

Score, first derivative of loglike for each observation.

transform_reverse_threshold_params(params)

Obtain transformed thresholds from original thresholds or cutoffs.

transform_threshold_params(params)

Transformation of the parameters in the optimization.


Last update: Dec 14, 2023