statsmodels.stats.proportion.multinomial_proportions_confint

statsmodels.stats.proportion.multinomial_proportions_confint(counts, alpha=0.05, method='goodman')
Confidence intervals for multinomial proportions.
Parameters:  counts (array_like of int, 1D) – Number of observations in each category.
 alpha (float in (0, 1), optional) – Significance level, defaults to 0.05.
 method ({'goodman', 'sisonglaz'}, optional) – Method used to compute the confidence intervals, either 'goodman' or 'sisonglaz'; see Notes for when each applies.
Returns: confint – Array of [lower, upper] confidence bounds for each category, such that overall coverage is (approximately) 1 - alpha.
Return type: ndarray, 2D
Raises: ValueError – If alpha is not in (0, 1) (bounds excluded), or if any value in counts is negative.
 NotImplementedError – If method is not known.
 Exception – When method == 'sisonglaz', if for some reason c cannot be computed; this signals a bug and should be reported.
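A minimal usage sketch of the signature documented above, assuming statsmodels and NumPy are installed. The counts are made-up illustration data, not taken from the documentation.

```python
import numpy as np
from statsmodels.stats.proportion import multinomial_proportions_confint

# Hypothetical observed counts for four categories (illustration only).
counts = np.array([12, 25, 30, 33])

# Simultaneous 95% intervals with the default Goodman method.
confint = multinomial_proportions_confint(counts, alpha=0.05, method="goodman")

# One row per category: column 0 is the lower bound, column 1 the upper bound.
for count, (lower, upper) in zip(counts, confint):
    print(f"count={count:3d}  interval=[{lower:.3f}, {upper:.3f}]")

# An alpha outside (0, 1) is documented to raise ValueError.
try:
    multinomial_proportions_confint(counts, alpha=1.5)
    alpha_rejected = False
except ValueError as err:
    alpha_rejected = True
    print("rejected:", err)
```

Catching ValueError around the call is one way to surface bad inputs early; whether to catch or let it propagate depends on the calling code.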
Notes
The goodman method [2] approximates a statistic derived from the multinomial by a chi-squared random variable. The usual recommendation is that this is valid if all the values in counts are greater than or equal to 5. There is no condition on the number of categories for this method.
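To make the chi-squared approximation concrete, here is a standard-library sketch of Goodman-style intervals as the formula is commonly stated. It is not the statsmodels source, and `goodman_intervals` is a hypothetical helper name introduced for illustration. It assumes a Bonferroni-style quantile at level 1 - alpha/k with one degree of freedom, where k is the number of categories.

```python
from math import sqrt
from statistics import NormalDist


def goodman_intervals(counts, alpha=0.05):
    """Sketch of Goodman-style simultaneous intervals (hypothetical helper)."""
    k = len(counts)
    n = sum(counts)
    # Chi-squared quantile with 1 degree of freedom at level 1 - alpha/k,
    # computed as the square of the normal quantile at 1 - alpha/(2k).
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    chi2 = z * z
    denom = 2 * (chi2 + n)
    intervals = []
    for count in counts:
        p = count / n  # observed proportion for this category
        half = sqrt(chi2 * chi2 + 4 * n * p * chi2 * (1 - p))
        lower = (2 * n * p + chi2 - half) / denom
        upper = (2 * n * p + chi2 + half) / denom
        intervals.append((lower, upper))
    return intervals


# Made-up counts, all at or above 5 as the rule of thumb suggests.
example_counts = [12, 25, 30, 33]
intervals = goodman_intervals(example_counts)
for count, (lower, upper) in zip(example_counts, intervals):
    print(f"count={count:3d}  interval=[{lower:.3f}, {upper:.3f}]")
```

Each interval has the shape of a score (Wilson-type) interval with the critical value widened to account for the k simultaneous comparisons, which is why the width varies with the observed proportion.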
The sisonglaz method [3] approximates the multinomial probabilities, and evaluates that approximation with a maximum-likelihood estimator. The first approximation is an Edgeworth expansion that converges when the number of categories goes to infinity, and the maximum-likelihood estimator converges when the number of observations (sum(counts)) goes to infinity. In their paper, Sison & Glaz demonstrate their method with at least 7 categories, so len(counts) >= 7 with all values in counts at or above 5 can be used as a rule of thumb for the validity of this method. This method is less conservative than the goodman method (i.e. it yields confidence intervals whose coverage is closer to the nominal level), but produces confidence intervals of uniform width over all categories (except when the intervals reach 0 or 1, in which case they are truncated), which makes it most useful when proportions are of similar magnitude.
Aside from the original sources ([1], [2], and [3]), the implementation uses the formulas (though not the code) presented in [4] and [5].
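The rules of thumb quoted in these notes can be checked before choosing a method. The sketch below is only an illustration: `rule_of_thumb_warnings` is a hypothetical helper, not part of statsmodels, and the thresholds (counts of at least 5, at least 7 categories for 'sisonglaz') are taken directly from the text above.

```python
def rule_of_thumb_warnings(counts, method="goodman"):
    """Return messages for violated rules of thumb (hypothetical helper)."""
    messages = []
    # Both methods: all counts should be greater than or equal to 5.
    if min(counts) < 5:
        messages.append("some counts are below 5")
    # 'sisonglaz' only: the method was demonstrated with at least 7 categories.
    if method == "sisonglaz" and len(counts) < 7:
        messages.append("fewer than 7 categories for 'sisonglaz'")
    return messages


# Made-up counts: three categories, all at or above 5.
print(rule_of_thumb_warnings([10, 20, 30], method="goodman"))
print(rule_of_thumb_warnings([10, 20, 30], method="sisonglaz"))
```

These checks are advisory only; the documented function does not reject inputs that fail them.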
References
[1] Levin, Bruce, “A representation for multinomial cumulative distribution functions,” The Annals of Statistics, Vol. 9, No. 5, 1981, pp. 1123-1126.
[2] Goodman, L.A., “On simultaneous confidence intervals for multinomial proportions,” Technometrics, Vol. 7, No. 2, 1965, pp. 247-254.
[3] Sison, Cristina P., and Joseph Glaz, “Simultaneous Confidence Intervals and Sample Size Determination for Multinomial Proportions,” Journal of the American Statistical Association, Vol. 90, No. 429, 1995, pp. 366-369.
[4] May, Warren L., and William D. Johnson, “A SAS® macro for constructing simultaneous confidence intervals for multinomial proportions,” Computer Methods and Programs in Biomedicine, Vol. 53, No. 3, 1997, pp. 153-162.
[5] May, Warren L., and William D. Johnson, “Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells,” Journal of Statistical Software, Vol. 5, No. 6, 2000, pp. 1-24.