statsmodels.stats.proportion.multinomial_proportions_confint

statsmodels.stats.proportion.multinomial_proportions_confint(counts, alpha=0.05, method='goodman')

Confidence intervals for multinomial proportions.

Parameters:

counts : array_like of int, 1-D

Number of observations in each category.

alpha : float in (0, 1), optional

Significance level, defaults to 0.05.

method : {‘goodman’, ‘sison-glaz’}, optional

Method to use to compute the confidence intervals; available methods are:

  • goodman: based on a chi-squared approximation, valid if all values in counts are greater than or equal to 5 [R53]
  • sison-glaz: less conservative than goodman, but only valid if counts has 7 or more categories (len(counts) >= 7) [R54]
Returns:

confint : ndarray, 2-D

Array of [lower, upper] confidence limits for each category, such that the overall (simultaneous) coverage is approximately 1-alpha.

Raises:

ValueError

If alpha is not in (0, 1) (bounds excluded), or if any value in counts is negative (all values must be positive or zero).

NotImplementedError

If method is not known.

Exception

When method == 'sison-glaz', if for some reason c cannot be computed; this signals a bug and should be reported.
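The first two error conditions above can be illustrated with a small sketch. The helper name check_inputs is hypothetical and is not part of statsmodels; it only mirrors the documented checks and is not the library's own validation code.

```python
def check_inputs(counts, alpha=0.05, method="goodman"):
    """Hypothetical helper mirroring the documented error conditions."""
    # alpha must lie strictly between 0 and 1.
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1), bounds excluded")
    # Every count must be positive or zero.
    if any(c < 0 for c in counts):
        raise ValueError("counts must all be non-negative")
    # Only the two documented methods are accepted.
    if method not in ("goodman", "sison-glaz"):
        raise NotImplementedError(f"method {method!r} is not available")


# Valid inputs pass silently; invalid ones raise.
check_inputs([12, 7, 30], alpha=0.05, method="goodman")
```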

Notes

The goodman method [R53] is based on approximating a statistic based on the multinomial as a chi-squared random variable. The usual recommendation is that this is valid if all the values in counts are greater than or equal to 5. There is no condition on the number of categories for this method.
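As an illustration of the chi-squared approach, here is a minimal standard-library sketch. It assumes the usual form of Goodman's interval, with a chi-squared quantile (1 degree of freedom) taken at level 1 - alpha/k for k categories. The function name goodman_confint is hypothetical, and the sketch is not the statsmodels implementation.

```python
from math import sqrt
from statistics import NormalDist


def goodman_confint(counts, alpha=0.05):
    """Sketch of Goodman-style simultaneous intervals (hypothetical helper)."""
    k = len(counts)
    n = sum(counts)
    # Chi-squared quantile with 1 degree of freedom at level 1 - alpha/k,
    # computed as the square of the standard normal quantile at 1 - alpha/(2k).
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    chi2 = z * z
    intervals = []
    for x in counts:
        p = x / n  # observed proportion for this category
        half = sqrt(chi2 * (chi2 + 4 * n * p * (1 - p)))
        lower = (chi2 + 2 * n * p - half) / (2 * (n + chi2))
        upper = (chi2 + 2 * n * p + half) / (2 * (n + chi2))
        intervals.append((lower, upper))
    return intervals


# Example with three categories, all counts at or above 5.
for lower, upper in goodman_confint([20, 30, 50]):
    print(f"[{lower:.3f}, {upper:.3f}]")
```

Dividing alpha by the number of categories is what makes the intervals simultaneous: each individual interval is wider than a single-proportion interval at level 1 - alpha.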

The sison-glaz method [R54] approximates the multinomial probabilities and evaluates the approximation with a maximum-likelihood estimator. The approximation is an Edgeworth expansion that converges as the number of categories goes to infinity, and the maximum-likelihood estimator converges as the number of observations (sum(counts)) goes to infinity. In their paper, Sison & Glaz demonstrate their method with at least 7 categories, so len(counts) >= 7 with all values in counts at or above 5 can be used as a rule of thumb for the validity of this method.

This method is less conservative than the goodman method (i.e. its intervals have coverage closer to the nominal 1-alpha level), but it produces confidence intervals of uniform width over all categories (except when the intervals reach 0 or 1, in which case they are truncated), which makes it most useful when proportions are of similar magnitude.

Aside from the original sources ([R52], [R53], and [R54]), the implementation uses the formulas (though not the code) presented in [R55] and [R56].

References

[R52] Levin, Bruce, “A representation for multinomial cumulative distribution functions,” The Annals of Statistics, Vol. 9, No. 5, 1981, pp. 1123-1126.
[R53] Goodman, L.A., “On simultaneous confidence intervals for multinomial proportions,” Technometrics, Vol. 7, No. 2, 1965, pp. 247-254.
[R54] Sison, Cristina P., and Joseph Glaz, “Simultaneous Confidence Intervals and Sample Size Determination for Multinomial Proportions,” Journal of the American Statistical Association, Vol. 90, No. 429, 1995, pp. 366-369.
[R55] May, Warren L., and William D. Johnson, “A SAS® macro for constructing simultaneous confidence intervals for multinomial proportions,” Computer Methods and Programs in Biomedicine, Vol. 53, No. 3, 1997, pp. 153-162.
[R56] May, Warren L., and William D. Johnson, “Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells,” Journal of Statistical Software, Vol. 5, No. 6, 2000, pp. 1-24.