statsmodels.stats.gof.gof_binning_discrete

statsmodels.stats.gof.gof_binning_discrete(rvs, distfn, arg, nsupp=20)

Get bins for chi-square type goodness-of-fit (gof) tests for a discrete distribution.

Parameters
rvs : array

sample data

distfn : distribution instance

discrete distribution to test against; it must provide a cdf method

arg : sequence

parameters of distribution

nsupp : integer

number of bins. The algorithm tries to find bins with equal weights. Depending on the distribution, the actual number of bins can be smaller.

Returns
freq : array

empirical frequencies for sample; not normalized, adds up to sample size

expfreq : array

theoretical frequencies according to distribution

histsupp : array

bin boundaries for the histogram (1e-8 is added for numerical robustness)
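
One plausible reading of the 1e-8 offset, sketched below with made-up data: np.histogram treats bins as half-open [a, b), while differences of a cdf give the mass of (a, b]. Shifting the integer boundaries slightly to the right moves a sample that sits exactly on a boundary into the lower bin. The arrays here are illustrative assumptions, not output of this function.

```python
import numpy as np

data = np.array([0, 1, 2, 2, 3, 4, 4, 4])
edges = np.array([0.0, 2.0, 4.0])

# np.histogram bins are half-open [a, b), except the last one, which is closed
plain, _ = np.histogram(data, edges)

# shift the boundaries slightly so that values equal to a boundary
# are counted in the bin to the left, as in a cdf difference
shifted_edges = edges + 1e-8
shifted_edges[0] = edges[0]  # keep the lower end of the support unchanged
shifted, _ = np.histogram(data, shifted_edges)

print(plain)    # counts with integer boundaries
print(shifted)  # counts with shifted boundaries
```

With the unshifted boundaries the two observations equal to 2 fall into the upper bin; with the shifted boundaries they fall into the lower one.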

Notes

The results can be used for a chi-square test:

(chis, pval) = stats.chisquare(freq, expfreq)
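
The following is a minimal, self-contained sketch of the same workflow. It does not call gof_binning_discrete; instead a hypothetical helper, equal_mass_bins, mimics the idea of bins with roughly equal weight, and the Poisson distribution, sample size, and seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def equal_mass_bins(dist, nsupp=20, lower=0, upper=1000):
    """Hypothetical helper: group integer support points into bins
    (edges[i], edges[i+1]] with probability mass of at least 1/nsupp each;
    the final bin collects the remaining upper tail."""
    wsupp = 1.0 / nsupp
    edges = [lower - 1]
    mass = []
    last = 0.0
    for k in range(lower, upper + 1):
        current = dist.cdf(k)
        if current - last >= wsupp - 1e-14:
            edges.append(k)
            mass.append(current - last)
            last = current
            if current > 1 - wsupp:
                break
    edges.append(np.inf)
    mass.append(1.0 - last)
    return np.array(edges, dtype=float), np.array(mass)


rng = np.random.default_rng(0)
rvs = rng.poisson(5, size=500)            # sample data (assumed Poisson)
edges, mass = equal_mass_bins(stats.poisson(5), nsupp=10)

# bin index of each observation for left-open, right-closed integer bins
idx = np.searchsorted(edges, rvs, side="left") - 1
freq = np.bincount(idx, minlength=len(mass))   # empirical counts
expfreq = len(rvs) * mass                      # expected counts

chis, pval = stats.chisquare(freq, expfreq)
print(chis, pval)
```

Here freq adds up to the sample size and expfreq is the sample size times the bin probabilities, matching the description of the return values above.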

Originally written for the scipy.stats test suite. It still needs to be checked for standalone usage: input checking is insufficient, and the function may not run yet (after copy/paste).

refactor: maybe a class, check returns, or separate binning from test results

todo:

optimal number of bins? (check easyfit); the recommendation in the literature is at least 5 expected observations in each bin
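
As an illustration of the rule of at least 5 expected observations per bin, the sketch below pools adjacent bins until each pooled bin reaches that threshold. The helper pool_small_bins and the input arrays are hypothetical and are not part of statsmodels.

```python
import numpy as np


def pool_small_bins(freq, expfreq, min_expected=5.0):
    """Hypothetical helper: merge adjacent bins from left to right until
    every pooled bin has at least `min_expected` expected observations."""
    pooled_freq, pooled_exp = [], []
    acc_f, acc_e = 0.0, 0.0
    for f, e in zip(freq, expfreq):
        acc_f += f
        acc_e += e
        if acc_e >= min_expected:
            pooled_freq.append(acc_f)
            pooled_exp.append(acc_e)
            acc_f, acc_e = 0.0, 0.0
    # a leftover tail that is still too small is folded into the last bin
    if acc_e > 0 or acc_f > 0:
        if pooled_exp:
            pooled_freq[-1] += acc_f
            pooled_exp[-1] += acc_e
        else:
            pooled_freq.append(acc_f)
            pooled_exp.append(acc_e)
    return np.array(pooled_freq), np.array(pooled_exp)


# made-up counts with sparse bins at both ends
freq = np.array([1, 3, 10, 12, 9, 4, 1])
expfreq = np.array([1.5, 4.0, 9.5, 11.0, 8.0, 4.5, 1.5])

new_freq, new_exp = pool_small_bins(freq, expfreq)
print(new_freq, new_exp)
```

Pooling keeps the totals of observed and expected counts unchanged but reduces the number of bins, and therefore the degrees of freedom of a subsequent chi-square test.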