Statistics and inference for one and two sample Poisson rates#

Author: Josef Perktold

This notebook provides a brief overview of hypothesis tests, confidence intervals and other statistics for Poisson rates in the one and two sample cases. See the docstrings for more options and additional details.

All functions in statsmodels.stats.rates take summary statistics of the data as arguments: counts of events and number of observations or total exposure. Some functions for Poisson have an option for excess dispersion. Functions for negative binomial, NB2, require the dispersion parameter. Excess dispersion and the dispersion parameter need to be provided by the user and can be estimated from the original data with GLM-Poisson and the discrete NegativeBinomial model, respectively.

Note: some parts are still experimental and will likely change. Some features are still missing and will be added in future versions.

One sample functions

Two sample functions

[1]:

from statsmodels.stats.rates import (  # functions for 1 sample; functions for 2 sample; power functions; list of statistical methods
    confint_poisson,
    confint_poisson_2indep,
    confint_quantile_poisson,
    method_names_poisson_1samp,
    method_names_poisson_2indep,
    nonequivalence_poisson_2indep,
    power_equivalence_neginb_2indep,
    power_equivalence_poisson_2indep,
    power_negbin_ratio_2indep,
    power_poisson_diff_2indep,
    power_poisson_ratio_2indep,
    test_poisson,
    test_poisson_2indep,
    tolerance_int_poisson,
    tost_poisson_2indep,
)

One sample functions#

The main functions for one sample Poisson rates currently are test_poisson and confint_poisson. Both offer several methods, and most methods are consistent between the hypothesis test and the confidence interval. Two additional functions are available for tolerance intervals and for confidence intervals of quantiles.

See docstrings for details.

[2]:

count1, n1 = 60, 514.775
count1 / n1

[2]:

0.11655577679568745

[3]:

test_poisson(count1, n1, value=0.1, method="midp-c")

[3]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = nan
pvalue = np.float64(0.23913820865664664)
distribution = 'Poisson'
method = 'midp-c'
alternative = 'two-sided'
rate = 0.11655577679568745
nobs = 514.775
tuple = (nan, np.float64(0.23913820865664664))

[4]:

confint_poisson(count1, n1, method="midp-c")

[4]:

(np.float64(0.0897357524941493), np.float64(0.1490015282355224))

The available methods for hypothesis tests and confidence intervals are listed in the dictionary method_names_poisson_1samp. See the docstrings for details.

[5]:

method_names_poisson_1samp

[5]:

{'test': ['wald',
  'score',
  'exact-c',
  'midp-c',
  'waldccv',
  'sqrt-a',
  'sqrt-v',
  'sqrt'],
 'confint': ['wald',
  'score',
  'exact-c',
  'midp-c',
  'jeff',
  'waldccv',
  'sqrt-a',
  'sqrt-v',
  'sqrt',
  'sqrt-cent',
  'sqrt-centcc']}

[6]:

for meth in method_names_poisson_1samp["test"]:
    tst = test_poisson(count1, n1, method=meth, value=0.1, alternative="two-sided")
    print(f"{meth:<12s}", tst.pvalue)

wald         0.2712232025335152
score        0.23489608509894766
exact-c      0.2654698417416039
midp-c       0.23913820865664664
waldccv      0.27321266612309003
sqrt-a       0.25489746088635834
sqrt-v       0.2281700763432699
sqrt         0.2533006997208508
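To make the difference between the "wald" and "score" p-values above concrete, here is a minimal sketch of the two normal-approximation tests using only the Python standard library. The helper name z_test_poisson_rate is hypothetical and is not part of statsmodels. We assume that the Wald variant uses the estimated rate in the variance and the score variant uses the null value; under that assumption the p-values should agree with the wald and score lines above up to rounding.

```python
from math import sqrt
from statistics import NormalDist


def z_test_poisson_rate(count, nobs, value, method="score"):
    """Two-sided z-test of H0: rate == value (hypothetical helper)."""
    rate = count / nobs
    # Wald: variance at the estimated rate; score: variance at the null value.
    var_rate = (rate if method == "wald" else value) / nobs
    statistic = (rate - value) / sqrt(var_rate)
    pvalue = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, pvalue


stat_wald, p_wald = z_test_poisson_rate(60, 514.775, 0.1, method="wald")
stat_score, p_score = z_test_poisson_rate(60, 514.775, 0.1, method="score")
print(f"{'wald':<12s}", p_wald)
print(f"{'score':<12s}", p_score)
```

The exact and mid-p methods are based on the Poisson distribution itself and are not covered by this sketch.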

[7]:

for meth in method_names_poisson_1samp["confint"]:
    tst = confint_poisson(count1, n1, method=meth)
    print(f"{meth:<12s}", tst)

wald         (np.float64(0.08706363801159746), np.float64(0.14604791557977745))
score        (np.float64(0.0905597500576385), np.float64(0.15001420714831387))
exact-c      (np.float64(0.08894433674907924), np.float64(0.15003038882355074))
midp-c       (np.float64(0.0897357524941493), np.float64(0.1490015282355224))
jeff         (np.float64(0.08979284758964944), np.float64(0.14893677466593855))
waldccv      (np.float64(0.08694100904696915), np.float64(0.14617054454440576))
sqrt-a       (np.float64(0.08883721953786133), np.float64(0.14800553586080228))
sqrt-v       (np.float64(0.08975547672311084), np.float64(0.14897854470462502))
sqrt         (np.float64(0.08892923891524183), np.float64(0.14791351648342183))
sqrt-cent    (np.float64(0.08883721953786133), np.float64(0.1480055358608023))
sqrt-centcc  (np.float64(0.0879886777703761), np.float64(0.1490990831089978))
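The "wald" and "score" intervals above can be written in closed form. The following is a sketch under the assumption that both are based on the normal approximation: the Wald interval is symmetric around the estimated rate, and the score interval inverts the score test by solving a quadratic equation in the rate. The helper name confint_rate_normal is hypothetical and not part of statsmodels.

```python
from math import sqrt
from statistics import NormalDist


def confint_rate_normal(count, nobs, alpha=0.05, method="score"):
    """Normal-approximation confidence interval for a Poisson rate."""
    rate = count / nobs
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    if method == "wald":
        half = crit * sqrt(rate / nobs)
        return rate - half, rate + half
    # Score: solve (rate - r)**2 == crit**2 * r / nobs for r.
    center = rate + crit**2 / (2 * nobs)
    half = crit * sqrt(rate / nobs + crit**2 / (4 * nobs**2))
    return center - half, center + half


ci_wald = confint_rate_normal(60, 514.775, method="wald")
ci_score = confint_rate_normal(60, 514.775, method="score")
print(f"{'wald':<12s}", ci_wald)
print(f"{'score':<12s}", ci_score)
```

The score interval is shifted upwards relative to the Wald interval because its variance is evaluated at the candidate rate and not at the estimate.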

Two additional functions are currently available for one sample Poisson rates: tolerance_int_poisson for tolerance intervals and confint_quantile_poisson for confidence intervals of Poisson quantiles.

Tolerance intervals are similar to prediction intervals that combine the randomness of a new observation and uncertainty about the estimated Poisson rate. If the rate were known, then we could compute a Poisson interval for a new observation using the inverse cdf at the given rate. The tolerance interval adds uncertainty about the rate by using the confidence interval for the rate estimate.

A tolerance interval is specified by two probabilities: prob is the coverage of the Poisson interval, and alpha is the significance level of the confidence interval for the rate estimate, so that 1 - alpha is its confidence level.

Note that coverage probabilities cannot be exactly equal to the nominal probabilities because counts are discrete random variables. The properties of the intervals are specified in terms of inequalities: coverage is at least prob, and coverage of the confidence interval of the estimated rate is at least 1 - alpha. However, most methods will not guarantee that the coverage inequalities hold in small samples even if the distribution is correctly specified.

In the following example, we can expect to observe between 4 and 23 events if the total exposure or number of observations is 100, at given coverage prob and confidence level alpha. The tolerance interval is larger than the Poisson interval at the observed rate, (5, 19), because the tolerance interval takes uncertainty about the parameter estimate into account.

[8]:

exposure_new = 100
tolerance_int_poisson(
    count1,
    n1,
    prob=0.95,
    exposure_new=exposure_new,
    method="score",
    alpha=0.05,
    alternative="two-sided",
)

[8]:

(np.float64(4.0), np.float64(23.0))

[9]:

from scipy import stats

stats.poisson.interval(0.95, count1 / n1 * exposure_new)

[9]:

(np.float64(5.0), np.float64(19.0))

Aside: We can force the tolerance interval to ignore parameter uncertainty by specifying alpha=1.

[10]:

tolerance_int_poisson(
    count1,
    n1,
    prob=0.95,
    exposure_new=exposure_new,
    method="score",
    alpha=1,
    alternative="two-sided",
)

[10]:

(np.float64(5.0), np.float64(19.0))
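The construction described above can be sketched with the standard library alone. This is an illustration under stated assumptions and not necessarily the statsmodels implementation: we assume a score confidence interval for the rate and plug its lower and upper limits into the Poisson quantile function for the lower and upper tail. All helper names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_ppf(prob, mean):
    """Smallest integer k with P(X <= k) >= prob for X ~ Poisson(mean)."""
    k = 0
    pmf = exp(-mean)
    cdf = pmf
    while cdf < prob:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def score_confint(count, nobs, alpha):
    """Score confidence interval for a Poisson rate (normal approximation)."""
    rate = count / nobs
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    center = rate + crit**2 / (2 * nobs)
    half = crit * sqrt(rate / nobs + crit**2 / (4 * nobs**2))
    return center - half, center + half


def tolerance_interval_sketch(count, nobs, exposure_new, prob=0.95, alpha=0.05):
    """Poisson interval evaluated at the confidence limits of the rate."""
    rate_low, rate_upp = score_confint(count, nobs, alpha)
    tail = (1 - prob) / 2
    lower = poisson_ppf(tail, rate_low * exposure_new)
    upper = poisson_ppf(1 - tail, rate_upp * exposure_new)
    return lower, upper


tol_int = tolerance_interval_sketch(60, 514.775, 100)
# alpha=1 collapses the confidence interval to the point estimate.
plain_int = tolerance_interval_sketch(60, 514.775, 100, alpha=1)
print(tol_int, plain_int)
```

With alpha=1 the critical value is zero, so both confidence limits equal the estimated rate and the result reduces to the plain Poisson interval.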

The last function returns a confidence interval for a Poisson quantile. A quantile is the inverse of the cdf function, named ppf in scipy.stats distributions.

The following example shows the confidence interval for the upper bound of the Poisson interval at cdf probability 0.975. The upper confidence limit using the one-tail coverage probability is the same as the upper limit of the tolerance interval.

[11]:

confint_quantile_poisson(
    count1,
    n1,
    prob=0.975,
    exposure_new=100,
    method="score",
    alpha=0.05,
    alternative="two-sided",
)

[11]:

(np.float64(15.0), np.float64(23.0))
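As a rough illustration of this idea, the sketch below evaluates the same Poisson quantile at the lower and upper confidence limits of the rate. It assumes a score confidence interval for the rate; the helper names are hypothetical and the code is not taken from statsmodels.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_ppf(prob, mean):
    """Smallest integer k with P(X <= k) >= prob for X ~ Poisson(mean)."""
    k, pmf = 0, exp(-mean)
    cdf = pmf
    while cdf < prob:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def quantile_confint_sketch(count, nobs, prob, exposure_new, alpha=0.05):
    """Confidence interval for the Poisson quantile at cdf probability prob."""
    rate = count / nobs
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Score confidence limits for the rate (assumed method).
    center = rate + crit**2 / (2 * nobs)
    half = crit * sqrt(rate / nobs + crit**2 / (4 * nobs**2))
    means = ((center - half) * exposure_new, (center + half) * exposure_new)
    return tuple(poisson_ppf(prob, mean) for mean in means)


ci_quantile = quantile_confint_sketch(60, 514.775, prob=0.975, exposure_new=100)
print(ci_quantile)
```

The upper limit uses the same quantile and the same upper rate limit as the tolerance interval, which is why the two upper limits coincide.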

Two sample functions#

Statistical functions for two samples can compare the rates by either the ratio or the difference. The default is to compare the rates by their ratio.

The etest functions can be directly accessed through test_poisson_2indep.

[12]:

count1, n1, count2, n2 = 60, 514.775, 30, 543.087

[13]:

test_poisson_2indep(count1, n1, count2, n2, method="etest-score")

[13]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(3.4174018390002145)
pvalue = np.float64(0.0005672617581628009)
distribution = 'poisson'
compare = 'ratio'
method = 'etest-score'
alternative = 'two-sided'
rates = (np.float64(0.11655577679568745), np.float64(0.055239768213932575))
ratio = np.float64(2.10999757175465)
diff = np.float64(0.06131600858175487)
value = 1
rates_cmle = None
tuple = (np.float64(3.4174018390002145), np.float64(0.0005672617581628009))

[14]:

confint_poisson_2indep(count1, n1, count2, n2, method="score", compare="ratio")

[14]:

(np.float64(1.3659624311981189), np.float64(3.2593061483872257))

[15]:

confint_poisson_2indep(count1, n1, count2, n2, method="score", compare="diff")

[15]:

(np.float64(0.026579645509259224), np.float64(0.0989192191413259))

The two sample test function, test_poisson_2indep, has a value option to specify null hypotheses other than equality. This is useful for superiority and noninferiority testing with one-sided alternatives.

As an example, the following test tests the two-sided null hypothesis that the rates ratio is 2. The pvalue for this hypothesis is 0.81 and we cannot reject that the first rate is twice the second rate.

[16]:

test_poisson_2indep(count1, n1, count2, n2, value=2, method="etest-score")

[16]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(0.23946504079843253)
pvalue = np.float64(0.8135048572056751)
distribution = 'poisson'
compare = 'ratio'
method = 'etest-score'
alternative = 'two-sided'
rates = (np.float64(0.11655577679568745), np.float64(0.055239768213932575))
ratio = np.float64(2.10999757175465)
diff = np.float64(0.06131600858175487)
value = 2
rates_cmle = None
tuple = (np.float64(0.23946504079843253), np.float64(0.8135048572056751))
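The statistic reported above can be reproduced with a short sketch of the score statistic for a rate ratio with an arbitrary null value. The function name score_test_ratio is hypothetical, and the p-value here uses the normal distribution, which corresponds to method="score". The "etest-score" method uses the same statistic but a different reference distribution, so its p-values differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def score_test_ratio(count1, n1, count2, n2, value=1.0):
    """Score z-test of H0: rate1 / rate2 == value (two-sided)."""
    # Ratio of expected counts under the null hypothesis.
    r_d = value * n1 / n2
    statistic = (count1 - count2 * r_d) / sqrt((count1 + count2) * r_d)
    pvalue = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, pvalue


stat_eq, p_eq = score_test_ratio(60, 514.775, 30, 543.087, value=1)
stat_two, p_two = score_test_ratio(60, 514.775, 30, 543.087, value=2)
print(stat_eq, p_eq)
print(stat_two, p_two)
```

For one-sided superiority or noninferiority tests, only the tail of the normal distribution in the direction of the alternative would be used.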

The method_names_poisson_2indep dictionary shows which methods are available when comparing two samples by either rates ratio or rates difference.

We can use the dictionary to compute p-values and confidence intervals using all available methods.

[17]:

method_names_poisson_2indep

[17]:

{'test': {'ratio': ['wald',
   'score',
   'score-log',
   'wald-log',
   'exact-cond',
   'cond-midp',
   'sqrt',
   'etest-score',
   'etest-wald'],
  'diff': ['wald', 'score', 'waldccv', 'etest-score', 'etest-wald']},
 'confint': {'ratio': ['waldcc',
   'score',
   'score-log',
   'wald-log',
   'sqrtcc',
   'mover'],
  'diff': ['wald', 'score', 'waldccv', 'mover']}}

[18]:

for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["test"][compare]:
        tst = test_poisson_2indep(
            count1,
            n1,
            count2,
            n2,
            value=None,
            method=meth,
            compare=compare,
            alternative="two-sided",
        )
        print(f"   {meth:<12s}", tst.pvalue)

ratio
   wald         0.0007120093285061108
   score        0.0006322188820470972
   score-log    0.0003992519661848979
   wald-log     0.0008399438093390379
   exact-cond   0.0006751826586863219
   cond-midp    0.0005572624066190538
   sqrt         0.0005700355621795108
   etest-score  0.0005672617581628009
   etest-wald   0.0006431446124897875
diff
   wald         0.0007120093285061094
   score        0.0006322188820470944
   waldccv      0.0007610462660136599
   etest-score  0.000567261758162795
   etest-wald   0.0006431446124897808

In a similar way we can compute confidence intervals for the rate ratio and rate difference for all currently available methods.

[19]:

for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["confint"][compare]:
        ci = confint_poisson_2indep(
            count1, n1, count2, n2, method=meth, compare=compare
        )
        print(f"   {meth:<12s}", ci)

ratio
   waldcc       (np.float64(1.354190544703406), np.float64(3.233964238781885))
   score        (np.float64(1.3659624311981189), np.float64(3.2593061483872257))
   score-log    (np.float64(1.3903411228996467), np.float64(3.4348249508085043))
   wald-log     (np.float64(1.3612801263025065), np.float64(3.2705169691290763))
   sqrtcc       (np.float64(1.29635711135392), np.float64(3.132234781692197))
   mover        (np.float64(1.3614682485833316), np.float64(3.258622814678696))
diff
   wald         (np.float64(0.02581223514639487), np.float64(0.09681978201711487))
   score        (np.float64(0.026579645509259224), np.float64(0.0989192191413259))
   waldccv      (np.float64(0.025618973109117968), np.float64(0.09701304405439178))
   mover        (np.float64(0.026193641039269785), np.float64(0.09864127183950336))
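Two of the intervals above have simple closed forms. The sketch below assumes the textbook formulas: a Wald interval for the rate difference, and a Wald interval on the log scale for the rate ratio that is transformed back by exponentiation. The function names are hypothetical; the results should agree with the "wald" line for diff and the "wald-log" line for ratio up to rounding.

```python
from math import exp, sqrt
from statistics import NormalDist


def confint_diff_wald(count1, n1, count2, n2, alpha=0.05):
    """Wald confidence interval for rate1 - rate2."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = count1 / n1 - count2 / n2
    half = crit * sqrt(count1 / n1**2 + count2 / n2**2)
    return diff - half, diff + half


def confint_ratio_wald_log(count1, n1, count2, n2, alpha=0.05):
    """Wald confidence interval for rate1 / rate2 built on the log scale."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = (count1 / n1) / (count2 / n2)
    # Standard error of log(ratio) under the Poisson assumption.
    half = crit * sqrt(1 / count1 + 1 / count2)
    return ratio * exp(-half), ratio * exp(half)


ci_diff = confint_diff_wald(60, 514.775, 30, 543.087)
ci_ratio = confint_ratio_wald_log(60, 514.775, 30, 543.087)
print(ci_diff)
print(ci_ratio)
```

Working on the log scale keeps the ratio interval positive and makes it symmetric in log(ratio) and not in the ratio itself.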

We have two additional functions for hypothesis tests that specify interval hypotheses, tost_poisson_2indep and nonequivalence_poisson_2indep.

The TOST function implements equivalence tests where the alternative hypothesis specifies that the two rates are within an interval of each other.

The nonequivalence test implements a test where the alternative specifies that the two rates differ by at least a given nonzero value. This is also often called a minimum effect test. This test uses two one-sided tests similar to TOST, but with null and alternative hypotheses reversed compared to the equivalence test.

Both functions delegate to test_poisson_2indep and, therefore, the same method options are available.

The following equivalence test specifies the alternative hypothesis that the rate ratio is between 0.8 and 1/0.8. The observed rate ratio is 0.89. The pvalue is 0.107 and we cannot reject the null hypothesis in favor of the alternative hypothesis that the two rates are equivalent at the given margins. Thus the hypothesis test does not provide evidence that the two rates are equivalent.

In the second example we test equivalence in the rate difference, where equivalence is defined by margins (-0.04, 0.04). The pvalue is around 0.2 and the test does not support that the two rates are equivalent.

[20]:

low = 0.8
upp = 1 / low

count1, n1, count2, n2 = 200, 1000, 450, 2000

tost_poisson_2indep(count1, n1, count2, n2, low, upp, method="score", compare="ratio")

[20]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(1.2403473458920846)
pvalue = np.float64(0.10742347370282446)
method = 'score'
compare = 'ratio'
equiv_limits = (0.8, 1.25)
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(1.2403473458920846)
    pvalue = np.float64(0.10742347370282446)
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'larger'
    rates = (np.float64(0.2), np.float64(0.225))
    ratio = np.float64(0.888888888888889)
    diff = np.float64(-0.024999999999999994)
    value = 0.8
    rates_cmle = None
    tuple = (np.float64(1.2403473458920846), np.float64(0.10742347370282446))
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(-4.0311288741492755)
    pvalue = np.float64(2.7754797240370253e-05)
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'smaller'
    rates = (np.float64(0.2), np.float64(0.225))
    ratio = np.float64(0.888888888888889)
    diff = np.float64(-0.024999999999999994)
    value = 1.25
    rates_cmle = None
    tuple = (np.float64(-4.0311288741492755), np.float64(2.7754797240370253e-05))
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (np.float64(1.2403473458920846), np.float64(0.10742347370282446))

[21]:

upp = 0.04
low = -upp
tost_poisson_2indep(count1, n1, count2, n2, low, upp, method="score", compare="diff")

[21]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(0.8575203124598336)
pvalue = np.float64(0.19557869693808477)
method = 'score'
compare = 'diff'
equiv_limits = (-0.04, 0.04)
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(0.8575203124598336)
    pvalue = np.float64(0.19557869693808477)
    distribution = 'normal'
    compare = 'diff'
    method = 'score'
    alternative = 'larger'
    rates = (np.float64(0.2), np.float64(0.225))
    ratio = np.float64(0.888888888888889)
    diff = np.float64(-0.024999999999999994)
    value = -0.04
    rates_cmle = (np.float64(0.19065363652113884), np.float64(0.23065363652113885))
    tuple = (np.float64(0.8575203124598336), np.float64(0.19557869693808477))
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(-3.4807277010355238)
    pvalue = np.float64(0.00025002679047994814)
    distribution = 'normal'
    compare = 'diff'
    method = 'score'
    alternative = 'smaller'
    rates = (np.float64(0.2), np.float64(0.225))
    ratio = np.float64(0.888888888888889)
    diff = np.float64(-0.024999999999999994)
    value = 0.04
    rates_cmle = (np.float64(0.24581855699051405), np.float64(0.20581855699051405))
    tuple = (np.float64(-3.4807277010355238), np.float64(0.00025002679047994814))
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (np.float64(0.8575203124598336), np.float64(0.19557869693808477))
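The two one-sided tests behind the ratio example can be sketched as follows. We assume the score statistic for the rate ratio with a normal reference distribution; the helper names are hypothetical. The TOST p-value is the larger of the two one-sided p-values, because both one-sided null hypotheses have to be rejected to conclude equivalence.

```python
from math import sqrt
from statistics import NormalDist


def score_stat_ratio(count1, n1, count2, n2, value):
    """Score statistic for H0: rate1 / rate2 == value."""
    r_d = value * n1 / n2
    return (count1 - count2 * r_d) / sqrt((count1 + count2) * r_d)


def tost_ratio_sketch(count1, n1, count2, n2, low, upp):
    """Equivalence test with H1: low < rate1 / rate2 < upp."""
    norm = NormalDist()
    # One-sided test against the lower margin, alternative "larger".
    p_larger = 1 - norm.cdf(score_stat_ratio(count1, n1, count2, n2, low))
    # One-sided test against the upper margin, alternative "smaller".
    p_smaller = norm.cdf(score_stat_ratio(count1, n1, count2, n2, upp))
    return max(p_larger, p_smaller), p_larger, p_smaller


p_tost, p_larger, p_smaller = tost_ratio_sketch(200, 1000, 450, 2000, 0.8, 1 / 0.8)
print(p_tost, p_larger, p_smaller)
```

In this example the test against the lower margin is the binding one: the observed ratio of 0.89 is too close to 0.8 to reject that one-sided null hypothesis.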

The function nonequivalence_poisson_2indep tests the alternative hypothesis that the two rates differ by a non-negligible amount.

In the following example, the alternative hypothesis specifies that the rate ratio is outside the interval (0.95, 1/0.95). The null hypothesis is that the rate ratio is in the interval. If the test rejects the null hypothesis, then it provides evidence that the rate ratio differs from 1 by more than the unimportant amount specified by the interval limits.

A note on the relationship between point hypothesis tests and interval hypothesis tests in large samples: the point null hypothesis test of test_poisson_2indep will reject any small deviation from the null hypothesis if the null hypothesis does not hold exactly and the sample size is large enough. The nonequivalence or minimum effect test will not reject the null hypothesis in large samples (as the sample size approaches infinity) if the rates differ by no more than the specified negligible amount.

In the example, neither the point nor the interval null hypothesis is rejected, so we do not have enough evidence to say that the rates are statistically different. We then increase the sample size 20 times while keeping the observed rates constant. In this case the point null hypothesis is rejected, with a pvalue of 0.01, while the interval null hypothesis is not rejected. The reported pvalue of about 1.1 is twice the smaller of the two one-sided pvalues without truncation and should be read as 1.

Note: The nonequivalence test is in general conservative, its size is bounded by alpha, but in the large sample limit with fixed nonequivalence margins the size approaches alpha / 2. If the nonequivalence interval shrinks to a single point, then the nonequivalence test is the same as the point hypothesis test. (see docstring)

[22]:

count1, n1, count2, n2 = 200, 1000, 420, 2000
low = 0.95
upp = 1 / low
nf = 1
nonequivalence_poisson_2indep(
    count1 * nf,
    n1 * nf,
    count2 * nf,
    n2 * nf,
    low,
    upp,
    method="score",
    compare="ratio",
)

[22]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(-1.1654330934961301)
pvalue = np.float64(1.0232437381644721)
method = 'score'
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(0.02913582733740325)
    pvalue = np.float64(0.5116218690822361)
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'smaller'
    rates = (np.float64(0.2), np.float64(0.21))
    ratio = np.float64(0.9523809523809524)
    diff = np.float64(-0.009999999999999981)
    value = 0.95
    rates_cmle = None
    tuple = (np.float64(0.02913582733740325), np.float64(0.5116218690822361))
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = np.float64(-1.1654330934961301)
    pvalue = np.float64(0.8780781359377093)
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'larger'
    rates = (np.float64(0.2), np.float64(0.21))
    ratio = np.float64(0.9523809523809524)
    diff = np.float64(-0.009999999999999981)
    value = 1.0526315789473684
    rates_cmle = None
    tuple = (np.float64(-1.1654330934961301), np.float64(0.8780781359377093))
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (np.float64(-1.1654330934961301), np.float64(1.0232437381644721))

[23]:

test_poisson_2indep(
    count1 * nf, n1 * nf, count2 * nf, n2 * nf, method="score", compare="ratio"
)

[23]:

<class 'statsmodels.stats.base.HolderTuple'>
statistic = np.float64(-0.5679618342470648)
pvalue = np.float64(0.5700608835629815)
distribution = 'normal'
compare = 'ratio'
method = 'score'
alternative = 'two-sided'
rates = (np.float64(0.2), np.float64(0.21))
ratio = np.float64(0.9523809523809524)
diff = np.float64(-0.009999999999999981)
value = 1
rates_cmle = None
tuple = (np.float64(-0.5679618342470648), np.float64(0.5700608835629815))

[24]:

nf = 20
nonequivalence_poisson_2indep(
    count1 * nf,
    n1 * nf,
    count2 * nf,
    n2 * nf,
    low,
    upp,
    method="score",
    compare="ratio",
).pvalue

[24]:

np.float64(1.1036704302254083)

[25]:

test_poisson_2indep(
    count1 * nf, n1 * nf, count2 * nf, n2 * nf, method="score", compare="ratio"
).pvalue

[25]:

np.float64(0.01108516638060269)
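The large sample behavior described above can be illustrated with a small sketch. We assume the score statistic for the rate ratio with a normal reference distribution, and we assume that the nonequivalence p-value is twice the smaller of the two one-sided p-values, which is consistent with the outputs above. The helper names are hypothetical, and the sketch additionally returns a p-value truncated at 1.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()


def score_stat_ratio(count1, n1, count2, n2, value):
    """Score statistic for H0: rate1 / rate2 == value."""
    r_d = value * n1 / n2
    return (count1 - count2 * r_d) / sqrt((count1 + count2) * r_d)


def nonequivalence_ratio_sketch(count1, n1, count2, n2, low, upp):
    """Minimum effect test with H1: ratio < low or ratio > upp."""
    p_below = norm.cdf(score_stat_ratio(count1, n1, count2, n2, low))
    p_above = 1 - norm.cdf(score_stat_ratio(count1, n1, count2, n2, upp))
    raw = 2 * min(p_below, p_above)
    return raw, min(raw, 1.0)


def point_test_sketch(count1, n1, count2, n2):
    """Two-sided score test of H0: ratio == 1."""
    stat = score_stat_ratio(count1, n1, count2, n2, 1.0)
    return 2 * (1 - norm.cdf(abs(stat)))


results = {}
for nf in (1, 20):
    args = (200 * nf, 1000 * nf, 420 * nf, 2000 * nf)
    raw, clipped = nonequivalence_ratio_sketch(*args, 0.95, 1 / 0.95)
    results[nf] = (raw, clipped, point_test_sketch(*args))
    print(nf, results[nf])
```

With the larger sample, the point test detects the small deviation from a ratio of 1, while the minimum effect test still finds no evidence that the ratio lies outside (0.95, 1/0.95).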

Power#

Statsmodels has limited support for computing statistical power for the comparison of two sample Poisson and negative binomial rates. The power functions are based on Zhu and Lakkis, and on Zhu, for ratio comparisons for both distributions, and on a basic normal-based comparison for the Poisson rate difference. Other methods that correspond more closely to the available methods in the hypothesis test function, especially Gu, are not yet available.

The available functions are:

[26]:

print(power_poisson_ratio_2indep)
print(power_equivalence_poisson_2indep)
print(power_negbin_ratio_2indep)
print(power_equivalence_neginb_2indep)

power_poisson_diff_2indep

<function power_poisson_ratio_2indep at 0x7fdba3f95c70>
<function power_equivalence_poisson_2indep at 0x7fdba3f95d20>
<function power_negbin_ratio_2indep at 0x7fdba3f96140>
<function power_equivalence_neginb_2indep at 0x7fdba3f961f0>

[26]:

<function statsmodels.stats.rates.power_poisson_diff_2indep(rate1, rate2, nobs1, nobs_ratio=1, alpha=0.05, value=0, method_var='score', alternative='two-sided', return_results=True)>
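To give an idea of what a basic normal-based power computation for the rate difference looks like, here is a textbook sketch. It is not the statsmodels implementation: it borrows the argument names rate1, rate2, nobs1, nobs_ratio and alpha from the signature above, tests a null difference of zero, and assumes that the same Wald-type variance applies under the null and the alternative, so it ignores the method_var option. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_diff_normal_sketch(rate1, rate2, nobs1, nobs_ratio=1.0, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: rate1 - rate2 == 0."""
    norm = NormalDist()
    nobs2 = nobs_ratio * nobs1
    # Standard deviation of the estimated rate difference under the alternative.
    std = sqrt(rate1 / nobs1 + rate2 / nobs2)
    crit = norm.inv_cdf(1 - alpha / 2)
    shift = (rate1 - rate2) / std
    # Probability that the z-statistic falls in either rejection region.
    return norm.cdf(shift - crit) + norm.cdf(-shift - crit)


for nobs1 in (500, 1000, 2000):
    print(nobs1, power_diff_normal_sketch(0.2, 0.25, nobs1))
```

Under these assumptions the power equals alpha when the two rates are equal and increases with the sample size when they differ. For actual study planning, the statsmodels power functions and their docstrings should be used.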