Statistics and inference for one and two sample Poisson rates

Author: Josef Perktold

This notebook provides a brief overview of hypothesis tests, confidence intervals and other statistics for Poisson rates in the one and two sample cases. See the docstrings for more options and additional details.

All functions in statsmodels.stats.rates take summary statistics of the data as arguments, that is, counts of events and the number of observations or total exposure. Some Poisson functions have an option for excess dispersion, while the functions for negative binomial, NB2, require the dispersion parameter. The excess dispersion and the dispersion parameter need to be provided by the user and can be estimated from the original data with the GLM-Poisson and the discrete NegativeBinomial model, respectively.

Note, some parts are still experimental and will likely change, some features are still missing and will be added in future versions.

[1]:
import numpy as np
from numpy.testing import assert_allclose
import statsmodels.stats.rates as smr
from statsmodels.stats.rates import (
    # functions for 1 sample
    test_poisson,
    confint_poisson,
    tolerance_int_poisson,
    confint_quantile_poisson,

    # functions for 2 sample
    test_poisson_2indep,
    etest_poisson_2indep,
    confint_poisson_2indep,
    tost_poisson_2indep,
    nonequivalence_poisson_2indep,

    # power functions
    power_poisson_ratio_2indep,
    power_poisson_diff_2indep,
    power_equivalence_poisson_2indep,
    power_negbin_ratio_2indep,
    power_equivalence_neginb_2indep,

    # list of statistical methods
    method_names_poisson_1samp,
    method_names_poisson_2indep,
    )

One sample functions

The main functions for one sample Poisson rates are currently test_poisson and confint_poisson. Both offer several methods, most of which are consistent between the hypothesis test and the confidence interval. Two additional functions are available for tolerance intervals and for confidence intervals of quantiles.
See the docstrings for details.
[2]:
count1, n1 = 60, 514.775
count1 / n1
[2]:
0.11655577679568745
[3]:
test_poisson(count1, n1, value=0.1, method="midp-c")
[3]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = nan
pvalue = 0.23913820865664664
distribution = 'Poisson'
method = 'midp-c'
alternative = 'two-sided'
rate = 0.11655577679568745
nobs = 514.775
tuple = (nan, 0.23913820865664664)
[4]:
confint_poisson(count1, n1, method="midp-c")
[4]:
(0.0897357524941493, 0.1490015282355224)

The methods available for hypothesis tests and confidence intervals are listed in the dictionary method_names_poisson_1samp. See the docstring for details.

[5]:
method_names_poisson_1samp
[5]:
{'test': ['wald',
  'score',
  'exact-c',
  'midp-c',
  'waldccv',
  'sqrt-a',
  'sqrt-v',
  'sqrt'],
 'confint': ['wald',
  'score',
  'exact-c',
  'midp-c',
  'jeff',
  'waldccv',
  'sqrt-a',
  'sqrt-v',
  'sqrt',
  'sqrt-cent',
  'sqrt-centcc']}
[6]:
for meth in method_names_poisson_1samp["test"]:
    tst = test_poisson(count1, n1, method=meth, value=0.1,
                       alternative='two-sided')
    print("%-12s" % meth, tst.pvalue)
wald         0.2712232025335152
score        0.23489608509894766
exact-c      0.2654698417416039
midp-c       0.23913820865664664
waldccv      0.27321266612309003
sqrt-a       0.25489746088635834
sqrt-v       0.2281700763432699
sqrt         0.2533006997208508
[7]:
for meth in method_names_poisson_1samp["confint"]:
    tst = confint_poisson(count1, n1, method=meth)
    print("%-12s" % meth, tst)
wald         (0.08706363801159746, 0.14604791557977745)
score        (0.0905597500576385, 0.15001420714831387)
exact-c      (0.08894433674907924, 0.15003038882355074)
midp-c       (0.0897357524941493, 0.1490015282355224)
jeff         (0.08979284758964944, 0.14893677466593855)
waldccv      (0.08694100904696915, 0.14617054454440576)
sqrt-a       (0.08883721953786133, 0.14800553586080228)
sqrt-v       (0.08975547672311084, 0.14897854470462502)
sqrt         (0.08892923891524183, 0.14791351648342183)
sqrt-cent    (0.08883721953786133, 0.1480055358608023)
sqrt-centcc  (0.0879886777703761, 0.1490990831089978)

Two additional functions are currently available for one sample Poisson rates: tolerance_int_poisson for tolerance intervals and confint_quantile_poisson for confidence intervals of Poisson quantiles.

Tolerance intervals are similar to prediction intervals in that they combine the randomness of a new observation with the uncertainty about the estimated Poisson rate. If the rate were known, then we could compute a Poisson interval for a new observation using the inverse cdf at the given rate. The tolerance interval adds uncertainty about the rate by using the confidence interval for the rate estimate.

A tolerance interval is specified by two probabilities: prob is the coverage of the Poisson interval, and alpha is the confidence level for the confidence interval of the rate estimate.
Note that the coverage cannot be exactly equal to the nominal probabilities because counts are discrete random variables. The properties of the intervals are therefore specified in terms of inequalities: coverage is at least prob, and coverage of the confidence interval of the estimated rate is at least 1 - alpha. However, most methods will not guarantee that these coverage inequalities hold in small samples, even if the distribution is correctly specified.

In the following example, we can expect to observe between 4 and 23 events if the total exposure or number of observations is 100, at given coverage prob and confidence level alpha. The tolerance interval is larger than the Poisson interval at the observed rate, (5, 19), because the tolerance interval takes uncertainty about the parameter estimate into account.

[8]:
exposure_new = 100
tolerance_int_poisson(count1, n1, prob=0.95, exposure_new=exposure_new, method="score", alpha=0.05, alternative='two-sided')
[8]:
(4.0, 23.0)
[9]:
from scipy import stats
stats.poisson.interval(0.95, count1 / n1 * exposure_new)
[9]:
(5.0, 19.0)

Aside: We can force the tolerance interval to ignore parameter uncertainty by specifying alpha=1.

[10]:
tolerance_int_poisson(count1, n1, prob=0.95, exposure_new=exposure_new, method="score", alpha=1, alternative='two-sided')
[10]:
(5.0, 19.0)

The last function returns a confidence interval for a Poisson quantile. A quantile is the inverse of the cdf function, named ppf in scipy.stats distributions.

The following example shows the confidence interval for the upper bound of the Poisson interval at cdf probability 0.975. The upper confidence limit using the one-tail coverage probability is the same as the upper limit of the tolerance interval.

[11]:
confint_quantile_poisson(count1, n1, prob=0.975, exposure_new=100, method="score", alpha=0.05, alternative='two-sided')
[11]:
(15.0, 23.0)

Two sample functions

Statistical functions for two samples can compare the rates by either their ratio or their difference. The default is comparing the rate ratio.

The etest functions can be accessed directly through test_poisson_2indep by specifying one of the 'etest' methods.

[12]:
count1, n1, count2, n2 = 60, 514.775, 30, 543.087
[13]:
test_poisson_2indep(count1, n1, count2, n2, method='etest-score')
[13]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = 3.4174018390002145
pvalue = 0.0005672617581628009
distribution = 'poisson'
compare = 'ratio'
method = 'etest-score'
alternative = 'two-sided'
rates = (0.11655577679568745, 0.055239768213932575)
ratio = 2.10999757175465
diff = 0.06131600858175487
value = 1
rates_cmle = None
ratio_null = 1
tuple = (3.4174018390002145, 0.0005672617581628009)
[14]:
confint_poisson_2indep(count1, n1, count2, n2, method='score',
                       compare="ratio")
[14]:
(1.3659624311981189, 3.2593061483872257)
[15]:
confint_poisson_2indep(count1, n1, count2, n2, method='score',
                       compare="diff")
[15]:
(0.026579645509259224, 0.0989192191413259)

The two sample test function, test_poisson_2indep, has a value option to specify null hypotheses that do not specify equality. This is useful for superiority and noninferiority testing with one-sided alternatives.

As an example, the following tests the two-sided null hypothesis that the rate ratio is 2. The pvalue for this hypothesis is 0.81, so we cannot reject the hypothesis that the first rate is twice the second rate.

[16]:
test_poisson_2indep(count1, n1, count2, n2, value=2, method='etest-score')
[16]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = 0.23946504079843253
pvalue = 0.813504857205675
distribution = 'poisson'
compare = 'ratio'
method = 'etest-score'
alternative = 'two-sided'
rates = (0.11655577679568745, 0.055239768213932575)
ratio = 2.10999757175465
diff = 0.06131600858175487
value = 2
rates_cmle = None
ratio_null = 2
tuple = (0.23946504079843253, 0.813504857205675)

The method_names_poisson_2indep dictionary shows which methods are available when comparing two samples by either rates ratio or rates difference.

We can use the dictionary to compute p-values and confidence intervals using all available methods.

[17]:
method_names_poisson_2indep
[17]:
{'test': {'ratio': ['wald',
   'score',
   'score-log',
   'wald-log',
   'exact-cond',
   'cond-midp',
   'sqrt',
   'etest-score',
   'etest-wald'],
  'diff': ['wald', 'score', 'waldccv', 'etest-score', 'etest-wald']},
 'confint': {'ratio': ['waldcc',
   'score',
   'score-log',
   'wald-log',
   'sqrtcc',
   'mover'],
  'diff': ['wald', 'score', 'waldccv', 'mover']}}
[18]:
for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["test"][compare]:
        tst = test_poisson_2indep(count1, n1, count2, n2, value=None,
                                  method=meth, compare=compare,
                                  alternative='two-sided')
        print("   %-12s" % meth, tst.pvalue)
ratio
   wald         0.0007120093285061108
   score        0.0006322188820470972
   score-log    0.0003992519661848979
   wald-log     0.0008399438093390379
   exact-cond   0.0006751826586863219
   cond-midp    0.0005572624066190538
   sqrt         0.0005700355621795108
   etest-score  0.0005672617581628009
   etest-wald   0.0006431446124897875
diff
   wald         0.0007120093285061094
   score        0.0006322188820470944
   waldccv      0.0007610462660136599
   etest-score  0.000567261758162795
   etest-wald   0.0006431446124897808

In a similar way we can compute confidence intervals for the rate ratio and rate difference for all currently available methods.

[19]:
for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["confint"][compare]:
        ci = confint_poisson_2indep(count1, n1, count2, n2,
                                  method=meth, compare=compare)
        print("   %-12s" % meth, ci)
ratio
   waldcc       (1.354190544703406, 3.233964238781885)
   score        (1.3659624311981189, 3.2593061483872257)
   score-log    (1.3903411228996467, 3.4348249508085043)
   wald-log     (1.3612801263025065, 3.2705169691290763)
   sqrtcc       (1.29635711135392, 3.132234781692197)
   mover        (1.3614682485833316, 3.258622814678696)
diff
   wald         (0.02581223514639487, 0.09681978201711487)
   score        (0.026579645509259224, 0.0989192191413259)
   waldccv      (0.025618973109117968, 0.09701304405439178)
   mover        (0.026193641039269785, 0.09864127183950336)

We have two additional functions for hypothesis tests that specify interval hypotheses: tost_poisson_2indep and nonequivalence_poisson_2indep.

The TOST function implements equivalence tests where the alternative hypothesis specifies that the two rates are within an interval of each other.

The nonequivalence test implements a test where the alternative specifies that the two rates differ by at least a given nonzero value. This is also often called a minimum effect test. It uses two one-sided tests similar to TOST, but with the null and alternative hypotheses reversed compared to the equivalence test.

Both functions delegate to test_poisson_2indep and, therefore, the same method options are available.

The following equivalence test specifies the alternative hypothesis that the rate ratio is between 0.8 and 1/0.8. The observed rate ratio is 0.89. The pvalue is 0.107 and we cannot reject the null hypothesis in favor of the alternative hypothesis that the two rates are equivalent at the given margins. Thus the hypothesis test does not provide evidence that the two rates are equivalent.

In the second example we test equivalence in the rate difference, where equivalence is defined by margins (-0.04, 0.04). The pvalue is around 0.2 and the test does not support that the two rates are equivalent.

[20]:

low = 0.8
upp = 1 / low
count1, n1, count2, n2 = 200, 1000, 450, 2000
tost_poisson_2indep(count1, n1, count2, n2, low, upp, method='score', compare='ratio')
[20]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = 1.2403473458920846
pvalue = 0.10742347370282446
method = 'score'
compare = 'ratio'
equiv_limits = (0.8, 1.25)
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = 1.2403473458920846
    pvalue = 0.10742347370282446
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'larger'
    rates = (0.2, 0.225)
    ratio = 0.888888888888889
    diff = -0.024999999999999994
    value = 0.8
    rates_cmle = None
    ratio_null = 0.8
    tuple = (1.2403473458920846, 0.10742347370282446)
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = -4.0311288741492755
    pvalue = 2.7754797240370253e-05
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'smaller'
    rates = (0.2, 0.225)
    ratio = 0.888888888888889
    diff = -0.024999999999999994
    value = 1.25
    rates_cmle = None
    ratio_null = 1.25
    tuple = (-4.0311288741492755, 2.7754797240370253e-05)
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (1.2403473458920846, 0.10742347370282446)
[21]:
upp = 0.04
low = -upp
tost_poisson_2indep(count1, n1, count2, n2, low, upp, method='score', compare='diff')
[21]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = 0.8575203124598336
pvalue = 0.19557869693808477
method = 'score'
compare = 'diff'
equiv_limits = (-0.04, 0.04)
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = 0.8575203124598336
    pvalue = 0.19557869693808477
    distribution = 'normal'
    compare = 'diff'
    method = 'score'
    alternative = 'larger'
    rates = (0.2, 0.225)
    ratio = 0.888888888888889
    diff = -0.024999999999999994
    value = -0.04
    rates_cmle = (0.19065363652113884, 0.23065363652113885)
    ratio_null = None
    tuple = (0.8575203124598336, 0.19557869693808477)
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = -3.4807277010355238
    pvalue = 0.00025002679047994814
    distribution = 'normal'
    compare = 'diff'
    method = 'score'
    alternative = 'smaller'
    rates = (0.2, 0.225)
    ratio = 0.888888888888889
    diff = -0.024999999999999994
    value = 0.04
    rates_cmle = (0.24581855699051405, 0.20581855699051405)
    ratio_null = None
    tuple = (-3.4807277010355238, 0.00025002679047994814)
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (0.8575203124598336, 0.19557869693808477)

The function nonequivalence_poisson_2indep tests the alternative hypothesis that the two rates differ by a non-negligible amount.

In the following example, the alternative hypothesis specifies that the rate ratio is outside the interval (0.95, 1/0.95). The null hypothesis is that the rate ratio is in the interval. If the test rejects the null hypothesis, then it provides evidence that the rates differ by more than the unimportant amount specified by the interval limits.

A note on the relationship between point hypothesis tests and interval hypothesis tests in large samples. The point hypothesis test of test_poisson_2indep will reject any small deviation from the null hypothesis if the null hypothesis does not hold exactly and the sample size is large enough. The nonequivalence or minimum effect test will not reject the null hypothesis in large samples (as the sample size goes to infinity) if the rates differ by no more than the specified negligible amount.

In the example, neither the point nor the interval null hypothesis is rejected. We do not have enough evidence to say that the rates are statistically different. Following that, we increase the sample size 20 times while keeping the observed rates constant. In this case, the point null hypothesis test is rejected, with a pvalue of 0.01, while the interval null hypothesis is not rejected, with a reported pvalue above 1, that is, effectively 1.

Note: The nonequivalence test is in general conservative; its size is bounded by alpha, but in the large sample limit with fixed nonequivalence margins the size approaches alpha / 2. If the nonequivalence interval shrinks to a single point, then the nonequivalence test is the same as the point hypothesis test. (See docstring.)

[22]:
count1, n1, count2, n2 = 200, 1000, 420, 2000
low = 0.95
upp = 1 / low
nf = 1
nonequivalence_poisson_2indep(count1 * nf, n1 * nf, count2 * nf, n2 * nf, low, upp, method='score', compare='ratio')
[22]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = -1.1654330934961301
pvalue = 1.0232437381644721
method = 'score'
results_larger = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = 0.02913582733740325
    pvalue = 0.5116218690822361
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'smaller'
    rates = (0.2, 0.21)
    ratio = 0.9523809523809524
    diff = -0.009999999999999981
    value = 0.95
    rates_cmle = None
    ratio_null = 0.95
    tuple = (0.02913582733740325, 0.5116218690822361)
results_smaller = <class 'statsmodels.stats.base.HolderTuple'>
    statistic = -1.1654330934961301
    pvalue = 0.8780781359377093
    distribution = 'normal'
    compare = 'ratio'
    method = 'score'
    alternative = 'larger'
    rates = (0.2, 0.21)
    ratio = 0.9523809523809524
    diff = -0.009999999999999981
    value = 1.0526315789473684
    rates_cmle = None
    ratio_null = 1.0526315789473684
    tuple = (-1.1654330934961301, 0.8780781359377093)
title = 'Equivalence test for 2 independent Poisson rates'
tuple = (-1.1654330934961301, 1.0232437381644721)
[23]:
test_poisson_2indep(count1 * nf, n1 * nf, count2 * nf, n2 * nf, method='score', compare='ratio')
[23]:
<class 'statsmodels.stats.base.HolderTuple'>
statistic = -0.5679618342470648
pvalue = 0.5700608835629815
distribution = 'normal'
compare = 'ratio'
method = 'score'
alternative = 'two-sided'
rates = (0.2, 0.21)
ratio = 0.9523809523809524
diff = -0.009999999999999981
value = 1
rates_cmle = None
ratio_null = 1
tuple = (-0.5679618342470648, 0.5700608835629815)
[24]:
nf = 20
nonequivalence_poisson_2indep(count1 * nf, n1 * nf, count2 * nf, n2 * nf, low, upp, method='score', compare='ratio').pvalue
[24]:
1.1036704302254083
[25]:
test_poisson_2indep(count1 * nf, n1 * nf, count2 * nf, n2 * nf, method='score', compare='ratio').pvalue
[25]:
0.01108516638060269

Power

Statsmodels has limited support for computing statistical power for the comparison of two sample Poisson and negative binomial rates. The power functions are based on Zhu and Lakkis, and on Zhu, for ratio comparisons for both distributions, and on a basic normal-based comparison for the Poisson rate difference. Other methods that correspond more closely to the methods available in the hypothesis test functions, especially Gu, are not yet available.

The available functions are

[26]:
power_poisson_ratio_2indep
power_equivalence_poisson_2indep
power_negbin_ratio_2indep
power_equivalence_neginb_2indep

power_poisson_diff_2indep
[26]:
<function statsmodels.stats.rates.power_poisson_diff_2indep(rate1, rate2, nobs1, nobs_ratio=1, alpha=0.05, value=0, method_var='score', alternative='two-sided', return_results=True)>