statsmodels.stats.nonparametric.samplesize_rank_compare_onetail

statsmodels.stats.nonparametric.samplesize_rank_compare_onetail(synthetic_sample, reference_sample, alpha, power, nobs_ratio=1, alternative='two-sided')

Compute the sample size for the non-parametric Mann-Whitney U test.

This function implements the method of Happ et al. (2019) [1].

Parameters:

synthetic_sample : array_like
    Generated synthetic data representing the treatment group under the research hypothesis.
reference_sample : array_like
    Available information for the reference group.
alpha : float
    The type I error rate for the test (two-sided).
power : float
    The desired power of the test.
nobs_ratio : float, optional
    Ratio of the reference group sample size to the treatment group sample size, nobs_ref = nobs_ratio * nobs_treat. Default is 1 (balanced design). See Notes.
alternative : {"two-sided", "larger", "smaller"}, optional
    Whether the sample size is calculated for a two-sided (default) or a one-sided test. See Notes.

Returns:

res : Holder
    An instance of Holder containing the following attributes:

    nobs_total : float
        The total sample size required for the experiment.
    nobs_treat : float
        Sample size for the treatment group.
    nobs_ref : float
        Sample size for the reference group.
    relative_effect : float
        The estimated relative effect size.
    power : float
        The desired power for the test.
    alpha : float
        The type I error rate for the test.

Notes

In the context of the two-sample Wilcoxon-Mann-Whitney U test, the reference_sample typically represents data from the control group or from previous studies. The synthetic_sample is generated from this reference data and a prespecified relative effect size that is meaningful for the research question. This effect size is often determined in collaboration with subject matter experts so that it reflects a practically relevant difference worth detecting. By comparing the reference and synthetic samples, this function estimates the sample size needed to achieve the desired power at the specified type I error rate.

Choosing between one-sided and two-sided tests has important implications for sample size planning. A two-sided test is more conservative and requires a larger sample size, but it covers effects in both directions. In contrast, a one-sided test with alternative "larger" (relative_effect > 0.5) or "smaller" (relative_effect < 0.5) assumes that the effect occurs in only one direction, which leads to a smaller required sample size. However, if the true effect is in the opposite direction, the one-sided test has virtually no power to detect it. Additionally, if a two-sided test ends up being used instead of the planned one-sided test, the original sample size may be insufficient, resulting in an underpowered study. These trade-offs should be considered carefully when planning a study.
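To make the link between the relative effect and the direction of a one-sided alternative concrete, the following is a minimal sketch that estimates the relative effect directly from two samples using all pairwise comparisons. The helper name `relative_effect` is hypothetical and not part of statsmodels, and the convention used here (probability that a treatment value exceeds a reference value, plus half the probability of a tie) is an assumption chosen to match the "larger"/"smaller" description above; the library may compute its estimate differently.

```python
import numpy as np


def relative_effect(treatment, reference):
    """Hypothetical helper: estimate P(T > R) + 0.5 * P(T == R).

    Every treatment value is compared with every reference value;
    ties count as one half.
    """
    t = np.asarray(treatment, dtype=float)[:, None]
    r = np.asarray(reference, dtype=float)[None, :]
    return float(np.mean((t > r) + 0.5 * (t == r)))


# Placebo seizure counts from the Examples section of this page
reference = np.array([3, 3, 5, 4, 21, 7, 2, 12, 5, 0, 22, 4, 2, 12,
                      9, 5, 3, 29, 5, 7, 4, 4, 5, 8, 25, 1, 2, 12])
# Synthetic treatment data under a 50% reduction
synthetic = np.floor(reference / 2)

p = relative_effect(synthetic, reference)
# A value below 0.5 means treatment values tend to be smaller than
# reference values, so a one-sided plan would use "smaller".
alternative = "smaller" if p < 0.5 else "larger"
print(f"relative effect: {p:.3f}, one-sided alternative: {alternative}")
```

Two identical samples give a relative effect of exactly 0.5, which corresponds to no effect under this convention.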

For nobs_ratio > 1, nobs_ratio = 1, or nobs_ratio < 1, the reference group sample size is larger, equal to, or smaller than the treatment group sample size, respectively.
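The relation nobs_ref = nobs_ratio * nobs_treat can be illustrated with a short arithmetic sketch. The helper `split_total` and the total of 90 are purely illustrative assumptions and are not output of this function; rounding each group up to a whole number is a common planning convention, assumed here rather than taken from the library.

```python
import math


def split_total(nobs_total, nobs_ratio=1.0):
    """Hypothetical helper: split a total so nobs_ref = nobs_ratio * nobs_treat."""
    nobs_treat = nobs_total / (1.0 + nobs_ratio)
    nobs_ref = nobs_ratio * nobs_treat
    return nobs_treat, nobs_ref


# Illustrative total with twice as many reference as treatment subjects
treat, ref = split_total(90.0, nobs_ratio=2.0)
print(treat, ref)  # 30.0 60.0

# Sample sizes are reported as floats; round each group up so the
# planned study does not fall short of the computed requirement.
treat_int, ref_int = math.ceil(treat), math.ceil(ref)
```

With nobs_ratio=1 the same helper returns two equal groups, which is the balanced design.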

References

[1] Happ, M., Bathke, A. C., and Brunner, E. "Optimal sample size planning for the Wilcoxon-Mann-Whitney test". Statistics in Medicine, Vol. 38 (2019): 363-375. https://doi.org/10.1002/sim.7983.

[2] Thall, P. F., and Vail, S. C. "Some covariance models for longitudinal count data with overdispersion". Biometrics (1990): 657-671.

Examples

The data for the placebo group of a clinical trial published in Thall and Vail [2] are shown below. A relevant effect for the treatment under investigation is considered to be a 50% reduction in the number of seizures. To compute the required sample size for a power of 0.8 at a type I error rate of 0.05, we generate synthetic data for the treatment group under the alternative, assuming this reduction.

>>> from statsmodels.stats.nonparametric import samplesize_rank_compare_onetail
>>> import numpy as np
>>> reference_sample = np.array([3, 3, 5, 4, 21, 7, 2, 12, 5, 0, 22, 4, 2, 12,
...                              9, 5, 3, 29, 5, 7, 4, 4, 5, 8, 25, 1, 2, 12])
>>> # Apply 50% reduction in seizure counts and floor operation
>>> synthetic_sample = np.floor(reference_sample / 2)
>>> result = samplesize_rank_compare_onetail(
...              synthetic_sample=synthetic_sample,
...              reference_sample=reference_sample,
...              alpha=0.05, power=0.8
...          )
>>> print(f"Total sample size: {result.nobs_total}, "
...       f"Treatment group: {result.nobs_treat}, "
...       f"Reference group: {result.nobs_ref}")