statsmodels.stats.nonparametric.samplesize_rank_compare_onetail

statsmodels.stats.nonparametric.samplesize_rank_compare_onetail(synthetic_sample, reference_sample, alpha, power, nobs_ratio=1, alternative='two-sided')

Compute sample size for the non-parametric Mann-Whitney U test.

This function implements the method of Happ et al. (2019) [1].

Parameters:
synthetic_sample : array_like

Generated synthetic data representing the treatment group under the research hypothesis.

reference_sample : array_like

Advance information for the reference group.

alpha : float

The type I error rate for the test (two-sided).

power : float

The desired power of the test.

nobs_ratio : float, optional

Sample size ratio, nobs_ref = nobs_ratio * nobs_treat. This is the ratio of the reference group sample size to the treatment group sample size, by default 1 (balanced design). See Notes.

alternative : str, 'two-sided' (default), 'larger', or 'smaller'

Extra argument to choose whether the sample size is calculated for a two-sided (default) or one-sided test. See Notes.

Returns:
res : Holder

An instance of Holder containing the following attributes:

nobs_total : float

The total sample size required for the experiment.

nobs_treat : float

Sample size for the treatment group.

nobs_ref : float

Sample size for the reference group.

relative_effect : float

The estimated relative effect size.

power : float

The desired power for the test.

alpha : float

The type I error rate for the test.

Notes

In the context of the two-sample Wilcoxon-Mann-Whitney U test, the reference_sample typically represents data from the control group or from previous studies. The synthetic_sample is generated from this reference data and a prespecified relative effect size that is meaningful for the research question. This effect size is often determined in collaboration with subject matter experts so that it reflects a difference large enough to be worth detecting. By comparing the reference and synthetic samples, this function estimates the sample size needed to achieve the desired power at the specified type I error rate.
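
To make the notion of a relative effect concrete, the sketch below estimates it by comparing every pair of observations from the two samples. It assumes the convention p = P(X < Y) + 0.5 * P(X = Y), with X drawn from the reference sample and Y from the synthetic sample. The helper name `estimate_relative_effect` is hypothetical, and the direction convention used internally by statsmodels may differ. The data are the first ten placebo values from the Examples section.

```python
import numpy as np


def estimate_relative_effect(reference, synthetic):
    """Plug-in estimate of P(X < Y) + 0.5 * P(X = Y).

    X is a draw from `reference`, Y a draw from `synthetic`. This is an
    assumed convention and a hypothetical helper, not part of statsmodels.
    """
    x = np.asarray(reference, dtype=float)[:, None]  # shape (n_ref, 1)
    y = np.asarray(synthetic, dtype=float)[None, :]  # shape (1, n_syn)
    # Compare every reference value with every synthetic value;
    # tied pairs contribute one half.
    return float(np.mean((x < y) + 0.5 * (x == y)))


reference = np.array([3, 3, 5, 4, 21, 7, 2, 12, 5, 0])
synthetic = np.floor(reference / 2)  # 50% reduction, rounded down
p = estimate_relative_effect(reference, synthetic)
print(f"Estimated relative effect: {p:.3f}")
```

Under this convention, a value of 0.5 means neither group tends to have larger values, and a value below 0.5 means the synthetic (treatment) values tend to be smaller than the reference values.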

Choosing between one-sided and two-sided tests has important implications for sample size planning. A two-sided test is more conservative and requires a larger sample size, but it covers effects in both directions. In contrast, a one-sided test with alternative 'larger' (relative_effect > 0.5) or 'smaller' (relative_effect < 0.5) assumes the effect occurs in only one direction, leading to a smaller required sample size. However, if the true effect is in the opposite direction, the one-sided test has virtually no power to detect it. Additionally, if a two-sided test ends up being used instead of the planned one-sided test, the original sample size may be insufficient, resulting in an underpowered study. It is important to carefully consider these trade-offs when planning a study.
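
The size of this trade-off can be gauged with a generic normal-approximation argument, in which the required sample size is roughly proportional to (z_alpha + z_power)**2. The sketch below is only an illustration under that assumption: it is not the formula of Happ et al. [1], which uses different variance terms under the null and the alternative, and the helper `z_sum_squared` is hypothetical.

```python
from statistics import NormalDist


def z_sum_squared(alpha, power, two_sided):
    """Return (z_alpha + z_power)**2 for a generic normal-approximation test.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    # A two-sided test splits alpha across both tails.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2


alpha, power = 0.05, 0.8
ratio = (z_sum_squared(alpha, power, two_sided=True)
         / z_sum_squared(alpha, power, two_sided=False))
print(f"Two-sided / one-sided sample size factor: {ratio:.2f}")
```

With alpha = 0.05 and power = 0.8, this rough factor is about 1.27, so planning for a one-sided test and then running a two-sided one would leave the study noticeably short of observations.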

For nobs_ratio > 1, nobs_ratio = 1, or nobs_ratio < 1, the reference group sample size is larger, equal to, or smaller than the treatment group sample size, respectively.
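
The relationship nobs_ref = nobs_ratio * nobs_treat can be illustrated with a few lines of arithmetic. The helper `group_sizes` and the treatment group size of 20 are hypothetical and chosen only for illustration. They are not produced by the function documented here.

```python
def group_sizes(nobs_treat, nobs_ratio=1.0):
    """Hypothetical helper: apply nobs_ref = nobs_ratio * nobs_treat."""
    nobs_ref = nobs_ratio * nobs_treat
    return nobs_treat, nobs_ref, nobs_treat + nobs_ref


# Reference group smaller than, equal to, and larger than the treatment group.
for ratio in (0.5, 1.0, 2.0):
    treat, ref, total = group_sizes(20, ratio)
    print(f"nobs_ratio={ratio}: treat={treat}, ref={ref}, total={total}")
```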

References

[1]

Happ, M., Bathke, A. C., and Brunner, E. “Optimal sample size planning for the Wilcoxon-Mann-Whitney test”. Statistics in Medicine. Vol. 38(2019): 363-375. https://doi.org/10.1002/sim.7983.

[2]

Thall, P. F., and Vail, S. C. “Some covariance models for longitudinal count data with overdispersion”. Biometrics, pp. 657-671, 1990.

Examples

The data for the placebo group of a clinical trial published in Thall and Vail [2] is shown below. A relevant effect for the treatment under investigation is considered to be a 50% reduction in the number of seizures. To compute the required sample size with a power of 0.8 and holding the type I error rate at 0.05, we generate synthetic data for the treatment group under the alternative assuming this reduction.

>>> from statsmodels.stats.nonparametric import samplesize_rank_compare_onetail
>>> import numpy as np
>>> reference_sample = np.array([3, 3, 5, 4, 21, 7, 2, 12, 5, 0, 22, 4, 2, 12,
...                              9, 5, 3, 29, 5, 7, 4, 4, 5, 8, 25, 1, 2, 12])
>>> # Apply 50% reduction in seizure counts and floor operation
>>> synthetic_sample = np.floor(reference_sample / 2)
>>> result = samplesize_rank_compare_onetail(
...              synthetic_sample=synthetic_sample,
...              reference_sample=reference_sample,
...              alpha=0.05, power=0.8
...          )
>>> print(f"Total sample size: {result.nobs_total}, "
...       f"Treatment group: {result.nobs_treat}, "
...       f"Reference group: {result.nobs_ref}")