statsmodels.stats.multitest.multipletests¶
- statsmodels.stats.multitest.multipletests(pvals, alpha=0.05, method='hs', maxiter=1, is_sorted=False, returnsorted=False)[source]¶
Test results and p-value correction for multiple tests
- Parameters:
- pvalsarray_like, 1-d
uncorrected p-values. Must be 1-dimensional.
- alpha
float
FWER, family-wise error rate, e.g. 0.1
- method
str
Method used for testing and adjustment of pvalues. Can be either the full name or initial letters. Available methods are:
bonferroni : one-step correction
sidak : one-step correction
holm-sidak : step down method using Sidak adjustments
holm : step-down method using Bonferroni adjustments
simes-hochberg : step-up method (independent)
hommel : closed method based on Simes tests (non-negative)
fdr_bh : Benjamini/Hochberg (non-negative)
fdr_by : Benjamini/Yekutieli (negative)
fdr_tsbh : two stage fdr correction (non-negative)
fdr_tsbky : two stage fdr correction (non-negative)
- maxiter
int
or bool Maximum number of iterations for two-stage fdr, fdr_tsbh and fdr_tsbky. It is ignored by all other methods. maxiter=1 (default) corresponds to the two stage method. maxiter=-1 corresponds to full iterations which is maxiter=len(pvals). maxiter=0 uses only a single stage fdr correction using a ‘bh’ or ‘bky’ prior fraction of assumed true hypotheses.
- is_sortedbool
If False (default), the p_values will be sorted, but the corrected pvalues are in the original order. If True, then it assumed that the pvalues are already sorted in ascending order.
- returnsortedbool
not tested, return sorted p-values instead of original sequence
- Returns:
Notes
There may be API changes for this function in the future.
Except for ‘fdr_twostage’, the p-value correction is independent of the alpha specified as argument. In these cases the corrected p-values can also be compared with a different alpha. In the case of ‘fdr_twostage’, the corrected p-values are specific to the given alpha, see
fdrcorrection_twostage
.The ‘fdr_gbs’ procedure is not verified against another package, p-values are derived from scratch and are not derived in the reference. In Monte Carlo experiments the method worked correctly and maintained the false discovery rate.
All procedures that are included, control FWER or FDR in the independent case, and most are robust in the positively correlated case.
fdr_gbs: high power, fdr control for independent case and only small violation in positively correlated case
Timing:
Most of the time with large arrays is spent in argsort. When we want to calculate the p-value for several methods, then it is more efficient to presort the pvalues, and put the results back into the original order outside of the function.
Method=’hommel’ is very slow for large arrays, since it requires the evaluation of n partitions, where n is the number of p-values.