statsmodels.sandbox.stats.runs.runstest_2samp

statsmodels.sandbox.stats.runs.runstest_2samp(x, y=None, groups=None, correction=True)

Wald-Wolfowitz runs test for two samples

This tests whether two samples come from the same distribution.

Parameters:

x (array_like): Numeric data. Contains one group if y is also given, or both groups if a group indicator is provided in groups.
y (array_like, optional): Numeric data for the second group.
groups (array_like): Group labels or indicator, used when the data for both groups is given in a single 1-dimensional array, x. If group labels are not [0,1], then
correction (bool): Following the SAS manual, the test statistic is corrected by 0.5 when the sample size is below 50. This can be turned off with correction=False; the option was included to match the R package tseries, which does not use any correction.

Returns:

z_stat (float): Test statistic, asymptotically normally distributed.
p-value (float): p-value. Reject the null hypothesis if it is below a type I error level, alpha.
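As a minimal usage sketch based on the signature above, the two ways of passing the data can be compared side by side. The simulated samples, the seed, and the variable names are assumptions chosen only for illustration. The group indicator is passed as a NumPy array of 0/1 labels.

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_2samp

# Illustrative data only: two normal samples with different means.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=20)
y = rng.normal(loc=1.0, size=20)

# Form 1: one group in x, the other group in y.
z_xy, p_xy = runstest_2samp(x, y)

# Form 2: both groups pooled in a single 1-dimensional array,
# with a 0/1 group indicator of the same length.
data = np.concatenate((x, y))
groups = np.concatenate((np.zeros(len(x)), np.ones(len(y))))
z_g, p_g = runstest_2samp(data, groups=groups)

# Same test without the 0.5 correction.
z_nc, p_nc = runstest_2samp(x, y, correction=False)

print(z_xy, p_xy)
print(z_g, p_g)
print(z_nc, p_nc)
```

Both forms describe the same pooled sample, so they should report the same statistic and p-value.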

See also

runstest_1samp
Runs
RunsProb

Notes

Wald-Wolfowitz runs test.
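To make the idea concrete, here is a minimal pure-Python sketch of a textbook two-sample runs test. It is not the statsmodels implementation: the function name `two_sample_runs_sketch` is hypothetical, ties are assumed absent, and the exact form of the 0.5 correction (shrinking the centered run count toward zero when the pooled sample size is below 50) is an assumption.

```python
import math


def two_sample_runs_sketch(x, y, correction=True):
    """Illustrative two-sample runs test; assumes no ties across samples."""
    # Pool both samples, remembering which group each value came from.
    pooled = [(v, 0) for v in x] + [(v, 1) for v in y]
    pooled.sort(key=lambda pair: pair[0])
    labels = [g for _, g in pooled]

    # A run is a maximal block of consecutive observations from one group.
    n_runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))

    # Mean and variance of the number of runs under the null hypothesis.
    n0, n1 = len(x), len(y)
    n = n0 + n1
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n ** 2 * (n - 1.0))

    diff = n_runs - mean
    if correction and n < 50:
        # Assumed form of the correction: move the difference 0.5 toward zero.
        if diff > 0.5:
            diff -= 0.5
        elif diff < -0.5:
            diff += 0.5
        else:
            diff = 0.0

    z = diff / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return n_runs, z, p


# Completely separated samples give only 2 runs, far fewer than expected.
n_runs, z, p = two_sample_runs_sketch([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(n_runs, z, p)
```

Few runs indicate that the samples occupy different regions of the pooled ordering. Because the test is two-sided, unusually many runs (strictly alternating groups) also lead to rejection.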

If there are ties, the test is run twice, once with the tied observations of each group sorted first, and the reported test statistic and p-value are those of the run with the higher p-value.

This test is intended for continuous distributions. SAS has a treatment for ties, but it is not clearly described and sounds more complicated (the minimum and maximum possible number of runs prevent the use of argsort).

The approach implemented here is simpler: add small positive noise to the first group and run the test, then add it to the other group and run the test, and take the larger p-value. This does not give the minimum and maximum of the number of runs. The result is close to the minimum but far away from the maximum, since the maximum number of runs would require alternating groups within the ties. Adding random noise might be the better approach.
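The effect of the tie handling on the run count can be illustrated with a small pure-Python sketch. It is not the library code: the helper names are hypothetical, and a sort key that orders one group first within ties stands in for adding small positive noise to the other group.

```python
def count_runs(labels):
    """Number of maximal blocks of consecutive equal labels."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def runs_under_tie_orderings(x, y):
    """Run counts when ties are broken in favor of group 0, then group 1."""
    pooled = [(v, 0) for v in x] + [(v, 1) for v in y]
    # Group 0 first within ties (as if group 1 received the positive noise).
    first0 = [g for _, g in sorted(pooled, key=lambda p: (p[0], p[1]))]
    # Group 1 first within ties (as if group 0 received the positive noise).
    first1 = [g for _, g in sorted(pooled, key=lambda p: (p[0], -p[1]))]
    return count_runs(first0), count_runs(first1)


# Extreme case: every observation is tied.
x = [1.0, 1.0, 1.0]
y = [1.0, 1.0, 1.0]
runs_a, runs_b = runs_under_tie_orderings(x, y)

# Alternating the groups within the ties gives the maximum number of runs.
runs_max = count_runs([0, 1, 0, 1, 0, 1])

print(runs_a, runs_b, runs_max)  # 2 2 6
```

Both tie-breaking orders keep each group's tied values together, so they produce the smallest possible run count here, while the alternating arrangement is never considered.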

SAS uses an exact distribution for sample sizes <= 30. It does not look standard, but should be easy to add.

Currently only the two-sided test is available.

This has not been verified against a reference implementation. In a short Monte Carlo simulation where both samples are normally distributed, the test seems to be correctly sized for a larger number of observations (30 or more per group), but conservative (i.e. it rejects less often than the nominal level) with a sample size of 10 in each group.
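A size check of this kind can be sketched as follows. This is a self-contained illustration, not the simulation referred to above: it uses a textbook normal-approximation runs test rather than calling statsmodels, and the helper names, the replication count, the seed, and the assumed form of the 0.5 correction are all choices made for the example.

```python
import math
import random


def runs_pvalue(x, y, correction=True):
    """Two-sided p-value of a textbook two-sample runs test (no ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda pair: pair[0])
    labels = [g for _, g in pooled]
    n_runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))
    n0, n1 = len(x), len(y)
    n = n0 + n1
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n ** 2 * (n - 1.0))
    diff = n_runs - mean
    if correction and n < 50:
        # Assumed correction: shrink the difference by 0.5 toward zero.
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return math.erfc(abs(diff) / math.sqrt(2.0 * var))


def rejection_rate(n_per_group, n_rep=2000, alpha=0.05, seed=0):
    """Share of rejections when both samples are standard normal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if runs_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / n_rep


rate_small = rejection_rate(10)
rate_large = rejection_rate(30)
print("n=10 per group:", rate_small)
print("n=30 per group:", rate_large)
```

A rejection rate near alpha indicates a correctly sized test. A rate clearly below alpha indicates a conservative one.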