statsmodels.stats.dist_dependence_measures.distance_covariance_test

statsmodels.stats.dist_dependence_measures.distance_covariance_test(x, y, B=None, method='auto')

The Distance Covariance (dCov) test

Apply the Distance Covariance (dCov) test of independence to x and y. This test was introduced in [1], and is based on the distance covariance statistic. The test is applicable to random vectors of arbitrary length (see the notes section for more details).

Parameters
x : array_like, 1-D or 2-D

If x is 1-D, then it is assumed to be a vector of observations of a single random variable. If x is 2-D, then the rows are observations and the columns are the components of a random vector, i.e., each column represents a different component of the random vector x.

y : array_like, 1-D or 2-D

Same as x, but only the number of observations has to match that of x. If y is 2-D, the number of columns of y (i.e., the number of components in the random vector) does not need to match the number of columns of x.

B : int, optional, default=None

The number of iterations to perform when evaluating the null distribution of the test statistic with the emp method (see below). If B is None, then, as in [1], B is set to B = 200 + 5000/n, where n is the number of observations.
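The default rule for B can be sketched as follows. The helper name `default_num_permutations` is hypothetical, and truncating the result to an integer is an assumption made here because B counts iterations; the text above only gives the formula 200 + 5000/n.

```python
def default_num_permutations(n):
    """Default number of iterations B for n observations.

    Follows the rule B = 200 + 5000/n quoted above. Truncation to an
    integer is an assumption of this sketch.
    """
    return int(200 + 5000 / n)


for n in (10, 100, 1000):
    print(n, default_num_permutations(n))
```

Under this rule the number of iterations shrinks toward 200 as the sample grows, so small samples get proportionally more resampling.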

method : {'auto', 'emp', 'asym'}, optional, default='auto'

The method by which to obtain the p-value for the test.

  • auto : Default method. The number of observations will be used to determine the method.

  • emp : Empirical evaluation of the p-value using permutations of the rows of y to obtain the null distribution.

  • asym : An asymptotic approximation of the distribution of the test statistic is used to find the p-value.
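The permutation idea behind the emp method can be illustrated with a minimal, dependency-free sketch. This is not the statsmodels implementation: the function names are hypothetical, the statistic used here is the plain squared sample distance covariance, and the p-value is taken as the fraction of permutations whose statistic is at least as large as the observed one. The library may scale the statistic or count ties differently.

```python
import math
import random


def centered_distance_matrix(points):
    """Pairwise Euclidean distances between points, double-centered."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def dcov_sq(a, b):
    """Squared sample distance covariance from two centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def permutation_pvalue(x, y, num_permutations, rng):
    """Return (observed statistic, permutation p-value).

    x and y are lists of tuples, one tuple per observation.
    """
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)
    observed = dcov_sq(a, b)
    n = len(x)
    exceed = 0
    for _ in range(num_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permuting the rows of y permutes the rows and columns of its
        # centered distance matrix in the same way.
        b_perm = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if dcov_sq(a, b_perm) >= observed:
            exceed += 1
    return observed, exceed / num_permutations


rng = random.Random(0)
x = [(rng.random(),) for _ in range(30)]
y = [(xi[0] ** 2, rng.random()) for xi in x]  # first component depends on x
stat, pval = permutation_pvalue(x, y, 200, rng)
print(stat, pval)
```

Shuffling the rows of y breaks any pairing with x while keeping both marginal samples intact, which is why the permuted statistics approximate the null distribution.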

Returns
test_statistic : float

The value of the test statistic used in the test.

pval : float

The p-value.

chosen_method : str

The method that was used to obtain the p-value. Mostly relevant when the function is called with method='auto'.

Notes

The test applies to random vectors of arbitrary dimension, i.e., x can be a 1-D vector of observations of a single random variable while y can be an n by k 2-D array (where n is the number of observations and k > 1). It is also possible for x and y to both be 2-D arrays with the same number of rows (observations) but different numbers of columns.

As noted in [1], the statistics are sensitive to all types of departures from independence, including nonlinear or nonmonotone dependence structures.
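A small worked case makes the point about nonlinear dependence concrete. In the sketch below, y is a deterministic function of x (y = x squared on a grid symmetric about zero), so the Pearson correlation vanishes while the squared sample distance covariance does not. The function names are hypothetical and the code is plain Python for 1-D samples only; it is an illustration, not the statsmodels implementation.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centered_distances(values):
    """Double-centered matrix of absolute differences for a 1-D sample."""
    n = len(values)
    d = [[abs(u - v) for v in values] for u in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two 1-D samples."""
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


x = [i / 10 for i in range(-10, 11)]  # symmetric grid on [-1, 1]
y = [v ** 2 for v in x]               # nonmonotone function of x

r = pearson(x, y)
dcov2 = distance_covariance_sq(x, y)
print("Pearson correlation:", r)
print("squared distance covariance:", dcov2)
```

The Pearson correlation is zero up to floating-point error because the grid is symmetric, whereas the distance covariance is clearly positive.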

References

[1] Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007) “Measuring and testing by correlation of distances”. Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.

Examples

>>> import numpy as np
>>> from statsmodels.stats.dist_dependence_measures import (
...     distance_covariance_test)
>>> data = np.random.rand(1000, 10)
>>> x, y = data[:, :3], data[:, 3:]
>>> x.shape
(1000, 3)
>>> y.shape
(1000, 7)
>>> distance_covariance_test(x, y)
(1.0426404792714983, 0.2971148340813543, 'asym')
# (test_statistic, pval, chosen_method)