statsmodels.stats.dist_dependence_measures.distance_statistics

statsmodels.stats.dist_dependence_measures.distance_statistics(x, y, x_dist=None, y_dist=None)

Calculate various distance dependence statistics.

Calculate several distance dependence statistics as described in [1].

Parameters:
x : array_like, 1-D or 2-D

If x is 1-D, then it is assumed to be a vector of observations of a single random variable. If x is 2-D, then the rows are observations and the columns are the components of a random vector, i.e., each column represents a different component of the random vector x.

y : array_like, 1-D or 2-D

Same as x, but only the number of observations has to match that of x. If y is 2-D, the number of columns of y (i.e., the number of components in the random vector) does not need to match the number of columns of x.

x_dist : array_like, 2-D, optional

A square 2-D array_like object whose values are the Euclidean distances between x’s rows.

y_dist : array_like, 2-D, optional

A square 2-D array_like object whose values are the Euclidean distances between y’s rows.
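
As an illustration of the shapes described above, the following minimal sketch builds square Euclidean distance matrices with plain NumPy for a 1-D x and a 2-D y that share only the number of observations. The helper name `pairwise_euclidean` is hypothetical and is not part of statsmodels; it only shows the kind of array that x_dist and y_dist are described as holding.

```python
import numpy as np


def pairwise_euclidean(v):
    """Return the n-by-n matrix of Euclidean distances between the rows of v.

    Hypothetical helper for illustration; not a statsmodels function.
    """
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        # A 1-D input is treated as n observations of a single variable.
        v = v[:, None]
    # Broadcast to an (n, n, p) array of row differences, then reduce over p.
    diff = v[:, None, :] - v[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
x = rng.random(50)        # 1-D: 50 observations of a scalar variable
y = rng.random((50, 3))   # 2-D: 50 observations of a 3-component vector

x_dist = pairwise_euclidean(x)
y_dist = pairwise_euclidean(y)
print(x_dist.shape, y_dist.shape)  # (50, 50) (50, 50)

# Matrices like these could then be supplied through the optional arguments:
# distance_statistics(x, y, x_dist=x_dist, y_dist=y_dist)
```

Precomputing the matrices this way may be worthwhile when the same data are reused across several calls, since the pairwise distances need O(n^2) time and memory.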

Returns:
collections.namedtuple

A named tuple of distance dependence statistics (DistDependStat) with the following values:

  • test_statistic : float - The “basic” test statistic (i.e., the one used when the emp method is chosen when calling distance_covariance_test()).

  • distance_correlation : float - The distance correlation between x and y.

  • distance_covariance : float - The distance covariance of x and y.

  • dvar_x : float - The distance variance of x.

  • dvar_y : float - The distance variance of y.

  • S : float - The mean of the Euclidean distances in x multiplied by those of y. Mostly used internally.
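
How these values relate to one another can be sketched from the sample definitions in [1], which are built on double-centered distance matrices. The code below is a NumPy-only illustration under stated assumptions, not the statsmodels implementation: the function names `double_center` and `stats_from_distances` are hypothetical, the test statistic is taken to be n times the squared distance covariance, and S is read as the product of the two mean distances. These relations agree with the sample output shown in the Examples section.

```python
import numpy as np


def double_center(d):
    """Subtract row and column means from d, then add back the grand mean."""
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def stats_from_distances(a, b):
    """Sketch of the sample statistics from two n-by-n distance matrices.

    Hypothetical helper for illustration; not a statsmodels function.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    A, B = double_center(a), double_center(b)

    # Squared sample distance covariance; clip tiny negative round-off.
    dcov_sq = max((A * B).mean(), 0.0)
    dcov = np.sqrt(dcov_sq)
    dvar_x = np.sqrt((A * A).mean())
    dvar_y = np.sqrt((B * B).mean())
    denom = np.sqrt(dvar_x * dvar_y)
    dcor = dcov / denom if denom > 0 else 0.0

    return {
        "test_statistic": n * dcov_sq,  # assumption: n * dcov**2
        "distance_correlation": dcor,
        "distance_covariance": dcov,
        "dvar_x": dvar_x,
        "dvar_y": dvar_y,
        # Assumption: S read as the product of the two mean distances.
        "S": a.mean() * b.mean(),
    }


x = np.arange(10.0)
y = 2.0 * x + 1.0                     # exact linear function of x
a = np.abs(x[:, None] - x[None, :])   # pairwise distances for 1-D data
b = np.abs(y[:, None] - y[None, :])
result = stats_from_distances(a, b)
print(result["distance_correlation"])  # close to 1 for a linear relation
```

Because y is an exact linear function of x here, the distance correlation comes out at 1 up to floating-point error, and dvar_y is twice dvar_x.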

References

[1]

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007) “Measuring and testing dependence by correlation of distances”. Annals of Statistics, Vol. 35, No. 6, pp. 2769-2794.

Examples

>>> import numpy as np
>>> from statsmodels.stats.dist_dependence_measures import (
...     distance_statistics)
>>> # The inputs are random, so the printed values vary from run to run.
>>> distance_statistics(np.random.random(1000), np.random.random(1000))
DistDependStat(test_statistic=0.07948284320205831,
distance_correlation=0.04269511890990793,
distance_covariance=0.008915315092696293,
dvar_x=0.20719027438266704, dvar_y=0.21044934264957588,
S=0.10892061635588891)