# statsmodels.stats.correlation_tools.corr_thresholded

statsmodels.stats.correlation_tools.corr_thresholded(data, minabs=None, max_elt=10000000.0)

Construct a sparse matrix containing the thresholded row-wise correlation matrix from a data array.

Parameters:
data : array_like

The data from which the row-wise thresholded correlation matrix is to be computed.

minabs : non-negative real

The threshold value; correlation coefficients smaller in magnitude than `minabs` are set to zero. If None, defaults to 1 / sqrt(n); see Notes for more information.

max_elt : positive real

The maximum number of values allowed in any intermediate matrix constructed during the computation. Defaults to 10000000.0; see Notes for more information.

Returns:
cormat : sparse.coo_matrix

The thresholded correlation matrix, in COO format.

Notes

This is an alternative to `C = np.corrcoef(data); C *= (np.abs(C) >= minabs)`, suitable for very tall data matrices.
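
For reference, the dense computation that this function replaces can be written out as below. This is a minimal sketch on a small synthetic array; the names `rng`, `data` and the threshold value 0.1 are illustrative choices, not part of the original documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((6, 200))   # 6 rows, 200 observations per row

minabs = 0.1                           # illustrative threshold
C = np.corrcoef(data)                  # dense 6 x 6 row-wise correlation matrix
C *= (np.abs(C) >= minabs)             # zero out entries below minabs in magnitude
```

The dense result has one row and one column per row of `data`, so for a very tall input it may not fit in memory. That is the case the sparse version is meant for.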

If the data are jointly Gaussian, the marginal sampling distributions of the elements of the sample correlation matrix are approximately Gaussian with standard deviation 1 / sqrt(n). The default value of `minabs` is thus equal to 1 standard error, which will set to zero approximately 68% of the estimated correlation coefficients for which the population value is zero.
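
The 68% figure can be checked with a small simulation. The sketch below assumes that n is the number of observations per row (the text does not define n explicitly) and uses independent Gaussian rows, so every population correlation is zero. It applies the threshold with plain NumPy and does not call `corr_thresholded` itself.

```python
import numpy as np

rng = np.random.default_rng(0)
nrow, n = 200, 400                         # 200 independent rows, n = 400 observations each
data = rng.standard_normal((nrow, n))      # all population correlations are zero

C = np.corrcoef(data)                      # row-wise sample correlations
off_diag = C[np.triu_indices(nrow, k=1)]   # each distinct pair of rows once

minabs = 1.0 / np.sqrt(n)                  # one standard error, as described above
frac_zeroed = np.mean(np.abs(off_diag) < minabs)
print(round(float(frac_zeroed), 3))        # should be close to 0.68
```

About two thirds of the null correlations fall below one standard error, so roughly a third of them survive the default threshold. A larger `minabs` gives a sparser result.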

No intermediate matrix with more than `max_elt` values will be constructed. However, memory use could still be high if a large number of correlation values exceed `minabs` in magnitude.
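
One way to respect such a bound is to compute the correlations one block of rows at a time and keep only the entries that pass the threshold. The sketch below illustrates that idea; it is not the statsmodels implementation, and the helper name `thresholded_corr_blockwise` is hypothetical. It assumes that no row of the input is constant.

```python
import numpy as np
from scipy import sparse


def thresholded_corr_blockwise(data, minabs, max_elt=1e7):
    """Hypothetical helper: thresholded row-wise correlations, built in row blocks."""
    data = np.asarray(data, dtype=float)
    nrow, ncol = data.shape

    # Standardize each row so the dot product of two rows equals their correlation.
    z = data - data.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)

    # Rows per block, so that each (block x nrow) slab holds at most max_elt values.
    block = max(1, int(max_elt // nrow))

    rows, cols, vals = [], [], []
    for start in range(0, nrow, block):
        c = z[start:start + block] @ z.T        # slab of correlations
        i, j = np.nonzero(np.abs(c) >= minabs)  # positions that pass the threshold
        rows.append(i + start)
        cols.append(j)
        vals.append(c[i, j])

    return sparse.coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(nrow, nrow),
    )


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 30))
cmat = thresholded_corr_blockwise(x, 0.3, max_elt=500)   # blocks of 10 rows
```

The retained entries are still accumulated in memory, which is why the total footprint depends on how many correlations exceed the threshold and not only on `max_elt`.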

The thresholded matrix is returned in COO format, which can easily be converted to other sparse formats.
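
As an illustration of such conversions, the sketch below builds a small COO matrix by hand as a stand-in for the returned value and converts it with the standard `scipy.sparse` methods. The values are made up for the example.

```python
import numpy as np
from scipy import sparse

# Hand-built stand-in for a thresholded correlation matrix (illustrative values).
rows = np.array([0, 1, 2, 0, 1])
cols = np.array([0, 1, 2, 1, 0])
vals = np.array([1.0, 1.0, 1.0, 0.45, 0.45])
cmat = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

csr = cmat.tocsr()        # efficient row slicing and matrix-vector products
csc = cmat.tocsc()        # efficient column slicing
dense = cmat.toarray()    # only sensible when the matrix is small
row0 = csr[0].toarray()   # row 0 as a dense 1 x 3 array
```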

Examples

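Here `x` is a data matrix with 100 rows and 10 columns; in practice the function is intended for much taller matrices (e.g. with 100,000 rows and 50 columns). The row-wise correlation matrix of `x` is calculated and stored in sparse form, with all entries smaller in magnitude than 0.3 treated as 0.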

```python
>>> import numpy as np
>>> from statsmodels.stats.correlation_tools import corr_thresholded
>>> np.random.seed(1234)
>>> b = 1.5 - np.random.rand(10, 1)
>>> x = np.random.randn(100, 1).dot(b.T) + np.random.randn(100, 10)
>>> cmat = corr_thresholded(x, 0.3)
```

Last update: May 05, 2023