statsmodels.stats.outliers_influence.OLSInfluence

class statsmodels.stats.outliers_influence.OLSInfluence(results)

Class to calculate outlier and influence measures for OLS results

Parameters
results : RegressionResults

Currently assumes that the results are from an OLS regression.

Notes

Some of the results can be calculated without any auxiliary regression (some of these have the _internal suffix in their name). Other statistics require leave-one-observation-out (LOOO) auxiliary regressions and are slower (mainly those with the _external suffix in their name). Only the required results of the auxiliary LOOO regressions are stored.

Using the LOOO measures is currently recommended only if the data set is not too large. One possible approach would be to identify potential problem observations with the _internal measures, and then run the leave-one-observation-out regressions only for the observations that are possible outliers. (However, this is not yet available in an automated way.)
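The screening approach described above can be sketched with NumPy. This is not an OLSInfluence feature: the helper screen_then_looo is hypothetical, and the cutoff of 2 on the internal studentized residual and the simulated data (with one planted outlier) are assumptions for illustration. Only the flagged observations get a leave-one-out refit.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients, residuals and mse_resid."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    return beta, resid, resid @ resid / (n - k)


def screen_then_looo(X, y, threshold=2.0):
    """Hypothetical helper: flag observations with the cheap internal
    studentized residual, then refit leave-one-out only for those."""
    n = X.shape[0]
    _, resid, sigma2 = fit_ols(X, y)
    Q, _ = np.linalg.qr(X)
    h = (Q ** 2).sum(axis=1)                        # hat matrix diagonal
    r_internal = resid / np.sqrt(sigma2 * (1 - h))  # no auxiliary regressions
    flagged = np.flatnonzero(np.abs(r_internal) > threshold)
    r_external = {}
    for i in flagged:                               # LOOO refits only here
        keep = np.arange(n) != i
        _, _, sigma2_i = fit_ols(X[keep], y[keep])
        r_external[int(i)] = resid[i] / np.sqrt(sigma2_i * (1 - h[i]))
    return flagged, r_external


# Simulated data with one planted outlier (assumption, for illustration)
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)
y[5] += 5.0

flagged, r_external = screen_then_looo(X, y)
print(flagged, r_external)
```

The cost of the refits now scales with the number of flagged observations rather than with nobs.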

This should be extended to general least squares.

The leave-one-variable-out (LOVO) auxiliary regressions are currently not used.

Attributes
cooks_distance

Cook's distance

Uses the original results; no loop over the observations is required.

References

Eubank, R. L. (1999). Nonparametric regression and spline smoothing. CRC Press.

Cook’s distance. (n.d.). In Wikipedia. Retrieved July 2019, from https://en.wikipedia.org/wiki/Cook%27s_distance
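The remark that cooks_distance needs no loop over observations rests on an OLS closed form. A NumPy sketch, assuming the textbook definition D_i = r_i^2 / k * h_i / (1 - h_i), with r_i the internal studentized residual, h_i the hat-matrix diagonal and k the number of parameters. The leave-one-out loop is there only to confirm agreement with the refit-based definition. The data are simulated and statsmodels itself is not called.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)               # mse_resid of the full fit
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # diagonal of the hat matrix

# Closed form: uses only the original fit, no loop over observations
r_internal = resid / np.sqrt(sigma2 * (1 - h))
cooks_d = r_internal ** 2 / k * h / (1 - h)

# Refit-based definition, used here only as a cross-check
cooks_loop = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    d = beta - beta_i
    cooks_loop[i] = d @ XtX @ d / (k * sigma2)
```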

cov_ratio

Covariance ratio between LOOO and original

Ratio of the determinant of the parameter covariance estimated from each leave-one-out regression to the determinant of the original parameter covariance. Requires the leave-one-out loop over observations.

det_cov_params_not_obsi

Determinant of cov_params of all LOOO regressions
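A NumPy sketch of det_cov_params_not_obsi and cov_ratio on simulated data, assuming each fit's cov_params is mse_resid * (X'X)^-1. It also checks the OLS shortcut cov_ratio_i = (sigma2_(i) / sigma2)^k / (1 - h_i), which follows from det(X'X - x_i x_i') = det(X'X) (1 - h_i). The shortcut is illustrative and is not claimed to be how statsmodels computes the attribute.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
det_cov_params = np.linalg.det(sigma2 * XtX_inv)   # original fit

# LOOO loop: determinant of each auxiliary fit's parameter covariance
det_cov_params_not_obsi = np.empty(n)
sigma2_not_obsi = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    XtX_i_inv = np.linalg.inv(Xi.T @ Xi)
    beta_i = XtX_i_inv @ Xi.T @ yi
    resid_i = yi - Xi @ beta_i
    sigma2_not_obsi[i] = resid_i @ resid_i / (n - 1 - k)
    det_cov_params_not_obsi[i] = np.linalg.det(sigma2_not_obsi[i] * XtX_i_inv)

cov_ratio = det_cov_params_not_obsi / det_cov_params

# Shortcut that needs only sigma2_(i) and the hat diagonal
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
cov_ratio_closed = (sigma2_not_obsi / sigma2) ** k / (1 - h)
```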

dfbeta

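Change in the parameter estimates when each observation is left out, unscaled (see dfbetas for the scaled version)

Uses results from the leave-one-observation-out loop.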

dfbetas

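Change in the parameter estimates when each observation is left out, scaled by the standard error of each parameter computed with the leave-one-out estimate of sigma

Uses results from the leave-one-observation-out loop.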
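A NumPy sketch of dfbeta and dfbetas on simulated data. It uses the textbook sign convention beta - beta_(i), where beta_(i) is the estimate without observation i; treat the sign as an assumption when comparing with library output. dfbetas divides by sqrt(sigma2_(i)) * sqrt(diag((X'X)^-1)). The closed form (X'X)^-1 x_i e_i / (1 - h_i) is a standard OLS identity included as a cross-check.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Leave-one-observation-out loop: store only what dfbeta/dfbetas need
params_not_obsi = np.empty((n, k))
sigma2_not_obsi = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    r_i = y[keep] - X[keep] @ b_i
    params_not_obsi[i] = b_i
    sigma2_not_obsi[i] = r_i @ r_i / (n - 1 - k)

# Textbook sign convention beta - beta_(i) (an assumption, see lead-in)
dfbeta = beta - params_not_obsi
# Scale each column by the LOOO sigma times sqrt(diag((X'X)^-1))
dfbetas = dfbeta / np.sqrt(sigma2_not_obsi[:, None] * np.diag(XtX_inv))

# OLS shortcut: beta - beta_(i) = (X'X)^-1 x_i e_i / (1 - h_i), row by row
dfbeta_closed = (X @ XtX_inv) * (resid / (1 - h))[:, None]
```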

dffits

DFFITS measure of the influence of an observation

Based on resid_studentized_external; uses results from the leave-one-observation-out loop.

Observations whose absolute dffits exceeds the threshold 2 * sqrt(k / n), where k is the number of parameters and n is the number of observations, should be investigated.

Returns

dffits : ndarray
dffits_threshold : float

References

DFFITS. (n.d.). In Wikipedia.

dffits_internal

DFFITS measure of the influence of an observation

Based on resid_studentized_internal; uses the original results, no loop over observations.
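A NumPy sketch of both DFFITS variants and the 2 * sqrt(k / n) threshold, on simulated data. It assumes dffits_i = t_i * sqrt(h_i / (1 - h_i)), with t_i the external studentized residual for dffits and the internal one for dffits_internal, and checks the external version against its definition as the scaled change in the fitted value at x_i. The values and the threshold are kept in separate plain arrays here.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

dffits_loop = np.empty(n)
sigma2_not_obsi = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    r_i = y[keep] - X[keep] @ b_i
    sigma2_not_obsi[i] = r_i @ r_i / (n - 1 - k)
    # change in the fitted value at x_i, scaled by the LOOO sigma
    dffits_loop[i] = X[i] @ (beta - b_i) / np.sqrt(sigma2_not_obsi[i] * h[i])

factor = np.sqrt(h / (1 - h))
dffits_internal = resid / np.sqrt(sigma2 * (1 - h)) * factor          # original sigma
dffits = resid / np.sqrt(sigma2_not_obsi * (1 - h)) * factor          # LOOO sigma
dffits_threshold = 2 * np.sqrt(k / n)
influential = np.flatnonzero(np.abs(dffits) > dffits_threshold)
```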

ess_press

Error sum of squares of PRESS residuals

hat_diag_factor

Factor of the diagonal of the hat matrix used in influence

Computed as h / (1 - h), where h is the diagonal of the hat matrix; this might be useful for internal reuse.

hat_matrix_diag

Diagonal of the hat matrix for OLS

Temporarily calculated here; this should be moved to the model class.

influence

Influence measure

Matches the influence measure that gretl reports, u * h / (1 - h), where u are the residuals and h is the diagonal of the hat matrix.
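A NumPy sketch of hat_matrix_diag, hat_diag_factor and influence on simulated data. The hat diagonal is taken from a reduced QR factorization (X = QR gives H = QQ'), which avoids building the n x n hat matrix; that choice is an assumption, not necessarily what statsmodels does. The loop checks the standard OLS identity that u * h / (1 - h) equals the change in the fitted value at x_i when observation i is dropped.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# With the reduced QR factorization X = QR, H = QQ', so h_i = sum_j Q_ij^2
Q, _ = np.linalg.qr(X)
hat_matrix_diag = (Q ** 2).sum(axis=1)
hat_diag_factor = hat_matrix_diag / (1 - hat_matrix_diag)   # h / (1 - h)
influence = resid * hat_diag_factor                          # u * h / (1 - h)

# Cross-check: change in the fitted value at x_i when i is left out
fit_change = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    fit_change[i] = X[i] @ (beta - b_i)
```

The trace of the hat matrix equals the number of parameters, so hat_matrix_diag sums to k.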

params_not_obsi

Parameter estimates for all LOOO regressions

resid_press

PRESS residuals
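A PRESS residual is the prediction error for observation i from the fit that leaves i out. For OLS it has the closed form e_i / (1 - h_i), so no refits are needed. The sketch below (NumPy, simulated data) computes resid_press and ess_press that way and cross-checks them with an explicit leave-one-out loop.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

resid_press = resid / (1 - h)          # closed form, original fit only
ess_press = resid_press @ resid_press  # error sum of squares of PRESS residuals

# Cross-check: predict y_i from the fit that excludes observation i
press_loop = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    press_loop[i] = y[i] - X[i] @ b_i
```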

resid_std

Estimate of the standard deviation of the residuals

See also: resid_var

resid_studentized

Studentized residuals using variance from OLS

Alias for resid_studentized_internal, kept for compatibility with MLEInfluence. This uses sigma from the original estimate and does not require the leave-one-out loop.

resid_studentized_external

Studentized residuals using LOOO variance

This uses sigma from the leave-one-out estimates and requires the leave-one-out loop over observations.
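For OLS, the leave-one-out sigma satisfies sigma2_(i) = sigma2 * (n - k - r_i^2) / (n - k - 1), so the external studentized residual can also be obtained from the internal one as t_i = r_i * sqrt((n - k - 1) / (n - k - r_i^2)). A NumPy sketch on simulated data comparing that shortcut with the loop-based definition; the shortcut is a standard identity, not a claim about the library's implementation.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
r_internal = resid / np.sqrt(sigma2 * (1 - h))

# Shortcut without refits
r_external_closed = r_internal * np.sqrt((n - k - 1) / (n - k - r_internal ** 2))

# Definition: replace sigma by the LOOO estimate sigma_(i)
sigma2_not_obsi = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    r_i = y[keep] - X[keep] @ b_i
    sigma2_not_obsi[i] = r_i @ r_i / (n - 1 - k)
r_external = resid / np.sqrt(sigma2_not_obsi * (1 - h))
```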

resid_studentized_internal

Studentized residuals using variance from OLS

This uses sigma from the original estimate and does not require the leave-one-out loop.

resid_var

Estimate of the variance of the residuals

sigma2 = sigma2_OLS * (1 - hii)

where hii is the i-th diagonal element of the hat matrix.
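The factor (1 - hii) comes from e = (I - H) y: with Cov(y) = sigma^2 I and I - H symmetric and idempotent, Cov(e) = sigma^2 (I - H). A NumPy sketch on simulated data that forms the full n x n hat matrix (reasonable only for small n) and computes resid_var, resid_std and the internal studentized residuals. Residuals studentized this way are bounded by sqrt(n - k) in absolute value.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2_ols = resid @ resid / (n - k)            # sigma2_OLS (mse_resid)

H = X @ np.linalg.inv(X.T @ X) @ X.T            # full hat matrix, small n only
hii = np.diag(H)

resid_var = sigma2_ols * (1 - hii)              # estimated Var(e_i)
resid_std = np.sqrt(resid_var)
resid_studentized_internal = resid / resid_std  # |value| <= sqrt(n - k)
```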

sigma2_not_obsi

Error variance for all LOOO regressions

This is ‘mse_resid’ from each auxiliary regression.

Uses results from the leave-one-observation-out loop.
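A sketch of the leave-one-observation-out loop behind params_not_obsi and sigma2_not_obsi, assuming each auxiliary regression is plain OLS on the remaining n - 1 observations (so mse_resid uses n - 1 - k degrees of freedom). In line with the Notes, only the needed results of each auxiliary fit are kept. The closed form sigma2_(i) = ((n - k) sigma2 - e_i^2 / (1 - h_i)) / (n - k - 1) is a standard OLS identity shown for comparison.

```python
import numpy as np

# Simulated regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# LOOO loop storing only the parameters and mse_resid of each auxiliary fit
params_not_obsi = np.empty((n, k))
sigma2_not_obsi = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    r_i = y[keep] - X[keep] @ b_i
    params_not_obsi[i] = b_i
    sigma2_not_obsi[i] = r_i @ r_i / (n - 1 - k)   # n - 1 obs, k params

# Closed form for comparison, from the original fit only
sigma2_closed = ((n - k) * sigma2 - resid ** 2 / (1 - h)) / (n - k - 1)
```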

Methods

get_resid_studentized_external([sigma])

Calculate studentized residuals

plot_index([y_var, threshold, title, ax, idx])

Index plot for influence attributes

plot_influence([external, alpha, criterion, ...])

Plot of influence in regression.

summary_frame()

Creates a DataFrame with all available influence results.

summary_table([float_fmt])

Create a summary table with all influence and outlier measures
