{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Regression diagnostics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html)\n", "\n", "Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online ``statsmodels`` documentation. For presentation purposes, we use the ``zip(name,test)`` construct to pretty-print short descriptions in the examples below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimate a regression model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:34.726188Z", "iopub.status.busy": "2021-02-02T06:54:34.725507Z", "iopub.status.idle": "2021-02-02T06:54:34.959596Z", "shell.execute_reply": "2021-02-02T06:54:34.960084Z" } }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:34.965311Z", "iopub.status.busy": "2021-02-02T06:54:34.964696Z", "iopub.status.idle": "2021-02-02T06:54:35.702164Z", "shell.execute_reply": "2021-02-02T06:54:35.702573Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: Lottery R-squared: 0.348\n", "Model: OLS Adj. R-squared: 0.333\n", "Method: Least Squares F-statistic: 22.20\n", "Date: Tue, 02 Feb 2021 Prob (F-statistic): 1.90e-08\n", "Time: 06:54:35 Log-Likelihood: -379.82\n", "No. 
Observations: 86 AIC: 765.6\n", "Df Residuals: 83 BIC: 773.0\n", "Df Model: 2 \n", "Covariance Type: nonrobust \n", "===================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-----------------------------------------------------------------------------------\n", "Intercept 246.4341 35.233 6.995 0.000 176.358 316.510\n", "Literacy -0.4889 0.128 -3.832 0.000 -0.743 -0.235\n", "np.log(Pop1831) -31.3114 5.977 -5.239 0.000 -43.199 -19.424\n", "==============================================================================\n", "Omnibus: 3.713 Durbin-Watson: 2.019\n", "Prob(Omnibus): 0.156 Jarque-Bera (JB): 3.394\n", "Skew: -0.487 Prob(JB): 0.183\n", "Kurtosis: 3.003 Cond. No. 702.\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "from statsmodels.compat import lzip\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import statsmodels.formula.api as smf\n", "import statsmodels.stats.api as sms\n", "import matplotlib.pyplot as plt\n", "\n", "# Load data\n", "url = 'https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HistData/Guerry.csv'\n", "dat = pd.read_csv(url)\n", "\n", "# Fit regression model (using the natural log of one of the regressors)\n", "results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()\n", "\n", "# Inspect the results\n", "print(results.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Normality of the residuals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jarque-Bera test:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.709679Z", "iopub.status.busy": "2021-02-02T06:54:35.709271Z", "iopub.status.idle": "2021-02-02T06:54:35.713948Z", 
"shell.execute_reply": "2021-02-02T06:54:35.713573Z" } }, "outputs": [ { "data": { "text/plain": [ "[('Jarque-Bera', 3.3936080248431666),\n", " ('Chi^2 two-tail prob.', 0.1832683123166337),\n", " ('Skew', -0.48658034311223375),\n", " ('Kurtosis', 3.003417757881633)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = ['Jarque-Bera', 'Chi^2 two-tail prob.', 'Skew', 'Kurtosis']\n", "test = sms.jarque_bera(results.resid)\n", "lzip(name, test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Omni test:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.719914Z", "iopub.status.busy": "2021-02-02T06:54:35.719388Z", "iopub.status.idle": "2021-02-02T06:54:35.721894Z", "shell.execute_reply": "2021-02-02T06:54:35.722239Z" } }, "outputs": [ { "data": { "text/plain": [ "[('Chi^2', 3.713437811597181), ('Two-tail probability', 0.15618424580304824)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = ['Chi^2', 'Two-tail probability']\n", "test = sms.omni_normtest(results.resid)\n", "lzip(name, test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Influence tests\n", "\n", "Once created, an object of class ``OLSInfluence`` holds attributes and methods that allow users to assess the influence of each observation. 
For example, we can compute and extract the first few rows of DFbetas by:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.727115Z", "iopub.status.busy": "2021-02-02T06:54:35.725081Z", "iopub.status.idle": "2021-02-02T06:54:35.767952Z", "shell.execute_reply": "2021-02-02T06:54:35.767529Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[-0.00301154, 0.00290872, 0.00118179],\n", " [-0.06425662, 0.04043093, 0.06281609],\n", " [ 0.01554894, -0.03556038, -0.00905336],\n", " [ 0.17899858, 0.04098207, -0.18062352],\n", " [ 0.29679073, 0.21249207, -0.3213655 ]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from statsmodels.stats.outliers_influence import OLSInfluence\n", "test_class = OLSInfluence(results)\n", "test_class.dfbetas[:5,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explore the other attributes and methods by typing ``dir(test_class)``.\n", "\n", "Useful information on leverage can also be plotted:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.772192Z", "iopub.status.busy": "2021-02-02T06:54:35.771427Z", "iopub.status.idle": "2021-02-02T06:54:35.938389Z", "shell.execute_reply": "2021-02-02T06:54:35.938756Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from statsmodels.graphics.regressionplots import plot_leverage_resid2\n", "fig, ax = plt.subplots(figsize=(8,6))\n", "fig = plot_leverage_resid2(results, ax = ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other plotting options can be found on the [Graphics page.](https://www.statsmodels.org/stable/graphics.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multicollinearity\n", "\n", "Condition number:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.944346Z", "iopub.status.busy": "2021-02-02T06:54:35.943769Z", "iopub.status.idle": "2021-02-02T06:54:35.946519Z", "shell.execute_reply": "2021-02-02T06:54:35.946861Z" } }, "outputs": [ { "data": { "text/plain": [ "702.1792145490062" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.cond(results.model.exog)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Heteroskedasticity tests\n", "\n", "Breush-Pagan test:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.951851Z", "iopub.status.busy": "2021-02-02T06:54:35.951072Z", "iopub.status.idle": "2021-02-02T06:54:35.955846Z", "shell.execute_reply": "2021-02-02T06:54:35.956170Z" } }, "outputs": [ { "data": { "text/plain": [ "[('Lagrange multiplier statistic', 4.893213374093957),\n", " ('p-value', 0.08658690502352209),\n", " ('f-value', 2.503715946256434),\n", " ('f p-value', 0.08794028782673029)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = ['Lagrange multiplier statistic', 'p-value',\n", " 'f-value', 'f p-value']\n", "test = sms.het_breuschpagan(results.resid, results.model.exog)\n", "lzip(name, test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Goldfeld-Quandt test" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.960963Z", "iopub.status.busy": "2021-02-02T06:54:35.960184Z", "iopub.status.idle": "2021-02-02T06:54:35.964915Z", "shell.execute_reply": "2021-02-02T06:54:35.965254Z" } }, "outputs": [ { "data": { "text/plain": [ "[('F statistic', 1.1002422436378152), ('p-value', 0.3820295068692507)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = ['F statistic', 'p-value']\n", "test = sms.het_goldfeldquandt(results.resid, results.model.exog)\n", "lzip(name, test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linearity\n", "\n", "Harvey-Collier multiplier test for Null hypothesis that the linear specification is correct:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:54:35.969519Z", "iopub.status.busy": "2021-02-02T06:54:35.968732Z", "iopub.status.idle": "2021-02-02T06:54:35.974569Z", "shell.execute_reply": "2021-02-02T06:54:35.974213Z" } }, "outputs": [ { "data": { "text/plain": [ "[('t value', -1.0796490077823977), ('p value', 0.28346392475408044)]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = ['t value', 'p value']\n", "test = sms.linear_harvey_collier(results)\n", "lzip(name, test)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 1 }