Discrete Choice Models Overview
===============================

.. _discrete_choice_overview_notebook:
In [ ]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm


## Data

Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).

In [ ]:
spector_data = sm.datasets.spector.load()


Inspect the data:

In [ ]:
print(spector_data.exog[:5,:])
print(spector_data.endog[:5])


[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]


## Linear Probability Model (OLS)

In [ ]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print('Parameters: ', lpm_res.params[:-1])

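One caveat noted in Greene: because OLS ignores the bounded nature of a binary outcome, the linear probability model's fitted values need not lie in [0, 1]. A minimal numpy sketch on made-up data (not the Spector data) illustrating the point:

```python
import numpy as np

# Toy binary outcome regressed on one regressor plus a constant.
# These numbers are invented purely to illustrate the point.
x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 0., 0., 1., 1.])
X = np.column_stack([np.ones_like(x), x])

# OLS via the normal equations: beta = (X'X)^{-1} X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta

print(beta)    # intercept and slope
print(fitted)  # fitted "probabilities" -- the first one is below 0
```

Here the fitted value at x = 0 is -0.2, which cannot be a probability; the logit and probit models below avoid this by construction.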

Parameters:  [ 0.46385168  0.01049512  0.37855479]


## Logit Model

In [ ]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print('Parameters: ', logit_res.params)

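The Logit model maps the linear index x'β through the logistic CDF, Λ(x'β) = 1/(1 + e^(-x'β)). Using the estimated coefficients and the first student's regressors from the Spector data (column order GPA, TUCE, PSI, constant), the implied probability of a passing grade can be reproduced by hand:

```python
import numpy as np

# Fitted logit coefficients for GPA, TUCE, PSI, and the constant.
params = np.array([2.82611259, 0.09515766, 2.37868766, -13.02134686])
# First observation's regressors, in the same order.
x_first = np.array([2.66, 20.0, 0.0, 1.0])

xb = x_first @ params
p_pass = 1.0 / (1.0 + np.exp(-xb))  # logistic CDF
print(p_pass)                       # roughly 0.027
```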

Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]


Marginal Effects

In [ ]:
margeff = logit_res.get_margeff()
print(margeff.summary())


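With the default `at='overall'`, `get_margeff` reports sample-average effects; for the logit, the pointwise effect of regressor k at x is Λ(x'β)(1 − Λ(x'β))β_k. A numpy sketch of the pointwise formula at the first observation, using the coefficients estimated in this example (column order GPA, TUCE, PSI, constant):

```python
import numpy as np

# Fitted logit coefficients for GPA, TUCE, PSI, and the constant.
params = np.array([2.82611259, 0.09515766, 2.37868766, -13.02134686])
x_first = np.array([2.66, 20.0, 0.0, 1.0])

p = 1.0 / (1.0 + np.exp(-(x_first @ params)))
# Pointwise marginal effect of GPA for this observation:
me_gpa = p * (1.0 - p) * params[0]
print(me_gpa)
```

The reported table averages this quantity over all 32 observations, so the pointwise value for one student differs from the tabulated `dy/dx`.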
        Logit Marginal Effects
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3626      0.109      3.313      0.001         0.148     0.577
x2             0.0122      0.018      0.686      0.493        -0.023     0.047
x3             0.3052      0.092      3.304      0.001         0.124     0.486
==============================================================================


As in all the discrete data models presented below, we can print a nice summary of results:

In [ ]:
print(logit_res.summary())


                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                  0.3740
Time:                        17:43:21   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             2.8261      1.263      2.238      0.025         0.351     5.301
x2             0.0952      0.142      0.672      0.501        -0.182     0.373
x3             2.3787      1.065      2.234      0.025         0.292     4.465
const        -13.0213      4.931     -2.641      0.008       -22.687    -3.356
==============================================================================


## Probit Model

In [ ]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print('Parameters: ', probit_res.params)
print('Marginal effects: ')
print(probit_margeff.summary())

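The probit differs from the logit only in the link: it maps x'β through the standard normal CDF Φ rather than the logistic CDF. Using the probit coefficients reported in this example, the first student's predicted probability can be checked by hand (Φ computed from the error function in the standard library):

```python
import math

# Fitted probit coefficients for GPA, TUCE, PSI, and the constant.
params = [1.62581004, 0.05172895, 1.42633234, -7.45231965]
x_first = [2.66, 20.0, 0.0, 1.0]

xb = sum(b * xi for b, xi in zip(params, x_first))
# Standard normal CDF via the error function.
p_pass = 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))
print(p_pass)
```

The normal and logistic links give similar probabilities here, which is why the probit and logit marginal effects tables are close.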

Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects:
       Probit Marginal Effects
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3608      0.113      3.182      0.001         0.139     0.583
x2             0.0115      0.018      0.624      0.533        -0.025     0.048
x3             0.3165      0.090      3.508      0.000         0.140     0.493
==============================================================================


## Multinomial Logit

Load data from the American National Election Studies:

In [ ]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog


Inspect the data:

In [ ]:
print(anes_data.exog[:5,:])
print(anes_data.endog[:5])


[[ -2.30258509   7.          36.           3.           1.        ]
 [  5.24755025   3.          20.           4.           1.        ]
 [  3.43720782   2.          24.           6.           1.        ]
 [  4.4200447    3.          28.           6.           1.        ]
 [  6.46162441   5.          68.           6.           1.        ]]
[ 6.  1.  1.  1.  0.]


Fit MNL model:

In [ ]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)

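MNLogit normalizes the coefficients of the first outcome category to zero, so the fitted `params` array has one column per remaining category; choice probabilities then follow from a softmax over the linear indices. A toy numpy sketch on made-up data (not the ANES data):

```python
import numpy as np

# Toy multinomial logit: J = 3 alternatives, first alternative's
# coefficients normalized to zero, as MNLogit does internally.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # 5 observations, 2 regressors
beta = np.array([[0.5, -0.2],          # coefficients for alternatives 2 and 3
                 [1.0, 0.3]])          # shape (k, J-1), like params above

full_beta = np.column_stack([np.zeros(2), beta])  # prepend base category
xb = X @ full_beta                     # linear indices, shape (5, 3)
probs = np.exp(xb) / np.exp(xb).sum(axis=1, keepdims=True)
print(probs.sum(axis=1))               # each row sums to 1
```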

Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
[[ -1.15359746e-02  -8.87506530e-02  -1.05966699e-01  -9.15567017e-02
   -9.32846040e-02  -1.40880692e-01]
 [  2.97714352e-01   3.91668642e-01   5.73450508e-01   1.27877179e+00
    1.34696165e+00   2.07008014e+00]
 [ -2.49449954e-02  -2.28978371e-02  -1.48512069e-02  -8.68134503e-03
   -1.79040689e-02  -9.43264870e-03]
 [  8.24914421e-02   1.81042758e-01  -7.15241904e-03   1.99827955e-01
    2.16938850e-01   3.21925702e-01]
 [  5.19655317e-03   4.78739761e-02   5.75751595e-02   8.44983753e-02
    8.09584122e-02   1.08894083e-01]
 [ -3.73401677e-01  -2.25091318e+00  -3.66558353e+00  -7.61384309e+00
   -7.06047825e+00  -1.21057509e+01]]


## Poisson

Load the Rand data. Note that this example is similar to Cameron and Trivedi's Microeconometrics Table 20.5, but it is slightly different because of minor changes in the data.

In [ ]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)


Fit Poisson model:

In [ ]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())

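The Poisson regression uses a log link, E[y | x] = exp(x'β), which keeps the conditional mean of the count positive; the probability of any particular count then follows from the Poisson pmf. A small standard-library sketch with invented numbers (not the RAND estimates):

```python
import math

# Log link: the conditional mean is exp(x'beta), always positive.
beta = [0.7, -0.05]     # made-up constant and slope
x = [1.0, 3.0]          # constant and one regressor value

mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
# Probability of observing a count of y = 2 under this mean:
p2 = mu ** 2 * math.exp(-mu) / math.factorial(2)
print(mu, p2)
```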

Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 12
                          Poisson Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.06343
Time:                        17:43:22   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0525      0.003    -18.216      0.000        -0.058    -0.047
x2            -0.2471      0.011    -23.272      0.000        -0.268    -0.226
x3             0.0353      0.002     19.302      0.000         0.032     0.039
x4            -0.0346      0.002    -21.439      0.000        -0.038    -0.031
x5             0.2717      0.012     22.200      0.000         0.248     0.296
x6             0.0339      0.001     60.098      0.000         0.033     0.035
x7            -0.0126      0.009     -1.366      0.172        -0.031     0.005
x8             0.0541      0.015      3.531      0.000         0.024     0.084
x9             0.2061      0.026      7.843      0.000         0.155     0.258
const          0.7004      0.011     62.741      0.000         0.678     0.722
==============================================================================


## Negative Binomial

The negative binomial model gives slightly different results.

In [ ]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())

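The main difference from the Poisson is the extra dispersion parameter `alpha`: under the NB2 parameterization used by default in `sm.NegativeBinomial`, the conditional variance is μ + αμ² instead of the Poisson's μ, so the model can accommodate overdispersed counts. A quick sketch using the alpha estimate reported in this example:

```python
# NB2 variance function versus Poisson, using the alpha reported below.
alpha = 1.2930

for m in (0.5, 1.0, 2.0, 5.0):
    var_nb = m + alpha * m ** 2   # NB2: variance grows quadratically in mu
    print(m, var_nb)              # always exceeds the Poisson variance (= m)
```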

                     NegativeBinomial Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.01845
Time:                        17:43:23   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0580      0.006     -9.517      0.000        -0.070    -0.046
x2            -0.2678      0.023    -11.802      0.000        -0.312    -0.223
x3             0.0412      0.004      9.937      0.000         0.033     0.049
x4            -0.0381      0.003    -11.219      0.000        -0.045    -0.031
x5             0.2690      0.030      8.981      0.000         0.210     0.328
x6             0.0382      0.001     26.081      0.000         0.035     0.041
x7            -0.0441      0.020     -2.200      0.028        -0.083    -0.005
x8             0.0172      0.036      0.477      0.633        -0.054     0.088
x9             0.1780      0.074      2.397      0.017         0.032     0.324
const          0.6636      0.025     26.787      0.000         0.615     0.712
alpha          1.2930      0.019     69.477      0.000         1.256     1.329
==============================================================================


## Alternative solvers

The default method for fitting discrete data MLE models is Newton-Raphson. You can use other solvers via the `method` argument:

In [ ]:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
print(mlogit_res.summary())

statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals