.. _discrete_choice_overview_notebook:

Discrete Choice Models Overview
===============================

`Link to Notebook GitHub `_
In [ ]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm
   

Data
----

Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).

In [ ]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
   
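The dataset's variable definitions can be printed for reference. This extra cell is not in the original notebook and assumes the dataset module's usual NOTE attribute:

In [ ]:
# Added for context (assumes sm.datasets.spector exposes the standard
# NOTE string): describes GPA, TUCE, PSI, and the GRADE outcome.
print(sm.datasets.spector.NOTE)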

Inspect the data:

In [ ]:
print(spector_data.exog[:5,:])
print(spector_data.endog[:5])

[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]

Linear Probability Model (OLS)
------------------------------

In [ ]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print('Parameters: ', lpm_res.params[:-1])
   
Parameters:  [ 0.46385168  0.01049512  0.37855479]
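Note that the linear probability model does not constrain fitted values to [0, 1]. A quick check, added here as a sketch using the fitted values from the OLS results:

In [ ]:
# Sketch (not in the original notebook): count fitted "probabilities"
# that fall outside the unit interval.
fitted = lpm_res.fittedvalues
print('Out of bounds:', ((fitted < 0) | (fitted > 1)).sum())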
   

Logit Model
-----------

In [ ]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print('Parameters: ', logit_res.params)
   
Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]
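The logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios. This interpretive cell is an addition, not part of the original example:

In [ ]:
# Added: exp(coef) is the multiplicative effect on the odds of success;
# e.g. a one-point increase in GPA multiplies the odds by about 17.
print('Odds ratios: ', np.exp(logit_res.params))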
   

Marginal Effects
----------------

In [ ]:
margeff = logit_res.get_margeff()
print(margeff.summary())
   
        Logit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3626      0.109      3.313      0.001         0.148     0.577
x2             0.0122      0.018      0.686      0.493        -0.023     0.047
x3             0.3052      0.092      3.304      0.001         0.124     0.486
==============================================================================
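As a sanity check (also an addition), the 'overall' average marginal effects can be reproduced by hand, since for the logit dP/dx_j = f(x'b) * b_j, with f the logistic density, averaged over the sample:

In [ ]:
# Sketch: manual average marginal effects for the logit.
# f(z) = exp(z) / (1 + exp(z))**2 is the logistic pdf.
xb = spector_data.exog.dot(logit_res.params)
f = np.exp(xb) / (1 + np.exp(xb)) ** 2
print('Manual AME: ', f.mean() * logit_res.params[:-1])  # slopes only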
   

As in all the discrete data models presented below, we can print a nice summary of results:

In [ ]:
print(logit_res.summary())
   
                        Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                  0.3740
Time:                        17:43:21   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             2.8261      1.263      2.238      0.025         0.351     5.301
x2             0.0952      0.142      0.672      0.501        -0.182     0.373
x3             2.3787      1.065      2.234      0.025         0.292     4.465
const        -13.0213      4.931     -2.641      0.008       -22.687    -3.356
==============================================================================
   

Probit Model
------------

In [ ]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print('Parameters: ', probit_res.params)
print('Marginal effects: ')
print(probit_margeff.summary())
   
Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects: 
       Probit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3608      0.113      3.182      0.001         0.139     0.583
x2             0.0115      0.018      0.624      0.533        -0.025     0.048
x3             0.3165      0.090      3.508      0.000         0.140     0.493
==============================================================================
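The probit coefficients are smaller than the logit coefficients because the two models assume different error variances (1 versus pi**2/3), while the marginal effects are directly comparable and nearly identical. A rough check of the textbook pi/sqrt(3) (about 1.8) rescaling, added here:

In [ ]:
# Added sketch: elementwise ratio of logit to probit coefficients;
# values should hover roughly around 1.6-1.8.
print(logit_res.params / probit_res.params)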
   

Multinomial Logit
-----------------

Load data from the American National Election Studies:

In [ ]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog, prepend=False)
   
   

Inspect the data:

In [ ]:
print(anes_data.exog[:5,:])
print(anes_data.endog[:5])

[[ -2.30258509   7.          36.           3.           1.        ]
 [  5.24755025   3.          20.           4.           1.        ]
 [  3.43720782   2.          24.           6.           1.        ]
 [  4.4200447    3.          28.           6.           1.        ]
 [  6.46162441   5.          68.           6.           1.        ]]
[ 6.  1.  1.  1.  0.]
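The outcome is a seven-level party identification code, 0 through 6 (running from strong Democrat to strong Republican in anes96). A quick check of the categories, added here:

In [ ]:
# Added: list the distinct response categories MNLogit will model.
print(np.unique(anes_data.endog))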

Fit MNL model:

In [ ]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
   
Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
[[ -1.15359746e-02  -8.87506530e-02  -1.05966699e-01  -9.15567017e-02
   -9.32846040e-02  -1.40880692e-01]
 [  2.97714352e-01   3.91668642e-01   5.73450508e-01   1.27877179e+00
    1.34696165e+00   2.07008014e+00]
 [ -2.49449954e-02  -2.28978371e-02  -1.48512069e-02  -8.68134503e-03
   -1.79040689e-02  -9.43264870e-03]
 [  8.24914421e-02   1.81042758e-01  -7.15241904e-03   1.99827955e-01
    2.16938850e-01   3.21925702e-01]
 [  5.19655317e-03   4.78739761e-02   5.75751595e-02   8.44983753e-02
    8.09584122e-02   1.08894083e-01]
 [ -3.73401677e-01  -2.25091318e+00  -3.66558353e+00  -7.61384309e+00
   -7.06047825e+00  -1.21057509e+01]]
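Each column of the parameter array corresponds to one non-base outcome (categories 1-6), measured relative to category 0. As an added sketch, predicted probabilities from the fitted model should sum to one across the seven categories:

In [ ]:
# Sketch: predicted probabilities for the first five respondents;
# each row should sum to 1 across the 7 outcomes.
probs = mlogit_res.predict(anes_exog[:5])
print(probs.sum(axis=1))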
   

Poisson
-------

Load the RAND Health Insurance Experiment data. Note that this example is similar to Cameron and Trivedi's Microeconometrics Table 20.5, but it is slightly different because of minor changes in the data.

In [ ]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
rand_exog = sm.add_constant(rand_exog, prepend=False)
   
   

Fit Poisson model:

In [ ]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())

Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 12
                        Poisson Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.06343
Time:                        17:43:22   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0525      0.003    -18.216      0.000        -0.058    -0.047
x2            -0.2471      0.011    -23.272      0.000        -0.268    -0.226
x3             0.0353      0.002     19.302      0.000         0.032     0.039
x4            -0.0346      0.002    -21.439      0.000        -0.038    -0.031
x5             0.2717      0.012     22.200      0.000         0.248     0.296
x6             0.0339      0.001     60.098      0.000         0.033     0.035
x7            -0.0126      0.009     -1.366      0.172        -0.031     0.005
x8             0.0541      0.015      3.531      0.000         0.024     0.084
x9             0.2061      0.026      7.843      0.000         0.155     0.258
const          0.7004      0.011     62.741      0.000         0.678     0.722
==============================================================================
   
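Because the conditional mean is log-linear, exponentiated Poisson coefficients act as incidence rate ratios, i.e. multiplicative effects on the expected count. This interpretive cell is an addition:

In [ ]:
# Added: exp(coef) for the slope coefficients; the constant is last
# because add_constant was called with prepend=False.
print('Rate ratios: ', np.exp(poisson_res.params[:-1]))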

Negative Binomial
-----------------

The negative binomial model, which allows for overdispersion, gives slightly different results from the Poisson.

In [ ]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
   
                 NegativeBinomial Regression Results                      
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.01845
Time:                        17:43:23   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0580      0.006     -9.517      0.000        -0.070    -0.046
x2            -0.2678      0.023    -11.802      0.000        -0.312    -0.223
x3             0.0412      0.004      9.937      0.000         0.033     0.049
x4            -0.0381      0.003    -11.219      0.000        -0.045    -0.031
x5             0.2690      0.030      8.981      0.000         0.210     0.328
x6             0.0382      0.001     26.081      0.000         0.035     0.041
x7            -0.0441      0.020     -2.200      0.028        -0.083    -0.005
x8             0.0172      0.036      0.477      0.633        -0.054     0.088
x9             0.1780      0.074      2.397      0.017         0.032     0.324
const          0.6636      0.025     26.787      0.000         0.615     0.712
alpha          1.2930      0.019     69.477      0.000         1.256     1.329
==============================================================================

/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
   
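The large, precisely estimated alpha signals strong overdispersion, which the Poisson standard errors above ignore. The following sketch of the implied variance is an addition and assumes the model's default NB2 parameterization, Var(y) = mu + alpha * mu**2; note the fit above did not fully converge, so treat the estimates with care:

In [ ]:
# Sketch (assumes the default NB2 variance, Var(y) = mu + alpha*mu**2):
# compare the Poisson assumption Var(y) = mu with the NB2 variance.
mu = np.exp(rand_exog.dot(res_nbin.params[:-1]))  # fitted means
alpha = res_nbin.params[-1]                       # dispersion (last param)
print('Mean of mu:            ', mu.mean())
print('Mean implied variance: ', (mu + alpha * mu ** 2).mean())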

Alternative solvers
-------------------

The default method for fitting discrete data MLE models is Newton-Raphson. You can select other solvers with the method argument:

In [ ]:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
print(mlogit_res.summary())
   
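Other accepted method values include scipy-based optimizers such as 'nm' (Nelder-Mead), 'cg', 'ncg', 'powell', and 'lbfgs'. A quick sketch with the derivative-free Nelder-Mead solver, added here:

In [ ]:
# Added sketch: Nelder-Mead needs no derivatives but converges slowly;
# disp=0 silences the iteration log.
mlogit_res_nm = mlogit_mod.fit(method='nm', maxiter=500, disp=0)
print(mlogit_res_nm.params.shape)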