.. _discrete_choice_overview_notebook:

Discrete Choice Models Overview
===============================

`Link to Notebook GitHub `_
In [ ]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm
   

Data
----

Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).

In [ ]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
   
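The dataset's variable definitions can be printed for reference. This extra cell is not in the original notebook and assumes the dataset module's usual NOTE attribute:

In [ ]:
# Added for context (assumes sm.datasets.spector exposes the standard
# NOTE string): describes GPA, TUCE, PSI, and the GRADE outcome.
print(sm.datasets.spector.NOTE)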

Inspect the data:

In [ ]:
print(spector_data.exog[:5,:])
print(spector_data.endog[:5])

[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]

Linear Probability Model (OLS)
------------------------------

In [ ]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print('Parameters: ', lpm_res.params[:-1])
   
Parameters:  [ 0.46385168  0.01049512  0.37855479]
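Note that the linear probability model does not constrain fitted values to [0, 1]. A quick check, added here as a sketch using the fitted values from the OLS results:

In [ ]:
# Sketch (not in the original notebook): count fitted "probabilities"
# that fall outside the unit interval.
fitted = lpm_res.fittedvalues
print('Out of bounds:', ((fitted < 0) | (fitted > 1)).sum())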
   

Logit Model
-----------

In [ ]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print('Parameters: ', logit_res.params)
   
Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]
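The logit coefficients are on the log-odds scale, so exponentiating them gives odds ratios. This interpretive cell is an addition, not part of the original example:

In [ ]:
# Added: exp(coef) is the multiplicative effect on the odds of success;
# e.g. a one-point increase in GPA multiplies the odds by about 17.
print('Odds ratios: ', np.exp(logit_res.params))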
   

Marginal Effects
----------------

In [ ]:
margeff = logit_res.get_margeff()
print(margeff.summary())
   
        Logit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3626      0.109      3.313      0.001         0.148     0.577
x2             0.0122      0.018      0.686      0.493        -0.023     0.047
x3             0.3052      0.092      3.304      0.001         0.124     0.486
==============================================================================
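As a sanity check (also an addition), the 'overall' average marginal effects can be reproduced by hand, since for the logit dP/dx_j = f(x'b) * b_j, with f the logistic density, averaged over the sample:

In [ ]:
# Sketch: manual average marginal effects for the logit.
# f(z) = exp(z) / (1 + exp(z))**2 is the logistic pdf.
xb = spector_data.exog.dot(logit_res.params)
f = np.exp(xb) / (1 + np.exp(xb)) ** 2
print('Manual AME: ', f.mean() * logit_res.params[:-1])  # slopes only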
   

As in all the discrete data models presented below, we can print a nice summary of results:

In [ ]:
print(logit_res.summary())
   
                        Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                  0.3740
Time:                        17:43:21   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             2.8261      1.263      2.238      0.025         0.351     5.301
x2             0.0952      0.142      0.672      0.501        -0.182     0.373
x3             2.3787      1.065      2.234      0.025         0.292     4.465
const        -13.0213      4.931     -2.641      0.008       -22.687    -3.356
==============================================================================
   

Probit Model
------------

In [ ]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print('Parameters: ', probit_res.params)
print('Marginal effects: ')
print(probit_margeff.summary())
   
Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects: 
       Probit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.3608      0.113      3.182      0.001         0.139     0.583
x2             0.0115      0.018      0.624      0.533        -0.025     0.048
x3             0.3165      0.090      3.508      0.000         0.140     0.493
==============================================================================
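The probit coefficients are smaller than the logit coefficients because the two models assume different error variances (1 versus pi**2/3), while the marginal effects are directly comparable and nearly identical. A rough check of the textbook pi/sqrt(3) (about 1.8) rescaling, added here:

In [ ]:
# Added sketch: elementwise ratio of logit to probit coefficients;
# values should hover roughly around 1.6-1.8.
print(logit_res.params / probit_res.params)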
   

Multinomial Logit
-----------------

Load data from the American National Election Studies:

In [ ]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog, prepend=False)
   
   

Inspect the data:

In [ ]:
print(anes_data.exog[:5,:])
print(anes_data.endog[:5])

[[ -2.30258509   7.          36.           3.           1.        ]
 [  5.24755025   3.          20.           4.           1.        ]
 [  3.43720782   2.          24.           6.           1.        ]
 [  4.4200447    3.          28.           6.           1.        ]
 [  6.46162441   5.          68.           6.           1.        ]]
[ 6.  1.  1.  1.  0.]
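The outcome is a seven-level party identification code, 0 through 6 (running from strong Democrat to strong Republican in anes96). A quick check of the categories, added here:

In [ ]:
# Added: list the distinct response categories MNLogit will model.
print(np.unique(anes_data.endog))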

Fit MNL model:

In [ ]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
   
Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
[[ -1.15359746e-02  -8.87506530e-02  -1.05966699e-01  -9.15567017e-02
   -9.32846040e-02  -1.40880692e-01]
 [  2.97714352e-01   3.91668642e-01   5.73450508e-01   1.27877179e+00
    1.34696165e+00   2.07008014e+00]
 [ -2.49449954e-02  -2.28978371e-02  -1.48512069e-02  -8.68134503e-03
   -1.79040689e-02  -9.43264870e-03]
 [  8.24914421e-02   1.81042758e-01  -7.15241904e-03   1.99827955e-01
    2.16938850e-01   3.21925702e-01]
 [  5.19655317e-03   4.78739761e-02   5.75751595e-02   8.44983753e-02
    8.09584122e-02   1.08894083e-01]
 [ -3.73401677e-01  -2.25091318e+00  -3.66558353e+00  -7.61384309e+00
   -7.06047825e+00  -1.21057509e+01]]
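Each column of the parameter array corresponds to one non-base outcome (categories 1-6), measured relative to category 0. As an added sketch, predicted probabilities from the fitted model should sum to one across the seven categories:

In [ ]:
# Sketch: predicted probabilities for the first five respondents;
# each row should sum to 1 across the 7 outcomes.
probs = mlogit_res.predict(anes_exog[:5])
print(probs.sum(axis=1))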
   

Poisson
-------

Load the RAND Health Insurance Experiment data. Note that this example is similar to Cameron and Trivedi's Microeconometrics Table 20.5, but it is slightly different because of minor changes in the data.

In [ ]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
rand_exog = sm.add_constant(rand_exog, prepend=False)
   
   

Fit Poisson model:

In [ ]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())

Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 12
                        Poisson Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.06343
Time:                        17:43:22   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0525      0.003    -18.216      0.000        -0.058    -0.047
x2            -0.2471      0.011    -23.272      0.000        -0.268    -0.226
x3             0.0353      0.002     19.302      0.000         0.032     0.039
x4            -0.0346      0.002    -21.439      0.000        -0.038    -0.031
x5             0.2717      0.012     22.200      0.000         0.248     0.296
x6             0.0339      0.001     60.098      0.000         0.033     0.035
x7            -0.0126      0.009     -1.366      0.172        -0.031     0.005
x8             0.0541      0.015      3.531      0.000         0.024     0.084
x9             0.2061      0.026      7.843      0.000         0.155     0.258
const          0.7004      0.011     62.741      0.000         0.678     0.722
==============================================================================
   
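Because the conditional mean is log-linear, exponentiated Poisson coefficients act as incidence rate ratios, i.e. multiplicative effects on the expected count. This interpretive cell is an addition:

In [ ]:
# Added: exp(coef) for the slope coefficients; the constant is last
# because add_constant was called with prepend=False.
print('Rate ratios: ', np.exp(poisson_res.params[:-1]))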

Negative Binomial
-----------------

The negative binomial model, which allows for overdispersion, gives slightly different results from the Poisson.

In [ ]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
   
                 NegativeBinomial Regression Results                      
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                 0.01845
Time:                        17:43:23   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0580      0.006     -9.517      0.000        -0.070    -0.046
x2            -0.2678      0.023    -11.802      0.000        -0.312    -0.223
x3             0.0412      0.004      9.937      0.000         0.033     0.049
x4            -0.0381      0.003    -11.219      0.000        -0.045    -0.031
x5             0.2690      0.030      8.981      0.000         0.210     0.328
x6             0.0382      0.001     26.081      0.000         0.035     0.041
x7            -0.0441      0.020     -2.200      0.028        -0.083    -0.005
x8             0.0172      0.036      0.477      0.633        -0.054     0.088
x9             0.1780      0.074      2.397      0.017         0.032     0.324
const          0.6636      0.025     26.787      0.000         0.615     0.712
alpha          1.2930      0.019     69.477      0.000         1.256     1.329
==============================================================================

/Users/tom.augspurger/Envs/py3/lib/python3.4/site-packages/statsmodels-0.6.1-py3.4-macosx-10.10-x86_64.egg/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
   
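The large, precisely estimated alpha signals strong overdispersion, which the Poisson standard errors above ignore. The following sketch of the implied variance is an addition and assumes the model's default NB2 parameterization, Var(y) = mu + alpha * mu**2; note the fit above did not fully converge, so treat the estimates with care:

In [ ]:
# Sketch (assumes the default NB2 variance, Var(y) = mu + alpha*mu**2):
# compare the Poisson assumption Var(y) = mu with the NB2 variance.
mu = np.exp(rand_exog.dot(res_nbin.params[:-1]))  # fitted means
alpha = res_nbin.params[-1]                       # dispersion (last param)
print('Mean of mu:            ', mu.mean())
print('Mean implied variance: ', (mu + alpha * mu ** 2).mean())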

Alternative solvers
-------------------

The default method for fitting discrete data MLE models is Newton-Raphson. You can select other solvers with the method argument:

In [ ]:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
print(mlogit_res.summary())
   
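Other accepted method values include scipy-based optimizers such as 'nm' (Nelder-Mead), 'cg', 'ncg', 'powell', and 'lbfgs'. A quick sketch with the derivative-free Nelder-Mead solver, added here:

In [ ]:
# Added sketch: Nelder-Mead needs no derivatives but converges slowly;
# disp=0 silences the iteration log.
mlogit_res_nm = mlogit_mod.fit(method='nm', maxiter=500, disp=0)
print(mlogit_res_nm.params.shape)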