Prediction (out of sample)ΒΆ

Link to Notebook GitHub

In [ ]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm

Artificial data

In [ ]:
nsample = 50
sig = 0.25
x1 = np.linspace(0, 20, nsample)
X = np.column_stack((x1, np.sin(x1), (x1-5)**2))
X = sm.add_constant(X)
beta = [5., 0.5, 0.5, -0.02]
y_true = np.dot(X, beta)
y = y_true + sig * np.random.normal(size=nsample)

Estimation

In [ ]:
olsmod = sm.OLS(y, X)
olsres = olsmod.fit()
print(olsres.summary())

In-sample prediction

In [ ]:
ypred = olsres.predict(X)
print(ypred)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.981
Model:                            OLS   Adj. R-squared:                  0.980
Method:                 Least Squares   F-statistic:                     793.2
Date:                Mon, 20 Jul 2015   Prob (F-statistic):           1.34e-39
Time:                        17:44:19   Log-Likelihood:                -3.9531
No. Observations:                  50   AIC:                             15.91
Df Residuals:                      46   BIC:                             23.55
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          4.9076      0.093     52.736      0.000         4.720     5.095
x1             0.5215      0.014     36.334      0.000         0.493     0.550
x2             0.5079      0.056      9.002      0.000         0.394     0.621
x3            -0.0217      0.001    -17.249      0.000        -0.024    -0.019
==============================================================================
Omnibus:                        0.097   Durbin-Watson:                   2.249
Prob(Omnibus):                  0.953   Jarque-Bera (JB):                0.210
Skew:                           0.096   Prob(JB):                        0.900
Kurtosis:                       2.747   Cond. No.                         221.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Create a new sample of explanatory variables Xnew, predict and plot

In [ ]:
x1n = np.linspace(20.5,25, 10)
Xnew = np.column_stack((x1n, np.sin(x1n), (x1n-5)**2))
Xnew = sm.add_constant(Xnew)
ynewpred =  olsres.predict(Xnew) # predict out of sample
print(ynewpred)
[  4.36419585   4.86372083   5.32288364   5.71400582   6.01939792
   6.23426583   6.36749844   6.4402072    6.48225762   6.52736237
   6.60754246   6.74786624   6.96233076   7.25156207   7.60271237
   7.9915709    8.38654164   8.7538345    9.06301692   9.29201382
   9.43073454   9.48273062   9.4646124    9.40331987   9.33169661
   9.28309362   9.28588985   9.35882964   9.50794309   9.72555529
   9.99154729  10.27666165  10.54730926  10.77108802  10.92210636
  10.98523716  10.95860313  10.85388518  10.69440335  10.51128533
  10.33835231  10.20656224  10.13892353  10.14671482  10.22763195
  10.36616652  10.53615388  10.70507189  10.83938423  10.91005034]

Plot comparison

In [ ]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(x1, y, 'o', label="Data")
ax.plot(x1, y_true, 'b-', label="True")
ax.plot(np.hstack((x1, x1n)), np.hstack((ypred, ynewpred)), 'r', label="OLS prediction")
ax.legend(loc="best");
[ 10.88191737  10.71896653  10.44111456  10.09374917   9.73661657
   9.4291935    9.21612524   9.11629492   9.11820024   9.18276965]

Predicting with Formulas

Using formulas can make both estimation and prediction a lot easier

In [ ]:
from statsmodels.formula.api import ols

data = {"x1" : x1, "y" : y}

res = ols("y ~ x1 + np.sin(x1) + I((x1-5)**2)", data=data).fit()

We use the I to indicate use of the Identity transform. Ie., we don't want any expansion magic from using **2

In [ ]:
res.params

Now we only have to pass the single variable and we get the transformed right-hand side variables automatically

In [ ]:
res.predict(exog=dict(x1=x1n))