Rolling Regression¶

Rolling OLS applies OLS across a fixed windows of observations and then rolls (moves or slides) the window across the data set. They key parameter is window which determines the number of observations used in each OLS regression. By default, RollingOLS drops missing values in the window and so will estimate the model using the available data points.

Estimated values are aligned so that models estimated using data points \(i+1, i+2, ... i+window\) are stored in location \(i+window\).

Start by importing the modules that are used in this notebook.

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import seaborn

import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

seaborn.set_style("darkgrid")
pd.plotting.register_matplotlib_converters()
%matplotlib inline

pandas-datareader is used to download data from Ken French’s website. The two data sets downloaded are the 3 Fama-French factors and the 10 industry portfolios. Data is available from 1926.

The data are monthly returns for the factors or industry portfolios.

[2]:

factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]
factors.head()

/tmp/ipykernel_4062/1924419770.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]
/tmp/ipykernel_4062/1924419770.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]

[2]:

	Mkt-RF	SMB	HML	RF
Date
1926-07	2.89	-2.55	-2.39	0.22
1926-08	2.64	-1.14	3.81	0.25
1926-09	0.38	-1.36	0.05	0.23
1926-10	-3.27	-0.14	0.82	0.32
1926-11	2.54	-0.11	-0.61	0.31

[3]:

industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
industries.head()

/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
/tmp/ipykernel_4062/268191425.py:1: FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version. Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.
  industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]

[3]:

	NoDur	Durbl	Manuf	Enrgy	HiTec	Telcm	Shops	Hlth	Utils	Other
Date
1926-07	1.44	13.90	4.70	-1.14	2.90	0.83	0.12	1.85	7.04	2.14
1926-08	3.99	3.70	2.80	3.43	2.66	2.17	-0.72	4.17	-1.70	4.35
1926-09	1.15	4.98	1.17	-3.30	-0.39	2.42	0.21	0.69	2.05	0.31
1926-10	-1.24	-8.39	-3.65	-0.78	-4.58	-0.11	-2.29	-0.57	-3.27	-2.85
1926-11	5.21	-0.17	4.27	0.01	4.71	1.63	6.45	5.43	4.40	2.13

The first model estimated is a rolling version of the CAPM that regresses the excess return of Technology sector firms on the excess return of the market.

The window is 60 months, and so results are available after the first 60 (window) months. The first 59 (window - 1) estimates are all nan filled.

[4]:

endog = industries.HiTec - factors.RF.values
exog = sm.add_constant(factors["Mkt-RF"])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
params = rres.params.copy()
params.index = np.arange(1, params.shape[0] + 1)
params.head()

[4]:

	const	Mkt-RF
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN
4	NaN	NaN
5	NaN	NaN

[5]:

params.iloc[57:62]

[5]:

	const	Mkt-RF
58	NaN	NaN
59	NaN	NaN
60	0.873061	1.401724
61	0.875674	1.408865
62	0.949387	1.411108

[6]:

params.tail()

[6]:

	const	Mkt-RF
1189	0.268380	1.152953
1190	0.221055	1.143949
1191	0.287100	1.143878
1192	0.324110	1.146296
1193	0.300006	1.187318

We next plot the market loading along with a 95% point-wise confidence interval. The alpha=False omits the constant column, if present.

[7]:

fig = rres.plot_recursive_coefficient(variables=["Mkt-RF"], figsize=(14, 6))

../../../_images/examples_notebooks_generated_rolling_ls_10_0.png

Next, the model is expanded to include all three factors, the excess market, the size factor and the value factor.

[8]:

exog_vars = ["Mkt-RF", "SMB", "HML"]
exog = sm.add_constant(factors[exog_vars])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
fig = rres.plot_recursive_coefficient(variables=exog_vars, figsize=(14, 18))

../../../_images/examples_notebooks_generated_rolling_ls_12_0.png

Formulas¶

RollingOLS and RollingWLS both support model specification using the formula interface. The example below is equivalent to the 3-factor model estimated previously. Note that one variable is renamed to have a valid Python variable name.

[9]:

joined = pd.concat([factors, industries], axis=1)
joined["Mkt_RF"] = joined["Mkt-RF"]
mod = RollingOLS.from_formula("HiTec ~ Mkt_RF + SMB + HML", data=joined, window=60)
rres = mod.fit()
rres.params.tail()

[9]:

	Intercept	Mkt_RF	SMB	HML
Date
2025-07	0.740575	1.118338	-0.116287	-0.431174
2025-08	0.748434	1.118000	-0.112657	-0.429351
2025-09	0.852659	1.107885	-0.111405	-0.444117
2025-10	0.826011	1.117027	-0.132702	-0.453540
2025-11	0.824234	1.138548	-0.116628	-0.450753

`RollingWLS`: Rolling Weighted Least Squares¶

The rolling module also provides RollingWLS which takes an optional weights input to perform rolling weighted least squares. It produces results that match WLS when applied to rolling windows of data.

Fit Options¶

Fit accepts other optional keywords to set the covariance estimator. Only two estimators are supported, 'nonrobust' (the classic OLS estimator) and 'HC0' which is White’s heteroskedasticity robust estimator.

You can set params_only=True to only estimate the model parameters. This is substantially faster than computing the full set of values required to perform inference.

Finally, the parameter reset can be set to a positive integer to control estimation error in very long samples. RollingOLS avoids the full matrix product when rolling by only adding the most recent observation and removing the dropped observation as it rolls through the sample. Setting reset uses the full inner product every reset periods. In most applications this parameter can be omitted.

[10]:

%timeit rols.fit()
%timeit rols.fit(params_only=True)

291 ms ± 23.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
81.3 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Expanding Sample¶

It is possible to expand the sample until sufficient observations are available for the full window length. In this example, we start once we have 12 observations available, and then increase the sample until we have 60 observations available. The first non-nan value is computed using 12 observations, the second 13, and so on. All other estimates are computed using 60 observations.

[11]:

res = RollingOLS(endog, exog, window=60, min_nobs=12, expanding=True).fit()
res.params.iloc[10:15]

[11]:

	const	Mkt-RF	SMB	HML
Date
1927-05	NaN	NaN	NaN	NaN
1927-06	1.649548	0.995914	1.411600	-0.527192
1927-07	1.293268	1.303701	0.759224	-0.586595
1927-08	1.258265	1.297274	0.736316	-0.565778
1927-09	1.395880	1.283333	1.168655	-0.634059

[12]:

res.nobs[10:15]

[12]:

Date
1927-05     0
1927-06    12
1927-07    13
1927-08    14
1927-09    15
Freq: M, dtype: int64

Rolling Regression¶

Formulas¶

RollingWLS: Rolling Weighted Least Squares¶

Fit Options¶

Expanding Sample¶

`RollingWLS`: Rolling Weighted Least Squares¶