statsmodels.sandbox.regression.try_ols_anova.form2design

statsmodels.sandbox.regression.try_ols_anova.form2design(ss, data)

Convert a string formula to a data dictionary.

ss : string
formula string of whitespace-separated terms; each term takes one of these forms:
  • I : add constant
  • varname : plain variable name; the data is used as is
  • F:varname : create dummy variables for factor varname
  • P:varname1*varname2 : create product dummy variables for varnames
  • G:varname1*varname2 : create product between factor and continuous variable
data : dict or structured array
data set; variables are accessed by name, as in a dictionary
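
The term forms accepted in ss can be sketched as a small classifier. This is an illustration only, not the code used by form2design: the helper name parse_terms is hypothetical, and splitting on whitespace is an assumption based on the example formula in this page.

```python
def parse_terms(ss):
    """Split a formula string into (kind, variable names) pairs.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    terms = []
    for item in ss.split():          # assumption: terms are whitespace-separated
        if item == 'I':
            # 'I' requests a constant and refers to no variable
            terms.append(('const', []))
        elif ':' in item:
            # 'F:', 'P:' and 'G:' prefixes; variables are joined by '*'
            prefix, rest = item.split(':', 1)
            terms.append((prefix, rest.split('*')))
        else:
            # plain variable name, used as is
            terms.append(('plain', [item]))
    return terms


terms = parse_terms('I a F:b P:c*d G:c*f')
for kind, varnames in terms:
    print(kind, varnames)
```

Each pair records which rule applies to a term and which variables it would read from data.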
Returns:
  • vars (dictionary) – dictionary of variables with converted dummy variables
  • names (list) – list of names in the order the terms appear in ss; product (P:) and grouped continuous (G:) variables are named by joining the individual variable names
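
The naming rule can be reproduced in a few lines. This is a minimal sketch under the assumption that joined names are the plain concatenation of the individual names, as the names 'cd' and 'cf' in the example on this page suggest; term_name is a hypothetical helper, not a statsmodels function.

```python
def term_name(item):
    """Return the output name for one formula term (illustrative only)."""
    if item == 'I':
        return 'const'                    # the constant is reported as 'const'
    if ':' in item:
        _, rest = item.split(':', 1)      # drop the 'F:', 'P:' or 'G:' prefix
        return ''.join(rest.split('*'))   # 'c*d' becomes 'cd'
    return item                           # plain names are kept unchanged


formula = 'I a F:b P:c*d G:c*f'
names = [term_name(item) for item in formula.split()]
print(names)
```

The resulting list follows the order of the terms in the formula string.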

Examples

>>> xx, n = form2design('I a F:b P:c*d G:c*f', testdata)
>>> xx.keys()
['a', 'b', 'const', 'cf', 'cd']
>>> n
['const', 'a', 'b', 'cd', 'cf']

Notes

With a sorted (order-preserving) dict, a separate name list wouldn’t be necessary.
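
As a sketch of what this note describes, collections.OrderedDict from the standard library keeps keys in insertion order, so the key sequence alone carries the information held in the names list. The None values below are placeholders standing in for the actual column data; nothing here calls form2design.

```python
from collections import OrderedDict

# Names in formula order, copied from the example on this page.
names = ['const', 'a', 'b', 'cd', 'cf']

# Placeholder values (None) stand in for the design columns.
design = OrderedDict((name, None) for name in names)

# The keys come back in the order they were inserted.
recovered = list(design.keys())
print(recovered)
```

Plain dict objects also preserve insertion order from Python 3.7 onward, so the same idea applies without OrderedDict on current interpreters.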