statsmodels.sandbox.regression.try_ols_anova.form2design

statsmodels.sandbox.regression.try_ols_anova.form2design(ss, data)

Convert a string formula to a dictionary of design variables.

Parameters
ss : string

formula of whitespace-separated terms, each in one of the following forms:
  • I : add constant

  • varname : use the data for a plain variable name as is

  • F:varname : create dummy variables for factor varname

  • P:varname1*varname2 : create dummy variables for the product of the factors varname1 and varname2

  • G:varname1*varname2 : create the product of a factor (varname1) and a continuous variable (varname2)

data : dict or structured array

data set; variables are accessed by name, as in a dictionary
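
The term syntax described above can be illustrated with a small standalone parser. This is only a sketch of how such a formula string might be split and classified; `parse_terms` is a hypothetical helper written for this illustration, it is not part of statsmodels, and it builds no design columns.

```python
def parse_terms(formula):
    """Classify each whitespace-separated term of a formula string.

    Hypothetical helper for illustration only; not a statsmodels function.
    Returns a list of (kind, variables, name) tuples in input order.
    """
    kinds = {"F": "factor", "P": "product", "G": "grouped"}
    terms = []
    for item in formula.split():
        if item == "I":
            # 'I' requests a constant column, named 'const'.
            terms.append(("constant", [], "const"))
        elif ":" not in item:
            # A plain variable name is used as is.
            terms.append(("plain", [item], item))
        else:
            prefix, _, rest = item.partition(":")
            if prefix not in kinds:
                raise ValueError("unknown expression in formula: %r" % item)
            variables = rest.split("*")
            # Multi-variable terms are named by joining the variable names.
            terms.append((kinds[prefix], variables, "".join(variables)))
    return terms


terms = parse_terms("I a F:b P:c*d G:c*f")
names = [name for _, _, name in terms]
# names is ['const', 'a', 'b', 'cd', 'cf']
```

The resulting `names` list follows the order of the terms in the formula string.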

Returns
vars : dictionary

dictionary of variables, with factor terms converted to dummy variables

names : list

list of variable names in the order in which the terms appear in the formula; for product (P:) and grouped continuous (G:) terms, the name is formed by joining the individual variable names

Notes

With an ordered dictionary, the separate list of names would not be necessary.
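
As a minimal sketch of this point, independent of statsmodels: in Python 3.7 and later the built-in `dict` preserves insertion order, so a dictionary filled term by term already carries the ordering that the separate list provides. The `columns` dictionary and its placeholder values below are assumptions made for the illustration.

```python
# Placeholder values stand in for the actual design columns.
columns = {}
for name in ["const", "a", "b", "cd", "cf"]:
    columns[name] = None

# Iterating over the dictionary yields the keys in insertion order.
ordered_names = list(columns)
# ordered_names is ['const', 'a', 'b', 'cd', 'cf']
```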

Examples

>>> xx, n = form2design('I a F:b P:c*d G:c*f', testdata)
>>> xx.keys()
['a', 'b', 'const', 'cf', 'cd']
>>> n
['const', 'a', 'b', 'cd', 'cf']
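
To make the F: and G: terms more concrete, the following sketch builds indicator (dummy) columns for a factor and multiplies them by a continuous variable using plain NumPy. It is a generic illustration under stated assumptions, not the helper code that form2design calls: the arrays `b` and `x` are made-up data, and this version keeps one column per level, whereas the function itself may drop a reference level.

```python
import numpy as np

b = np.array([0, 1, 2, 1, 0])             # made-up factor with three levels
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up continuous variable

# One indicator column per distinct level of the factor (full dummy set).
levels = np.unique(b)
dummies = (b[:, None] == levels).astype(int)   # shape (5, 3)

# Grouped continuous variable: x appears only in the column of its own level.
grouped = dummies * x[:, None]                 # shape (5, 3)
```

With a full dummy set, each row of `dummies` contains exactly one 1, so each row of `grouped` contains the corresponding value of `x` in a single column and zeros elsewhere.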