statsmodels.stats.anova.AnovaRM

class statsmodels.stats.anova.AnovaRM(data, depvar, subject, within=None, between=None, aggregate_func=None)

Repeated measures Anova using least squares regression

The within-subject effect sum of squares is calculated by comparing the residual sum of squares of the full regression model with that of a reduced model that omits the effect [1].
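The full versus reduced model comparison can be sketched with NumPy alone for the simplest case: one within-subject factor and one observation per subject and cell. This is a minimal illustration under stated assumptions, not statsmodels' internal code: the data, the treatment coding, and the helper names `dummies` and `rss` are hypothetical. In this sketch the residual of the full model (subject plus factor) serves as the error term.

```python
import numpy as np

# Hypothetical balanced data: rows are subjects, columns are the levels of a
# single within-subject factor (exactly one observation per subject and cell).
scores = np.array([
    [10.0, 12.0, 15.0],
    [8.0, 11.0, 13.0],
    [9.0, 9.0, 14.0],
    [11.0, 14.0, 16.0],
])
n_subj, n_cond = scores.shape
y = scores.ravel()                           # flatten in subject-major order
subj = np.repeat(np.arange(n_subj), n_cond)  # subject code of each element of y
cond = np.tile(np.arange(n_cond), n_subj)    # condition code of each element of y


def dummies(codes, n_levels):
    """Hypothetical helper: treatment-coded indicators for levels 1..n_levels-1."""
    return (codes[:, None] == np.arange(1, n_levels)).astype(float)


def rss(X, y):
    """Hypothetical helper: residual sum of squares of the least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


intercept = np.ones((y.size, 1))
X_full = np.hstack([intercept, dummies(subj, n_subj), dummies(cond, n_cond)])
X_reduced = np.hstack([intercept, dummies(subj, n_subj)])  # effect removed

rss_full = rss(X_full, y)
rss_reduced = rss(X_reduced, y)

df_effect = n_cond - 1
df_error = y.size - X_full.shape[1]  # residual df of the full model
ss_effect = rss_reduced - rss_full   # within-subject effect sum of squares
F = (ss_effect / df_effect) / (rss_full / df_error)
print(f"SS_effect = {ss_effect:.4f}, F({df_effect}, {df_error}) = {F:.4f}")
```

In a balanced design the difference of the two residual sums of squares equals the classical between-condition sum of squares, which is why the regression route and the textbook formulas agree.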

Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.
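Before calling AnovaRM it can help to check whether a long-format frame is already fully balanced, i.e. every subject has exactly one row in every combination of within-subject factor levels. The helper below, `is_fully_balanced`, is a hypothetical sketch written with pandas for illustration; it is not part of statsmodels, and the column names are assumptions.

```python
import pandas as pd


def is_fully_balanced(data, subject, within):
    """Return True if every subject has exactly one row in every cell.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    counts = data.groupby([subject] + within).size()
    n_cells = 1
    for factor in within:
        n_cells *= data[factor].nunique()
    n_subjects = data[subject].nunique()
    # Balanced: every subject x cell combination is present, each exactly once.
    return len(counts) == n_subjects * n_cells and bool((counts == 1).all())


df = pd.DataFrame({
    "subj": [1, 1, 2, 2, 3, 3],
    "cond": ["a", "b", "a", "b", "a", "b"],
    "rt": [5.1, 6.0, 4.8, 5.9, 5.3, 6.2],
})
print(is_fully_balanced(df, "subj", ["cond"]))            # True
print(is_fully_balanced(df.iloc[:-1], "subj", ["cond"]))  # False: subject 3 lacks cond "b"
```

A frame with repeated rows per cell also returns False here; such data can still be used after aggregation, as described under aggregate_func.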

Parameters
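  • data (DataFrame) – The data set in long format: one row per observation, with columns for the dependent variable, the subject id and each within-subject factor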

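  • depvar (string) – The name of the column in data that holds the dependent variable

  • subject (string) – The name of the column in data that holds the subject id

  • within (a list of string(s)) – The names of the columns in data that hold the within-subject factors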

  • between (a list of string(s)) – The between-subject factors. This is not yet implemented.

  • aggregate_func (None, 'mean', or function) – If the data set contains more than one observation per subject and cell of the specified model, this function is used to aggregate the data before running the Anova. None (the default) performs no aggregation; 'mean' is a shortcut for numpy.mean. An exception is raised if aggregation is required but no aggregation function was specified.
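A minimal usage sketch for a 2 x 2 within-subject design: the data, the column names subj, A, B and rt, and the noise values are hypothetical. The frame is in long format with exactly one row per subject and cell, so aggregate_func is left at its default; fit() returns the results object, whose anova_table attribute holds the table as a pandas DataFrame.

```python
import itertools

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical noise terms so that the error sums of squares are not zero.
noise = [0.3, -0.2, 0.1, -0.4, -0.1, 0.5, -0.3, 0.2,
         0.4, -0.5, 0.0, 0.1, -0.2, 0.3, 0.2, -0.1]

# Long format: one row per subject and cell of the 2 x 2 within-subject design.
records = []
cells = itertools.product([1, 2, 3, 4], ["a1", "a2"], ["b1", "b2"])
for (subj, a, b), e in zip(cells, noise):
    rt = 10.0 + 0.5 * subj + 2.0 * (a == "a2") + 1.0 * (b == "b2") + e
    records.append({"subj": subj, "A": a, "B": b, "rt": rt})
data = pd.DataFrame(records)

# Exactly one observation per subject and cell, so aggregate_func is not needed.
model = AnovaRM(data, depvar="rt", subject="subj", within=["A", "B"])
results = model.fit()          # AnovaResults instance
print(results.summary())       # formatted Anova table
print(results.anova_table)     # same values as a pandas DataFrame
```

The table has one row per within-subject effect (the two main effects and their interaction) with F value, numerator and denominator degrees of freedom, and p-value.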

Returns

results – The estimated Anova table, as returned by fit()

Return type

AnovaResults instance

Raises

ValueError – If the data need to be aggregated, but aggregate_func was not specified.
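The sketch below illustrates the ValueError with hypothetical data holding two trials per subject and condition: constructing AnovaRM without aggregate_func fails, while passing aggregate_func='mean' averages each cell first. The column names and values are assumptions for illustration.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: two trials per subject and condition, so each cell of the
# design holds two observations instead of one.
data = pd.DataFrame({
    "subj": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "cond": ["a", "a", "b", "b"] * 3,
    "rt": [5.0, 5.4, 6.1, 6.3, 4.7, 4.9, 5.8, 6.2, 5.2, 5.6, 6.0, 6.6],
})

# Without aggregate_func the constructor rejects the repeated cells.
error = None
try:
    AnovaRM(data, depvar="rt", subject="subj", within=["cond"])
except ValueError as exc:
    error = exc
print("ValueError raised:", error is not None)

# With aggregate_func="mean" the two trials in each cell are averaged first.
results = AnovaRM(data, depvar="rt", subject="subj", within=["cond"],
                  aggregate_func="mean").fit()
print(results.anova_table)
```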

Notes

This implementation currently only supports fully balanced designs. If the data contain more than one observation per subject and cell of the design, these observations need to be aggregated into a single observation before the Anova is calculated, either manually or by passing an aggregation function via the aggregate_func keyword argument. Note that if the input data set was not balanced before performing the aggregation, the implied heteroscedasticity of the data is ignored.
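Both aggregation routes described above can be compared in a short sketch. The raw trials here are hypothetical and deliberately unbalanced (one to three trials per cell); after averaging, each subject has one value per condition. Manual aggregation with pandas groupby and aggregate_func='mean' should yield the same table. In both cases every cell mean enters with equal weight, whether it averages one trial or three; this is the implied heteroscedasticity the note says is ignored.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical raw trials with unequal trial counts per cell (unbalanced
# before aggregation, balanced after it).
data = pd.DataFrame({
    "subj": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "cond": ["a", "a", "a", "b", "a", "b", "b", "a", "a", "b", "b", "b"],
    "rt": [5.0, 5.4, 5.2, 6.1, 4.7, 5.8, 6.2, 5.2, 5.6, 6.0, 6.6, 6.3],
})
print(data.groupby(["subj", "cond"]).size())  # trials per cell: 1 to 3

# Manual route: one mean per subject and cell, computed with pandas.
manual = data.groupby(["subj", "cond"], as_index=False)["rt"].mean()
res_manual = AnovaRM(manual, depvar="rt", subject="subj",
                     within=["cond"]).fit()

# Built-in route: let AnovaRM average the cells itself.
res_builtin = AnovaRM(data, depvar="rt", subject="subj", within=["cond"],
                      aggregate_func="mean").fit()

# Both routes should produce the same Anova table.
print(np.allclose(res_manual.anova_table.values,
                  res_builtin.anova_table.values))
```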

References

[1] Rutherford, Andrew. ANOVA and ANCOVA: A GLM Approach. John Wiley & Sons, 2011.

Methods

fit()

Estimate the model and compute the Anova table.