statsmodels.stats.inter_rater.fleiss_kappa(table, method='fleiss')

Fleiss’ and Randolph’s kappa multi-rater agreement measure

table : array_like, 2-D

Table of rating counts, with subjects in rows and categories in columns: each cell holds the number of raters who assigned that subject to that category. Convert raw data into this format with statsmodels.stats.inter_rater.aggregate_raters.


method : str

Method ‘fleiss’ returns Fleiss’ kappa, which uses the sample margin to define the chance outcome. Method ‘randolph’ or ‘uniform’ (only the first 4 letters are needed) returns Randolph’s (2005) multirater kappa, which assumes a uniform distribution of the categories to define the chance outcome.
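To illustrate the table layout described above, here is a minimal pure-Python sketch that turns raw ratings (one row per subject, one entry per rater) into subject-by-category counts. The helper name `count_table` and the sample data are hypothetical and chosen only for illustration; this is not the implementation of statsmodels.stats.inter_rater.aggregate_raters.

```python
def count_table(ratings, categories=None):
    """Hypothetical helper: count how often each category was assigned
    to each subject. `ratings` holds one row per subject and one entry
    per rater."""
    if categories is None:
        # Assumption: the category set is whatever labels occur in the data.
        categories = sorted({r for row in ratings for r in row})
    column = {c: j for j, c in enumerate(categories)}

    table = [[0] * len(categories) for _ in ratings]
    for i, row in enumerate(ratings):
        for rating in row:
            table[i][column[rating]] += 1
    return table, categories


# Made-up example: 4 subjects, each rated by 3 raters, labels 0, 1, 2.
raw = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 0, 2],
    [0, 0, 0],
]
table, categories = count_table(raw)
print(table)       # [[2, 1, 0], [0, 3, 0], [1, 0, 2], [3, 0, 0]]
print(categories)  # [0, 1, 2]
```

Each row of the resulting table sums to the number of raters, which is the shape of input the `table` parameter expects.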


kappa : float

Fleiss’ or Randolph’s kappa statistic for inter-rater agreement.
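For reference, the returned statistic can be written in the usual textbook form. The notation is an assumption made here, not taken from the statsmodels documentation: N subjects, n raters per subject, k categories, and n_{ij} the count in row i, column j of the table.

```latex
\begin{aligned}
P_i &= \frac{\sum_{j=1}^{k} n_{ij}^{2} - n}{n\,(n-1)}, &
\bar{P} &= \frac{1}{N}\sum_{i=1}^{N} P_i, &
p_j &= \frac{1}{N n}\sum_{i=1}^{N} n_{ij}, \\[4pt]
\bar{P}_e &=
\begin{cases}
\sum_{j=1}^{k} p_j^{2} & \text{(Fleiss, sample margins)} \\[2pt]
1/k & \text{(Randolph, uniform categories)}
\end{cases} &
\kappa &= \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}.
\end{aligned}
```

The two methods differ only in the chance-agreement term \bar{P}_e; the observed agreement \bar{P} is the same in both.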


No variance estimates or hypothesis tests are implemented yet.

Inter-rater agreement measures like Fleiss’ kappa measure agreement relative to chance agreement. Different authors have proposed different ways of defining chance agreement. Fleiss’ kappa is based on the marginal sample distribution of the categories, while Randolph’s kappa uses a uniform distribution of the categories as the benchmark. Warrens (2010) showed that Randolph’s kappa is always greater than or equal to Fleiss’ kappa. Under some commonly observed conditions, Fleiss’ and Randolph’s kappa provide lower and upper bounds for two similar kappa-like measures by Light (1971) and Hubert (1977).
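The difference between the two chance benchmarks can be seen in a short pure-Python sketch. It assumes a complete count table in which every subject has the same number of raters. The function name `kappa_sketch` and the sample table are hypothetical, and the code is an illustration of the standard formulas rather than the statsmodels source.

```python
def kappa_sketch(table, method="fleiss"):
    """Illustrative multi-rater kappa for a subjects x categories count table."""
    n_sub = len(table)
    n_cat = len(table[0])
    n_rat = sum(table[0])
    # Assumption of this sketch: every subject was rated by the same number of raters.
    if any(sum(row) != n_rat for row in table):
        raise ValueError("each row must sum to the same number of raters")

    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    p_obs = sum(
        (sum(c * c for c in row) - n_rat) / (n_rat * (n_rat - 1))
        for row in table
    ) / n_sub

    if method == "fleiss":
        # Chance agreement from the marginal sample distribution of categories.
        n_total = n_sub * n_rat
        p_cat = [sum(row[j] for row in table) / n_total for j in range(n_cat)]
        p_exp = sum(p * p for p in p_cat)
    elif method[:4] in ("rand", "unif"):
        # Chance agreement from a uniform distribution over categories.
        p_exp = 1.0 / n_cat
    else:
        raise ValueError("unknown method: %r" % (method,))

    return (p_obs - p_exp) / (1.0 - p_exp)


# Made-up table: 4 subjects, 3 raters each, 3 categories.
example = [[2, 1, 0], [0, 3, 0], [1, 0, 2], [3, 0, 0]]
k_fleiss = kappa_sketch(example, "fleiss")      # 5/11, approximately 0.4545
k_randolph = kappa_sketch(example, "randolph")  # 0.5
print(k_fleiss, k_randolph)
```

On this table the Randolph value is the larger of the two, in line with the inequality attributed to Warrens (2010).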



Fleiss, Joseph L. 1971. “Measuring Nominal Scale Agreement among Many Raters.” Psychological Bulletin 76 (5): 378-82.

Randolph, Justus J. 2005. “Free-Marginal Multirater Kappa (multirater K [free]): An Alternative to Fleiss’ Fixed-Marginal Multirater Kappa.” Presented at the Joensuu Learning and Instruction Symposium, vol. 2005.

Warrens, Matthijs J. 2010. “Inequalities between Multi-Rater Kappas.” Advances in Data Analysis and Classification 4 (4): 271-86.

Last update: Dec 14, 2023