Inter-rater reliability for more than two raters and categorical ratings

Enter a name for the analysis if you want

Enter the rating data, with rows for the objects rated and columns for the raters, separating each rating from the next by any kind of white space and/or <Enter> (easiest for you if you use a space between the ratings for an object and <Enter> to go to the next object rated). The program can handle up to 2,000 objects and up to 50 raters, but bootstrapping something that size may take a very long time (I haven't experimented yet!). The program should handle letters for ratings as well as numbers, but I recommend using numbers as that allows a little checking of the ratings.
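Just to make that layout concrete, here is a minimal sketch in Python (not the code this program actually runs) of reading a ratings matrix laid out that way; ratings.txt is a hypothetical file name used purely for illustration:

    # Minimal sketch, not the program's own code: read a whitespace-separated
    # ratings matrix with one row per object rated and one column per rater.
    # "ratings.txt" is a hypothetical file name used for illustration only.
    def read_ratings(path):
        objects = []
        with open(path) as f:
            for line in f:
                row = line.split()              # any run of white space separates ratings
                if row:                         # skip blank lines
                    objects.append(row)
        n_raters = len(objects[0])
        if any(len(row) != n_raters for row in objects):
            raise ValueError("not every object has the same number of ratings")
        return objects                          # list of lists; ratings kept as strings

    ratings = read_ratings("ratings.txt")
    print(len(ratings), "objects rated by", len(ratings[0]), "raters")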

The data entry area is huge to allow for up to 2,000 objects. Hit here to jump to the end of the form, after the data area.


Number of objects rated:
Number of raters:
Rater names/labels (one line for each rater, optional):

Ratings numbers or letters (1 = numbers, 0 = letters):

If the ratings were numbers, what's the highest number used:
Lowest possible:
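As an example of the little checking that numeric ratings make possible, here is another minimal Python sketch (again, not the program's own code) that flags any rating outside the declared range; lowest and highest stand in for the two values entered above, and ratings is the matrix from the sketch further up (the 1 and 5 in the last lines are just example values):

    # Minimal sketch: flag any numeric rating outside the declared range,
    # i.e. below the "lowest possible" or above the "highest number used" value.
    def check_range(ratings, lowest, highest):
        problems = []
        for i, row in enumerate(ratings, start=1):       # objects, numbered from 1
            for j, value in enumerate(row, start=1):     # raters, numbered from 1
                if not lowest <= int(value) <= highest:
                    problems.append((i, j, value))
        return problems                                  # empty list means all ratings in range

    for obj, rater, value in check_range(ratings, lowest=1, highest=5):   # example range
        print("Object", obj, "rater", rater, "has out-of-range rating", value)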

The program can compute the bootstrapped confidence interval for Light's kappa.
Please don't use this unless
(a) you're happy to wait some minutes for the answer, and
(b) you really have a publication need for this.

The reason is that a sensible number of bootstrap resamplings is, I think, in the range 5,000 to 20,000, and the process is extremely computationally intensive, so even on a pretty powerful dual Opteron machine it takes minutes to come back with the answer (PLEASE don't assume it's crashed and cancel!). If many people invoke this it will bring the machine to its knees and then I'll have to remove it! If you do want this, change the next parameter to 1; I recommend leaving the number of bootstrap resamplings at the default of 10,000 and the confidence interval at the usual .95, i.e. 95%.
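For anyone curious why it takes so long, here is a minimal Python sketch (not the code the program runs) of Light's kappa, i.e. the mean of Cohen's kappa over all pairs of raters, with a percentile bootstrap confidence interval; the 10,000 resamples and 95% interval mirror the defaults above, and ratings is again the matrix from the first sketch:

    # Minimal sketch, not the program's own code: Light's kappa (the mean of
    # unweighted Cohen's kappa over all pairs of raters) with a percentile
    # bootstrap confidence interval obtained by resampling objects (rows)
    # with replacement.  Defaults mirror the form: 10,000 resamples, 95% CI.
    import random
    from itertools import combinations

    def cohen_kappa(a, b):
        # Unweighted Cohen's kappa for two raters' categorical ratings.
        n = len(a)
        cats = set(a) | set(b)
        p_obs = sum(x == y for x, y in zip(a, b)) / n                    # observed agreement
        p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)   # chance agreement
        return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)      # 1.0 when agreement is perfect and trivial

    def lights_kappa(ratings):
        # Light's kappa: average Cohen's kappa over every pair of raters.
        cols = [list(col) for col in zip(*ratings)]                      # one list per rater
        pairs = list(combinations(range(len(cols)), 2))
        return sum(cohen_kappa(cols[i], cols[j]) for i, j in pairs) / len(pairs)

    def bootstrap_ci(ratings, resamples=10_000, ci=0.95):
        # Percentile bootstrap: resample objects with replacement, recompute
        # Light's kappa each time, then read off the chosen percentiles.
        n = len(ratings)
        stats = sorted(lights_kappa([random.choice(ratings) for _ in range(n)])
                       for _ in range(resamples))
        lower = stats[int(round((1 - ci) / 2 * resamples))]
        upper = stats[int(round((1 + ci) / 2 * resamples)) - 1]
        return lower, upper

    kappa = lights_kappa(ratings)
    low, high = bootstrap_ci(ratings)        # this is the slow part
    print("Light's kappa = %.3f, 95%% bootstrap CI [%.3f, %.3f]" % (kappa, low, high))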


Note: connection data and the data themselves are logged.