You are most likely to see this in inter-rater agreement studies. It is an extension of Cohen’s kappa, which is one of the simplest ways of looking at agreement between raters in their classification of a number of objects. If you haven’t come across Cohen’s kappa before, do read my glossary entry about it before you move on to the details (next!)
Details #
So the simplest application of Cohen’s kappa looks at how well two raters agree in classifying objects and it treats all disagreements equally. Suppose you were classifying responses from clients following a therapist’s comments and you were using a very simple three-category system, say “takes up the idea”, “rejects the idea” and “takes a tangent”. For ordinary, unweighted kappa any disagreement has the same effect on the value of kappa as any other disagreement. For example, one rater saying “takes up the idea” and the other saying “takes a tangent” has the same effect as one saying “takes up the idea” and the other saying “rejects the idea”. That might seem to be throwing away information if, as I do, you think that the second disagreement is more of a disagreement than the first. Weighted kappa “weights” disagreements, so the first disagreement might be weighted 1 and the second weighted 2, or even higher. (Which is the same as saying that the options “takes up the idea”, “takes a tangent” and “rejects the idea” form an ordinal scale, with “takes a tangent” sitting between the other two: they are not just three clearly distinct labels.)
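To make that concrete, here is a minimal sketch in base R of how weighted kappa can be computed from a cross-tabulation of two raters’ codings, using the three categories above and the 0/1/2 disagreement weighting just described. The ratings, the function and the variable names are all made up purely for illustration; they are not from any real study.

```r
## A minimal sketch in base R, using made-up ratings purely for illustration.
## The three categories are treated as ordered:
##   "takes up the idea" < "takes a tangent" < "rejects the idea"
cats <- c("takes up the idea", "takes a tangent", "rejects the idea")

## Hypothetical codings of the same eight client responses by two raters
rater1 <- c("takes up the idea", "takes up the idea", "takes a tangent",
            "rejects the idea", "takes up the idea", "takes a tangent",
            "rejects the idea", "takes up the idea")
rater2 <- c("takes up the idea", "takes a tangent", "takes a tangent",
            "rejects the idea", "rejects the idea", "takes a tangent",
            "rejects the idea", "takes up the idea")

weightedKappa <- function(r1, r2, levels, weights) {
  tab <- table(factor(r1, levels = levels), factor(r2, levels = levels))
  pObs <- tab / sum(tab)                       # observed cell proportions
  pExp <- outer(rowSums(pObs), colSums(pObs))  # expected proportions under chance agreement
  ## weighted kappa = 1 - (weighted observed disagreement / weighted chance disagreement)
  1 - sum(weights * pObs) / sum(weights * pExp)
}

## Disagreement weights: 0 on the diagonal (agreement), 1 for categories one step
## apart, 2 for the extreme disagreement ("takes up the idea" vs. "rejects the idea")
linearWeights <- abs(outer(1:3, 1:3, "-"))
weightedKappa(rater1, rater2, cats, linearWeights)
```

The key point is that the weight matrix is where the judgement about how much of a disagreement each pair of categories represents gets built in.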
Weighted kappa only comes into play when there are more than two rating categories and only if there is a plausible argument that some disagreements are greater than others. Clearly different weighting schemes will give different kappa values (the continuation of the sketch below illustrates this), and the kappa value will almost certainly be different if weighting is used than if it isn’t. That makes it important that you record publicly, somewhere, what weighting you will use before you analyse your data. That way you can’t be accused of having fiddled around with the weighting to get the result you wanted. (It wouldn’t be a terribly plausible accusation, but best to pre-empt it.)
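Continuing the sketch above (so this assumes the weightedKappa() function, the made-up rater1 and rater2 vectors and the linearWeights matrix from that block), here is how choosing a harsher weighting of the extreme disagreement changes the value you get, which is exactly why the scheme should be fixed and recorded before the data are analysed.

```r
## Continuing the sketch above: squared disagreement weights penalise the extreme
## disagreement ("takes up the idea" vs. "rejects the idea") relatively more
## heavily than the linear 0/1/2 weights do
squaredWeights <- abs(outer(1:3, 1:3, "-"))^2        # weights 0, 1 and 4

weightedKappa(rater1, rater2, cats, linearWeights)   # linear weighting
weightedKappa(rater1, rater2, cats, squaredWeights)  # squared weighting: a different value
```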
Try also #
- Cohen’s kappa
- Inter-rater agreement/reliability
- Nominal/category scaling
- Ordinal scaling
- Rating scales
- Reliability
Chapters #
Not covered in the OMbook.
Online resources #
Now there’s a post here in my Rblog explaining weighted kappa and I may add an interactive exploration in my shiny apps at some point.
Dates #
First created 1.i.26, links improved 2.i.26 and Rblog post added, with a link above.