One way to assess how good, and how usable, a rating system is, is to have more than one person use the system to rate the same set of things, where what is rated covers a broad range of the possible ratings in the system. Ratings may be unidimensional and continuous, e.g. “how much warmth in this clip do you think this carer is expressing about …?”, or categorical, e.g. “into which of these categories would you put this AAI interview?” The index of inter-rater agreement you are most likely to see for categorical ratings is Cohen’s kappa; a correlation coefficient is better suited to continuous ratings.
Inter-rater reliability/agreement is an extremely general way of assessing a rating system. Cohen’s kappa, the first really widely recognised and used treatment of the issues and of how to use joint ratings in psychological settings, applies to just two raters rating a set of things, but there are extensions and alternatives (e.g. Fleiss’ kappa) that can be used with more than two raters.
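To make the logic concrete, here is a minimal sketch of Cohen’s kappa for two raters, computed directly from its definition: the observed proportion of agreement corrected for the agreement expected by chance from each rater’s marginal category frequencies. The example ratings and category labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    assert len(ratings1) == len(ratings2)
    n = len(ratings1)
    # observed proportion of exact agreement
    p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # chance-expected agreement from each rater's marginal proportions
    c1, c2 = Counter(ratings1), Counter(ratings2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# hypothetical ratings of six clips into categories A, B and C
rater1 = ["A", "A", "B", "B", "C", "C"]
rater2 = ["A", "A", "B", "C", "C", "C"]
print(round(cohens_kappa(rater1, rater2), 2))  # 0.75
```

Kappa of 1 means perfect agreement and 0 means agreement no better than chance; here the raters agree on five of six clips, but kappa (0.75) is lower than the raw 0.83 agreement because some of that agreement would be expected by chance alone.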
Try also:
Stevens’ levels of measurement