Well, it’s what it says, isn’t it? It’s about how you score multi-item measures, whether they are questionnaires, interviews or observational rating schedules. Yes, but! What may be obvious to the creators of a measure may not be immediately obvious to someone who has a copy of it and is expected to use it. Someone contacting me about that led me to create a scoring guide for CORE-OM!

Details #

One issue is that of missing items: that’s where things like pro-rating come in at the simple level and imputation at the research/geeky level.
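A minimal sketch of pro-rating, assuming a made-up rule (the actual rule, including how many missing items are tolerated, is set by each measure’s manual): the pro-rated total is the mean of the answered items multiplied by the full number of items.

```python
def prorated_total(responses, n_items, max_missing=1):
    """Pro-rate a total score; responses uses None for missing items.

    max_missing is a hypothetical tolerance: if more items are
    missing than that, the measure is treated as unscorable.
    """
    answered = [r for r in responses if r is not None]
    n_missing = n_items - len(answered)
    if n_missing > max_missing:
        return None  # too many missing items to score
    # mean of the answered items, scaled back up to the full item count
    return sum(answered) / len(answered) * n_items

# Example: a 10-item measure scored 0-4, one item missing.
print(prorated_total([2, 3, None, 1, 2, 4, 0, 2, 3, 1], n_items=10))  # 20.0
```

The same function returns `None` if two or more items are missing, mimicking the common “up to one missing item allowed” style of rule.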

However, even when you have “complete data”, i.e. usable values for all the items contributing to the score, there are still a lot of things that can make how one measure is scored very different from how another is scored.

At the simplest level is the mapping of responses to numbers. If the responses are yes/no these are almost always mapped to 0/1 but occasionally to 1/2. If the responses have more than two levels, say five levels, then you may have a difference in the numbering: is it from zero: 0, 1, 2, 3 and 4; or is it from one: 1, 2, 3, 4 and 5?
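A sketch of that mapping choice, with hypothetical response labels: scoring the same answers from zero versus from one shifts the total by exactly the number of items, which is why you need to know which convention a report used.

```python
# Five hypothetical response levels for an item.
labels = ["never", "rarely", "sometimes", "often", "always"]
map_from_zero = {lab: i for i, lab in enumerate(labels)}       # 0..4
map_from_one = {lab: i + 1 for i, lab in enumerate(labels)}    # 1..5

answers = ["rarely", "often", "never", "always", "sometimes"]
total_from_zero = sum(map_from_zero[a] for a in answers)
total_from_one = sum(map_from_one[a] for a in answers)
print(total_from_zero, total_from_one)  # 10 15: differ by the 5 items
```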

Then you have the issue of whether the score across the items is their sum or their mean.
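The sum/mean distinction in one sketch (item values made up): the sum score grows with the number of items, while the mean score stays on the metric of a single item.

```python
items = [2, 3, 1, 0, 4]            # five items, each scored 0-4
total_score = sum(items)           # sum scoring: possible range 0-20 here
mean_score = sum(items) / len(items)  # mean scoring: stays on the 0-4 metric
print(total_score, mean_score)     # 10 2.0
```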

Those simple issues do, to some extent, lead into the issue of transforming scores: see the entry about that for more on the various transformations that get used (much less so than they used to be). Sometimes transformations of scores, or scoring weights (see next) have been designed to try to make scores have a more Gaussian distribution than simple item score mappings might have produced, see Transforming data/variable(s). Other transformations are not about the shape of score distributions but about giving different scales the same score range, see Standardising/normalising for those things!

And it can get more complicated: some measures might have five levels of response but score them 0, 0, 0, 1, 2 or any similar use of integers.
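That kind of “collapsed” integer scoring is easy to express as a lookup table; the mapping below is a hypothetical example in which the first three response levels all score zero.

```python
# Hypothetical collapsed scoring: five response levels -> 0, 0, 0, 1, 2.
collapse = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2}

raw_responses = [0, 2, 3, 4, 1]
collapsed = [collapse[r] for r in raw_responses]
print(collapsed, sum(collapsed))  # [0, 0, 1, 2, 0] 3
```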

Then it can get even more complicated, with scoring that is not integer: 0, .31, .73, 1.82 and 5.23. Such scoring has generally been derived from factor analysis or from item response theory (IRT) analyses of data from the measure, and here again there are different ways to get “factor scores” (i.e. the mappings from “raw” item scores to those multipliers). Equally, there are different IRT methods even for binary responses, where scoring might allow for a “guessing” issue, though I’ve never seen that outside educational/aptitude measures.
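Mechanically, such weighted scoring is just a weighted sum of the item scores; the weights below are invented for illustration, not taken from any real factor analysis or IRT calibration.

```python
# Hypothetical non-integer item weights, as might come from a factor analysis.
weights = [0.31, 0.73, 1.82, 0.55, 1.10]
items = [2, 3, 1, 0, 4]  # raw item scores

# Weighted score: each item's raw score multiplied by its weight, then summed.
weighted_score = sum(w * x for w, x in zip(weights, items))
print(round(weighted_score, 2))  # 9.03
```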

One interesting thing is how reports either go into great detail about how a scoring system was derived and why, or else say nothing about what scoring was used. (Mea culpa: when I have looked back at a number of my papers I have seen that I didn’t make it explicit what scoring was used, whether pro-rating was used and if so, up to what number of missing items etc. It’s horribly easy to forget that you should really say these things and that they may not be obvious to the reader or you may think that the details are in the earlier paper you cite, only to find that they’re not.)

Try also #


Chapters #

Never explicitly covered in the OMbook.

Online resources #

None yet.

Dates #

First created 13.iv.24, tweaked 15.iv.24 to add link to CORE-OM scoring guide (the request for that had actually triggered creating this entry!)
