Psychometrics

I guess this is mostly what I’ve worked on since about 1984, so of course I’ve got my own take on it, one that probably isn’t the dominant way the word is used. The word itself just means measuring the mind or its functions, its phenomena.

Details #

What does it include? #

Conventionally the term is used almost exclusively for quantitative psychometrics, excluding what I call “qualitative psychometrics”. Hang on, how can measurement be qualitative? It happens all the time: if you try to explain why you prefer dogs to cats or rock music to classical, or try to find a word for a colour, you are doing qualitative psychometrics, and it matters.

Next the term is pretty much restricted to “nomothetic psychometrics”: trying to measure things on which we are all thought to have a position: happiness, sadness, distress. That excludes “idiographic (or ideographic) psychometrics” in which we try to translate into measurement something that might be unique to one person, or, bridging the idiographic and the nomothetic, something on which one might have a personal view but in a domain in which others might have views: the relative merits of different recordings of Bach’s St. Matthew Passion or recordings of Dire Straits performing “Romeo and Juliet”. Idiographic methods include things like repertory grids, paired comparisons, card sort methods, Phillips’s Personal Questionnaire Rapid Scaling Technique (PQRST), the methods of mismatched cases and my own love: the method of derangements.
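To give a flavour of how an idiographic method can still be quantified, here is a minimal sketch. It assumes that the method of derangements rests on a matching task: a judge tries to match n idiographic descriptions to the n people they came from, and we ask how likely it is to get k or more matches right purely by chance. The function names are hypothetical, chosen for this illustration only.

```python
from math import comb, factorial

def derangements(n: int) -> int:
    """Number of permutations of n objects that leave none in its original place."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0  # D(0) and D(1)
    for i in range(2, n + 1):
        # Standard recurrence: D(i) = (i - 1) * (D(i - 1) + D(i - 2))
        prev2, prev1 = prev1, (i - 1) * (prev1 + prev2)
    return prev1

def p_at_least_k_matches(n: int, k: int) -> float:
    """Chance probability of k or more correct matches when n objects are matched at random."""
    # Exactly m correct matches: choose which m are right, derange the other n - m.
    favourable = sum(comb(n, m) * derangements(n - m) for m in range(k, n + 1))
    return favourable / factorial(n)

# Assumed illustration: probability of four or more correct matches for several n.
for n in (4, 6, 10):
    print(n, round(p_at_least_k_matches(n, 4), 4))
```

Under these assumptions, the chance probability depends only on counting permutations, so nothing has to be assumed about how scores are distributed.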

However, we are getting down to what you will see the term used for most, at least in our field: the various processes of distilling scores or rankings out of nomothetic data, usually multi-item questionnaire data. To a large extent these methods are about “dimension reduction” or simplification: they are testing fit between the data you have and models, models that can help simplify things usefully (but can also mislead).
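A minimal sketch of “dimension reduction”, using simulated data rather than any real questionnaire. It assumes six items all driven by one underlying dimension with a hypothetical loading of 0.7, and uses the share of variance taken by the first principal component as one simple index of how far the six item scores can be simplified to one. The sample size, loading and helper names are all assumptions made for the illustration.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def first_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix, found by power iteration."""
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient for a unit vector

rng = random.Random(1)
n_people, n_items, loading = 300, 6, 0.7  # all assumed values
noise_sd = sqrt(1 - loading ** 2)
theta = [rng.gauss(0, 1) for _ in range(n_people)]  # simulated underlying dimension
items = [[loading * t + rng.gauss(0, noise_sd) for t in theta]
         for _ in range(n_items)]

corr = [[pearson(items[i], items[j]) for j in range(n_items)]
        for i in range(n_items)]
share = first_eigenvalue(corr) / n_items  # proportion of total variance
print(f"first component takes {share:.0%} of the variance in {n_items} items")
```

A large share suggests one score summarises the items fairly well, but it says nothing by itself about whether that simplification will generalise or whether it misleads.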

Within this realm we still have two often warring and very rarely overlapping approaches: Classical Test Theory (CTT) and Item Response Theory (IRT). These tend to share two key terms: reliability and validity (though they may define or index them differently). I have separate entries for each of CTT and IRT and for reliability and validity so go to those for more detail on each of them. (And yes, you are correct: each of them goes on to fragment further but the ideas do matter and aren’t only evangelical wars!)

A final point, keeping within nomothetic, quantitative psychometrics (or “psychometrics” to its aficionad[o|a]s), is that some methods attempt to provide rational, formal ways to estimate the quality of fit between the simplifying model and the data you have, and thence to suggest how well your findings (that simplified model, that fit) might generalise to other datasets. These are methods of estimation and/or hypothesis testing, and most modern factor analytic methods offer these explorations of the fit between model and data. Other methods don’t claim to do that but do still simplify your data: Rasch IRT models, Principal Component Analyses and at least some Multidimensional Scaling (MDS) models fall into this grouping. The choice between these approaches and their implications is not always made clear in psychometric reports.

“Levels of measurement” #

This is messy: the issues have been hugely exaggerated and have gone rather in and out of fashion since Stevens’s oversimplifying paper back in 1946. The ideas impinge more broadly than just within psychometrics, see glossary entry here. However, it remains fair to say that some psychometric methods treat item scores on measures as if they had interval or ratio scaling whereas others, notably multidimensional scaling, only assume that the numbers have ordinal scaling. It’s an interesting point whether Item Response Theory methods only assume ordinal scaling, as they work from there to assign logit (essentially ratio) scaling to the response levels. Their real attraction is that they do this by generally fairly defensible mathematical methods.
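A small sketch of what the ordinal versus interval distinction means in practice, using made-up responses to two five-level items. If only the order of the response levels matters, any order-preserving relabelling of the levels should leave the result unchanged. A rank-based (Spearman) correlation behaves that way, while a Pearson correlation, which treats the numbers as interval scaled, does not. The data and the choice of an exponential relabelling are assumptions for illustration only.

```python
from math import exp, sqrt

def pearson(x, y):
    """Pearson correlation: treats the scores as interval scaled."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: uses only the ordering of the scores."""
    return pearson(ranks(x), ranks(y))

# Made-up responses from ten people to two items scored 1 to 5.
item_a = [1, 2, 2, 3, 4, 4, 5, 3, 1, 5]
item_b = [2, 1, 3, 3, 5, 4, 4, 2, 1, 5]
stretched_a = [exp(v) for v in item_a]  # same order, very different spacing

print("Pearson: ", pearson(item_a, item_b), pearson(stretched_a, item_b))
print("Spearman:", spearman(item_a, item_b), spearman(stretched_a, item_b))
```

The Spearman values match exactly across the relabelling, while the Pearson values shift, which is the practical cost of assuming interval scaling when the spacing between response levels is not known.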

Cautionary notes: modern psychometrics is very prone to methodolatry #

Firstly: always remember that this is all about inferring “latent variables”: dimensions on which people differ. These are intrinsically, as the name says, “latent”: unmeasurable by direct means. We are using mathematical, statistical methods to infer information about them from many completions of the same measures by many people (generally, the more the better for the maths/stats). The methods can be excellent at revealing systematic tendencies in the many completions of the measures but this measurement remains inferential. The latent variables, which may or may not appear, or may appear more or less strongly, remain inferences about tendencies across the respondents who provided the dataset. This is not the sort of measurement that we are used to when we weigh shopping or measure furniture, nor that of any non-relativistic, non-quantum physical science measurement.
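A minimal simulation sketch of why more completions generally help. It assumes eight items driven by a single latent variable with a hypothetical loading of 0.6, and uses Cronbach’s alpha (a Classical Test Theory index of internal consistency) purely as a convenient example statistic. It then compares how much alpha wobbles across 200 simulated datasets of 20 people against 200 datasets of 500 people. Every number here is an assumption chosen for the illustration.

```python
import random
from statistics import pstdev, variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of k lists, one score per respondent."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    summed_item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - summed_item_variance / variance(totals))

def simulate_alpha(n_people, rng, n_items=8, loading=0.6):
    """Simulate one dataset driven by a single latent variable and return its alpha."""
    noise_sd = (1 - loading ** 2) ** 0.5
    # The latent variable: known here only because we simulated it.
    theta = [rng.gauss(0, 1) for _ in range(n_people)]
    items = [[loading * t + rng.gauss(0, noise_sd) for t in theta]
             for _ in range(n_items)]
    return cronbach_alpha(items)

rng = random.Random(2026)
small = [simulate_alpha(20, rng) for _ in range(200)]
large = [simulate_alpha(500, rng) for _ in range(200)]
spread_small = pstdev(small)
spread_large = pstdev(large)
print(f"spread of alpha across datasets, n = 20:  {spread_small:.3f}")
print(f"spread of alpha across datasets, n = 500: {spread_large:.3f}")
```

In a simulation the latent variable is known because it was built in; with real data only the item scores exist and the latent variable is always an inference from them.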

That being so, these methods have only very limited, and still inferential, ways to infer things about any individual, no matter how many items they completed and no matter on how many occasions that individual completed the measures. Taking inferences about general differences between individuals in the dataset (or outside it) and applying them to one person is to commit the “ecological fallacy”, i.e. to impute from many other people’s data to that person. It is an informed guess, though for some measures it may be a useful one. For example, in ability and educational tests it may be really quite a useful one: someone scoring high on such measures may be very likely to be very much stronger on whatever the measure measures than someone scoring low. To a far more limited extent the same is true for measures of positive or negative psychological states, but generally we should be very, very careful about making judgements about individuals from their scores on our measures.
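A sketch of the gap between group-level and individual-level inference, again with simulated data. It assumes normally distributed “true” scores, a measure with a reliability of 0.8 and an arbitrary cut-off of 0.5 for calling two observed scores “close”; none of these values comes from a real measure. Across the whole group the observed scores track the true scores well, yet when two individuals have similar observed scores the measure often puts them in the wrong order.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(7)
n_people = 4000
reliability = 0.8  # assumed for illustration
# With true-score variance 1, this error SD gives the assumed reliability.
error_sd = sqrt((1 - reliability) / reliability)
true_scores = [rng.gauss(0, 1) for _ in range(n_people)]
observed = [t + rng.gauss(0, error_sd) for t in true_scores]

# Compare people in pairs: (0, 1), (2, 3), and so on.
pairs = [(i, i + 1) for i in range(0, n_people, 2)]

def same_order(i, j):
    """True if the observed scores rank the pair the same way as the true scores."""
    return (observed[i] > observed[j]) == (true_scores[i] > true_scores[j])

group_r = pearson(true_scores, observed)
overall = sum(same_order(i, j) for i, j in pairs) / len(pairs)
close_pairs = [(i, j) for i, j in pairs if abs(observed[i] - observed[j]) < 0.5]
close = sum(same_order(i, j) for i, j in close_pairs) / len(close_pairs)

print(f"group-level correlation, true vs observed: {group_r:.2f}")
print(f"pairs ranked correctly, all pairs:         {overall:.0%}")
print(f"pairs ranked correctly, close scores only: {close:.0%}")
```

This mirrors the point above: a comparison between someone scoring high and someone scoring low is usually safe, whereas fine distinctions between individuals are far shakier than the group-level figures suggest.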

Summary #

Psychometrics, of whatever sort, done well, can tell you very useful things that simplify and help understand the data you got from a group of people. Done carefully, thinking about the assumptions involved, psychometric methods can help make inferences and estimations about populations of people from whom our initial groups came. Again, used well, interpreted carefully, psychometric methods can help us generalise from the group of people who completed the measures and whose data was analysed to suggest what we might expect when other groups complete the measures and can tell us if groups differ in ways that might be important. Finally, done very carefully these methods can tell you something about individuals within your dataset and, if handled very thoughtfully, can help you read things into data from another individual, not in your initial dataset, who completes the same measures.

Sadly, much published psychometric work oversells findings as rather more than this. Buyer beware!

Try also #

Card sort methods
Classical Test Theory (CTT)
Confirmatory Factor Analysis (CFA): almost always conducted within the CTT framework
Derangements (method of)
Exploratory Factor Analysis (EFA): almost always conducted within the CTT framework
Factor analysis (basically CFA and EFA and again almost always conducted within the CTT framework)
Idiographic methods
Item Response Theory (IRT)
Multidimensional scaling (MDS): related to EFA but only assuming ordinal scaling
Rasch analysis/model: an IRT method
Reliability: important concept in both CTT and IRT, includes
- Internal reliability/consistency
- Inter-rater reliability (for rating measures not self-report ones)
- Test-retest reliability (generally in CTT approach applies to both ratings and self-report)
Validity: important concept in both CTT and IRT and generally split into messy subsets
- Face validity
- Content validity
- Construct validity
- Convergent validity
- Discriminant validity (occasionally, but unhelpfully, termed “divergent”)
- Predictive validity
Repertory grid methods
Rigorous idiography: see my PSYCTC.org page
“Scaling” & Stevens’s “levels of measurement”
“Nomological network”: yes, out of alphabetical order but here because after 70 years it’s still a useful counter to some of the silliness in quantitative psychometrics

Chapters #

The ideas really run all the way through the OMbook but we tried hard to avoid dropping into the zealous and often misleading framing of the whole area.

Online resources #

The internet is full of material but beware: much of it restricts the term narrowly and, in doing so, often oversimplifies, overstates findings and downplays or completely ignores the issues with the methods.

Dates #

First created 3.iv.26, updated 5.iv.26. Will probably remain an evolving, overarching guide entry to a lot of the glossary for a long time.

Updated on 5th April 2026