This expands the summary entry under validity. It’s probably best to read the whole of that entry before reading this unless you are fairly familiar with the idea of validity.
Face validity is in many ways the simplest aspect of assessing the validity of a measure: do its items appear to address key features of what you want to measure? There are a few issues here, though.
The first is largely historical: there was a vogue in the mid 20th Century for what I call “opaque” measures. These were intended to conceal what was being measured from the person completing them, with the aim of removing deliberately misleading answers and so perhaps producing a more valid measure. They were constructed by deliberately using items that lacked face validity but which, when answered by many people, showed small but statistically significant correlations with the variable the measure was designed to address. Such measures have, I think completely correctly, pretty much disappeared. Perhaps the “lie scale” in the Eysenck personality measures is a surviving example: it may not be obvious to respondents that it is designed to detect their proneness to lie, which largely comes down to a tendency to give the answer perceived as more socially desirable even when it is not the most accurate self-description. Arguably, all social desirability measures are opaque measures, though they can also be seen to have face validity. Again, though, they have largely dropped out of fashion.
This leads into a minor issue: face validity may sometimes be trumped by other considerations, because items with excellent face validity may be worded in ways that create other problems. The most typical is a social desirability pull, where one answer is obviously the more acceptable one to give.
The main issue with face validity is who decides. Traditionally this was professionals, and it is still common in measure translation guidelines for all the emphasis to be on “expert assessment” of the comparative face validity of translated items. There are two problems with this. The first is fairly simple: professional experts are generally well educated and have a “reading age” and vocabulary very different from those of the general population and of the likely target population of the measure. They are also likely to be somewhat more affluent than the general population and to have relatively safe and advantageous social situations, which may lead them to neglect how sociodemographic issues can shape or reshape the variable to be measured.
The second problem is perhaps more subtle but just as important: relying on experts and their judgements of face validity risks a rather scholastic focus on refining definitions and an over-emphasis on variables that lend themselves to such refinement. That tends to favour unidimensional measures, diagnostically focused measures or, occasionally, measures driven by therapy modality ideas, over short, broad-coverage measures. This is perhaps a particular challenge in the development of quality of life measures.
Clearly the corrective to these issues is to ensure that assessment of face validity, and indeed all qualitative assessment of measures, has strong user involvement: something that has developed in the late 20th and early 21st Centuries. The issues are particularly important in translating measures, and an example of trying to balance expertise with lay input is our recent paper ‘“Infeliz” or “Triste”: A Paradigm for Mixed Methods Exploration of Outcome Measures Adaptation Across Language Variants’ (Evans et al., 2021). We are also pleased that the paper uses light but formalised qualitative measures, genuinely blends qualitative and quantitative methods, and uses experimental quantitative approaches to explore validity, moving from face validity issues to both novel and more traditional psychometric assessments of a translation.
The ultimate solution to face validity, at least at the idiographic level, is user generated measures, where the participant chooses the content of the measure. These do, however, raise other issues, which belong in other glossary entries!
Try also
User generated measures
Validity issues run throughout the book but are covered particularly in Chapters 1 to 4.
Evans, C., Paz, C., & Mascialino, G. (2021). “Infeliz” or “Triste”: A Paradigm for Mixed Methods Exploration of Outcome Measures Adaptation Across Language Variants. Frontiers in Psychology, 12, 695893. https://doi.org/10.3389/fpsyg.2021.695893. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.695893/full (open access).