Standardising/normalising

These are ways of transforming scores on some measure; one effect is to remove the arbitrariness of the scoring. For example, on the EAT-26 (26 item Eating Attitudes Test) the items have six levels but are scored 0-0-0-1-2-3 so the total score ranges from 0 to 78; by contrast, the full Body Shape Questionnaire (BSQ) has 34 items, each scored 1 to 6, giving a total score ranging from 34 to 204. Standardising (standardizing?!) aims to replace those scores with ones on the same range … but that brings us to the details, next!

Before we get there, two more things, and I now see these apply for so many topics in our field:

  1. WHY do this?! As always, this is crucial! As noted above, one aim is to remove rather arbitrary scales for different measures but there can be more to it than just that. More on this in the details after explaining what these transformations are.
  2. As so often, there are lots of names for much the same or exactly the same thing under this general term. Other names that mean much the same are z-transforming (and the rarer t-transforming) and “normalising” (horrible term in my view). “Min-max scaling” is a related idea.

Details #

Possible transforms #
Min-max transform #

The simplest such transformation is the so-called “min-max” rescaling. (I know I’ve seen another name for this but I can’t remember what it was. The name looks as if it’s saying minimum minus maximum, which is almost the reverse of what’s involved!)

This just replaces the score with where it lies between the minimum possible score on the scale and the maximum possible score. The equation is:

$$x^{\prime}=\frac{x - x_{min}}{x_{max}-x_{min}}$$

This means that, whatever the scale, a score that is the minimum possible always transforms to zero and a score that is the maximum possible always transforms to 1.

Going back to the example measures, a score of zero on the EAT-26 transforms to zero as that’s the minimum possible score on that scale; on the BSQ, where the minimum possible score is 34, it is 34 that transforms to zero. Likewise 78, the maximum on the EAT-26, becomes 1 and 204, the maximum on the BSQ, also becomes 1. We can see this has removed the rather confusing numbers (but there’s a but … later!)
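If it helps to see that concretely, here is a minimal sketch of the min-max transform in Python (the function name and the example raw scores are just mine for illustration; the scale ranges are the ones given above):

```python
def min_max(x, x_min, x_max):
    """Rescale a raw score to lie between 0 (scale minimum) and 1 (scale maximum)."""
    return (x - x_min) / (x_max - x_min)

print(min_max(0, 0, 78))      # EAT-26 minimum -> 0.0
print(min_max(78, 0, 78))     # EAT-26 maximum -> 1.0
print(min_max(34, 34, 204))   # BSQ minimum   -> 0.0
print(min_max(204, 34, 204))  # BSQ maximum   -> 1.0
print(min_max(119, 34, 204))  # a mid-range BSQ score -> 0.5
```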

Standardising #

This is much the commonest standardising I’ve met in our field and it’s also called z-scoring/scores/transforming. The principle is that you convert scores to deviations from the mean, expressed as multiples of the standard deviation. As an equation, if you know the population mean \(\mu\) and standard deviation \(\sigma\) it’s this:

$$x^{\prime}=\frac{x - \mu}{\sigma}$$

That assumes that we know those population parameters but we never do. The only mad example I could create was a variable recording when, in minutes into a session, something happened, assuming that all sessions were exactly the same length and that events are equally likely to happen at any point in a session. For a 50 minute session that would give us a population mean of 25 and SD of 14.43376 (you probably don’t want to know, but the SD of a uniform distribution from zero to 50 is 50/sqrt(12), or in equation form \((50 - 0)/\sqrt{12}\)). This is madness of course: events are not uniformly spread across sessions and even strict Kleinian analysts don’t finish every session on the dot of 50 minutes, nor would they have analysands who were always there at the instant the sessions start.
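For anyone who wants to check that uniform distribution SD, here is a small simulation sketch in Python (the sample size and seed are arbitrary):

```python
import math
import random

# SD of a continuous uniform distribution on [a, b] is (b - a) / sqrt(12)
a, b = 0, 50
theoretical_sd = (b - a) / math.sqrt(12)   # about 14.43376

# Crude simulation check: draw many "minutes into the session" values
random.seed(1)
sims = [random.uniform(a, b) for _ in range(100_000)]
mean = sum(sims) / len(sims)
sd = math.sqrt(sum((x - mean) ** 2 for x in sims) / (len(sims) - 1))

print(round(theoretical_sd, 5))       # 14.43376
print(round(mean, 1), round(sd, 1))   # close to 25 and 14.4
```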

So scores are often standardised using the mean and SD of the scores in the dataset:

$$x^{\prime}=\frac{x - mean(x)}{SD(x)}$$

In that case the mean of the standardised scores must be zero and their SD must be 1. That’s fine if the standardised data are only being used within the study but very often the point of standardisation is for comparison across datasets. In that case standardisation is generally done using referential values for the mean and SD:

$$x^{\prime}=\frac{x - mean_{referential}}{SD_{referential}}$$
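Here is a minimal sketch in Python of the difference between standardising against the dataset’s own mean and SD and standardising against referential values (all the scores and the referential mean and SD here are made up):

```python
import statistics

def standardise(scores, mean=None, sd=None):
    """z-transform scores; if mean and sd are not supplied, use the dataset's own."""
    if mean is None:
        mean = statistics.mean(scores)
    if sd is None:
        sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]

raw = [5, 11, 20, 32, 47]  # made-up EAT-26 totals

# Against the dataset's own mean and SD: mean 0 and SD 1 by construction
z_internal = standardise(raw)
print(round(statistics.mean(z_internal), 10), round(statistics.stdev(z_internal), 10))  # 0.0 1.0

# Against referential values (hypothetical figures): scores are now located
# relative to that external reference, not to this particular dataset
z_referential = standardise(raw, mean=15.0, sd=10.0)
print([round(z, 2) for z in z_referential])  # [-1.0, -0.4, 0.5, 1.7, 3.2]
```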

In the past measures were often published with tables that used that equation to map from raw scores to the standardised scores. This brings us to …

Why standardise? #

This, like so much psychometric methodology/jiggery-pokery, is the usually ignored nub of the matter. There are broadly two sets of reasons: internal to the particular study/dataset, and external, i.e. helping compare findings from the dataset with findings from other work.

The main internal reasons fall into roughly three subcategories: (1) feeding scores into explorations of their relative impact on other things, (2) making up composite scores and, less often, (3) getting variables which will behave better in sophisticated analyses.

In the first case standardising removes the greater variance that a score like the BSQ has compared with the EAT-26. Having said that, many of the sorts of analyses involved, like multiple regression, may do some standardisation, or have built-in options to do it, without you formally transforming the scores.

The second is similar but applies when the scores to be standardised are “dependent” variables not “predictor” variables. Say you want a composite score covering both the eating disordered attitudes and behaviours that the EAT-26 covers and the body image appraisal the BSQ covers, and you want to weight them both equally: then it’s sensible to standardise them both, though arguably there are better ways to do this (and don’t do it without looking at the correlation between the two scores in your dataset).

The final “internal” reason to standardise mostly arises in multi-level modelling (MLM) and I think only applies when using the score as a predictor. Here the seriously complicated computations inside the MLM can be unstable if distributions are markedly asymmetrical or skew and standardising a variable may allow the analysis to run where otherwise it might not. Having said that, usually all that is needed is centring the values, not full standardising: i.e. subtracting their mean from all of them so the mean of the transformed variable becomes zero, but not dividing by their SD. The sketch below illustrates the centring/standardising distinction and the composite score idea.
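A short sketch in Python (all the scores here are made up for illustration):

```python
import statistics

def centre(scores):
    """Subtract the mean: the new mean is 0 but the spread (SD) is unchanged."""
    m = statistics.mean(scores)
    return [x - m for x in scores]

def standardise(scores):
    """Centre and divide by the SD: the new mean is 0 and the new SD is 1."""
    m, s = statistics.mean(scores), statistics.stdev(scores)
    return [(x - m) / s for x in scores]

predictor = [3, 8, 12, 19, 28]  # made-up predictor values
print(centre(predictor))        # [-11, -6, -2, 5, 14]: mean is now 0
print(round(statistics.stdev(predictor), 3),
      round(statistics.stdev(centre(predictor)), 3))  # same SD before and after centring

# An equally weighted composite of two made-up sets of scores, each standardised first
eat26 = [5, 11, 20, 32, 47]
bsq = [40, 70, 95, 130, 180]
composite = [(z1 + z2) / 2 for z1, z2 in zip(standardise(eat26), standardise(bsq))]
print([round(c, 2) for c in composite])  # roughly [-1.11, -0.66, -0.16, 0.52, 1.42]
```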

The external reasons are about comparing scores, individual and/or aggregated, with other data for the same measure. Transformed scores drawing on referential means and SDs immediately use a referential mapping, so a transformed score of zero corresponds to the raw score that was the mean in the referential data. There’s a strong, and I think continuing, tradition in educational statistics, particularly in the USA, of using standardisation in this way to map scores on different attainment/ability tests to national data. There are dangers with this approach that are particularly severe in MH/WB/therapy change data and probably less serious for educational test data. I see these as:

  • Overlooking the complexities of the choice of referential data. Should standardisation use means and SDs from help-seeking or non-help-seeking samples? The means from those two groups should be pretty different and probably the SDs will be too. What is “referential” here?
  • There’s a minor issue about how accurately the sample data that gave the referential mean and SD estimate the population values. To give really reliable, accurate estimates (assuming we have answered that first issue) the sample data should be representative and the sample size should be large.
  • The method can rather hide the tricky question of whether the measure is behaving similarly for the participants in the new dataset as it did in the data that created the referential mean and SD. At minimum we should be told the internal reliability in the new data (with its 95% confidence interval) and have that compared with the value from the referential data.
  • There’s a danger that the use of the SD to create the scaling is taken to indicate that we are looking at data with a Gaussian distribution, so that a transformed value of 2.0 (strictly 1.96) can be read as marking the 97.5th centile of the population distribution. This is frankly nonsense as it’s highly unlikely that the distribution really is Gaussian (see the sketch below).
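To illustrate that last danger, here is a small simulation sketch in Python using a made-up, markedly skewed score distribution: if the scores really were Gaussian, about 97.5% of the standardised scores would fall below 1.96, but here they don’t:

```python
import random
import statistics

# Simulate a markedly skewed score distribution (exponential, purely illustrative)
random.seed(1)
scores = [random.expovariate(1 / 10) for _ in range(100_000)]

mean, sd = statistics.mean(scores), statistics.stdev(scores)
z_scores = [(x - mean) / sd for x in scores]

# Proportion of standardised scores below 1.96:
# would be about .975 if the distribution were Gaussian
prop_below = sum(z < 1.96 for z in z_scores) / len(z_scores)
print(round(prop_below, 3))  # roughly .95 here, not .975
```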
Standardisation in context #

This is small print but may be useful. Transforming scores by these methods of “standardising” is, to mathematicians, just a subset of a potentially infinite number of transformations of one set of numbers to another, and the standardising methods above are “linear transforms”. Coming back from maths to our area, standardising is related to “Transforming data/variable(s)” where the aim is not to get from one score range to another but to convert the shape of a distribution of scores from one shape to another, usually from a clearly non-Gaussian distribution to one closer to Gaussian. That requires “non-linear transforms” as linear transforms won’t change the shape of a distribution, just its central location and dispersion.
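A small sketch in Python of that last point, using a made-up skewed distribution: a linear transform (standardising) leaves the skewness untouched while a non-linear transform (a log here) changes it:

```python
import math
import random
import statistics

def skewness(xs):
    """Simple (population) skewness: mean cubed deviation over the SD cubed."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(1)
raw = [random.expovariate(1 / 10) for _ in range(50_000)]  # clearly positively skewed

# Linear transform (standardising): location and dispersion change, shape does not
m, s = statistics.mean(raw), statistics.pstdev(raw)
z = [(x - m) / s for x in raw]

# Non-linear transform (log): the shape of the distribution changes
logged = [math.log(x) for x in raw]

print(round(skewness(raw), 2), round(skewness(z), 2), round(skewness(logged), 2))
# roughly 2.0, 2.0 and -1.1: standardising has not changed the skewness, the log has
```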

Summary #

Standardising data can be useful but only if we think about the reason(s) it has been done and watch out for assuming that it is telling us more than it is. Generally that will mean checking that the referential dataset was large, that it seems likely to be informative for comparison with the transformed data (i.e. that the participants behind the two datasets seem broadly comparable), that the internal reliabilities of the scores in both datasets are comparable and, finally, making sure no assumptions mapping from transformed scores to percentiles are made. (If the people creating the referential dataset, and hence the standardisation mean and SD, really intended to give us percentile mapping they should have given us the percentiles of the distribution of scores in their data, not just the mean and SD!)

Try also #

Centring/centering
Composite scores/variables
Gaussian (“Normal”) distribution
Percentiles: see quantiles!
Population
Quantiles
Sample
Sampling and sample frame
t-scores
Transforming
z-transform/score

Chapters #

Not mentioned in the book.

Online resources #

None yet.

Dates #

First created 31.iii.24, updated 3.iv.24.
