Robustness & robust tests

All statistical “tests” and analyses beyond the very simplest descriptive methods are really looking at the fit between models of what might be happening and what we see in our data. One of the most basic and famous statistical tests (see Null Hypothesis Significance Testing: NHST), the independent groups t-test, tells us how unlikely it is that we would have seen a difference as large as ours between two sets of numbers had they been collected by random sampling from an infinitely large population in which the mean values for the two groups were exactly the same. It is testing fit to a null hypothesis (that there is no mean difference in the population). It turns on some other aspects of the model, usually called “assumptions”: random sampling, Gaussian distributions of population scores, equal variances of the scores in the two population groups and independence of observations.
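To make the model concrete, here is a minimal sketch of the independent groups t statistic in its classic equal-variance form. The scores and the function name `pooled_t` are made up for illustration, and the sketch stops at the statistic and its degrees of freedom: turning those into a p value needs the t distribution, which a statistics package would supply.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Student's independent groups t statistic and its degrees of freedom.

    Pools the two sample variances, which is where the "equal variances"
    assumption enters the model.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Made-up scores for two independent groups
group_a = [12, 15, 11, 18, 14, 16]
group_b = [10, 9, 13, 8, 12, 11]

t_value, df = pooled_t(group_a, group_b)
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```

Every line of the function is a piece of the model: the means carry the null hypothesis, and the pooled variance carries the equal-variance assumption.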

Generally, of course, real life isn’t a perfect fit to these assumptions in the model. The perhaps unfortunate term “violated” is often used, which is a bit dramatic and does seem to lead to people talking as if real life is doing something wrong, is violating something, and as if good researchers would research real life that didn’t violate things.

Robustness is about how far misfit between real life and the model leads only to minor and unimportantly inaccurate findings from a test; non-robustness is when the misfit can lead to markedly misleading decisions based on the analyses.

Details #

Non-independence of observations #

Let’s get one thing out of the way: non-independence of observations can never be assumed to have little effect on findings. Some data will fit multi-level models (MLMs) that can handle some non-independence; other data, particularly when the non-independence is within small groups (therapy-sized groups!), may not really fit MLMs.
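One rough way to see why non-independence matters is the “design effect” often used for clustered data. The sketch below assumes equal-sized groups and a single intraclass correlation (ICC); the function names and the numbers (eight groups of eight, ICC of 0.1) are purely illustrative and not from any real dataset.

```python
def design_effect(cluster_size, icc):
    """Approximate variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered data are roughly 'worth'."""
    return n_total / design_effect(cluster_size, icc)

# Illustrative only: 64 clients seen in 8 groups of 8, with a modest ICC of 0.1
n_eff = effective_sample_size(n_total=64, cluster_size=8, icc=0.1)
print(f"Design effect: {design_effect(8, 0.1):.2f}, effective n: {n_eff:.1f}")
```

Under those assumptions even a modest correlation within groups makes the 64 observations behave more like 38 or so independent ones. A test that treats them as 64 independent scores will overstate its certainty.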

Non-parametric tests #

Now another issue relating to robustness: parametric versus non-parametric tests. The Mann-Whitney or Wilcoxon test (depending on whose nomenclature you follow) is the non-parametric equivalent of the independent groups t-test: it doesn’t have the assumption of Gaussian population distributions. That’s not about robustness, it’s simply about changing the method to avoid the assumption altogether. More on this in parametric tests.
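A minimal sketch of the idea behind the Mann-Whitney statistic, using made-up scores and a hypothetical function name: U simply counts, across all pairs of one score from each group, how often the first group’s score is the higher one. The sketch stops at the statistic; the p value would come from tables or a statistics package.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, counting ties as half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up scores
group_a = [12, 15, 11, 18, 14, 16]
group_b = [10, 9, 13, 8, 12, 11]

# Same data but with the highest score in group_a made wildly extreme
group_a_extreme = [12, 15, 11, 180, 14, 16]

u_original = mann_whitney_u(group_a, group_b)
u_extreme = mann_whitney_u(group_a_extreme, group_b)
print(u_original, u_extreme)
```

Because only the ordering of the scores is used, turning 18 into 180 leaves U unchanged, whereas it would change the group mean, and hence a t-test, a great deal.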

Robust tests/statistics #

And now another terminological thing: “robust tests/statistics”. I think these terms are now mainly applied to bodges to the data or the test that were designed to make parametric approaches less affected by, i.e. more robust to, deviations from Gaussian distributions. An example is “Winsorising” data: replacing values in the dataset that are “outliers”, beyond a certain distance from the mean say, with the value at that distance. (Hm, perhaps I need an entry with examples for Winsorising, but for now, go to if you’re intrigued.) My own take is that, now we have bootstrapping as a way to approach differences in means, these methods have become a historical curiosity.
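Both ideas can be sketched in a few lines. The code below is only an illustration under stated assumptions: the scores are made up, the function names are hypothetical, Winsorising is done at two standard deviations from the mean (other versions use percentiles), and the bootstrap is the simple percentile form with a fixed seed so the sketch is repeatable.

```python
import random
import statistics

def winsorise(values, k=2.0):
    """Pull any value further than k SDs from the mean back to that limit."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    lower, upper = centre - k * spread, centre + k * spread
    return [min(max(v, lower), upper) for v in values]

def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=2000, seed=1):
    """Approximate 95% percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    # Take roughly the 2.5th and 97.5th percentiles of the resampled differences
    return diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples) - 1]

# Made-up scores; 60 is the value that looks like an "outlier"
group_a = [12, 15, 11, 18, 14, 16, 60]
group_b = [10, 9, 13, 8, 12, 11, 10]

winsorised_a = winsorise(group_a)
ci_lower, ci_upper = bootstrap_mean_diff_ci(group_a, group_b)
print(winsorised_a)
print(f"Bootstrap 95% interval for the mean difference: {ci_lower:.1f} to {ci_upper:.1f}")
```

Winsorising changes the data before the test sees them. The bootstrap leaves the data alone and builds the interval from resampling what was actually observed, extreme score included.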

Transforming data #

Winsorising is a particular form of “transforming”, and one that is not strictly monotonic: it creates ties, so it does not fully preserve the ranks of the observations. Transforming is another way to make analyses robust to non-Gaussian distributions in the data: transform the data to make them not so different from Gaussian. This is still used in some situations though, again, bootstrapping has perhaps much reduced its necessity and utility.
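As an illustration of a rank-preserving transformation, here is a minimal sketch using the logarithm, one common choice for positively skewed scores. The scores are made up, and the `skewness` helper is a simple moment-based measure written for this sketch, not taken from any particular package.

```python
import math
import statistics

def skewness(values):
    """Simple moment-based skewness: about 0 for a symmetric sample."""
    centre = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)

# Made-up, positively skewed scores (all above zero, as the log requires)
scores = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
logged = [math.log(v) for v in scores]

print(f"Skewness before: {skewness(scores):.2f}, after: {skewness(logged):.2f}")
```

The log is strictly increasing, so the order of the observations is untouched, but the long right tail is pulled in and the distribution sits rather closer to symmetric.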

Homoscedasticity/heteroscedasticity #

Heteroscedasticity is the esoteric-sounding name for the situation, common in real life, in which one group of values, which may or may not differ from another in its mean, does definitely differ from the other in its variance. That doesn’t fit the assumptions of the t-test and the ANOVA family of tests. There are extensions of the maths of some of these tests that render them robust to heteroscedasticity.
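Welch’s version of the t-test is probably the best known of these extensions. A minimal sketch, with made-up scores and a hypothetical function name: it keeps the two variances separate rather than pooling them, and it adjusts the degrees of freedom (the Welch-Satterthwaite approximation) to allow for that.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom (variances not pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group's mean, kept separate
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t_value = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_value, df

# Made-up scores: similar means but very different spreads
wide_group = [10, 30, 5, 25, 20, 0]
narrow_group = [12, 13, 11, 14, 12, 13]

t_value, df = welch_t(wide_group, narrow_group)
print(f"Welch t = {t_value:.2f} on about {df:.1f} degrees of freedom")
```

With these scores the degrees of freedom fall from the 10 of the classic test to just over 5. That drop is how the method builds its caution about the unequal variances into the test.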

Sensitivity analyses #

This is a key idea so it’s got its own entry!

Try also #

See all the links above!

Chapters #

Touched on in Chapter 5

Online resources #

None yet.

Dates #

First created 28.viii.23.
