This is a large family of statistical analyses within the NHST (Null Hypothesis Significance Testing) paradigm. The name comes from the shared fundamental that these methods all work by partitioning the variance in the data according to some model. The most basic model analyses data from two distinct groups to ask whether the difference between the two group means is so large, given the sizes of the two datasets and the variances of the values within each, that it is unlikely to have arisen by random sampling from infinitely large populations in which the two population means are identical. (This is exactly the same null hypothesis as in the two-group, or unpaired, t-test; the maths is different but it will always give the same p value as the t-test.)
The key difference from the t-test is that this simple two-group comparison is just the simplest ANOVA: the ANOVA method can also test whether the means of more than two datasets could have come from populations in which the means are all the same.
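A minimal sketch of that two-group equivalence, assuming Python with scipy available and some invented scores: a one-way ANOVA on two groups gives exactly the same p value as the unpaired t-test, with F equal to t squared.

```python
# Sketch: for two groups, one-way ANOVA and the unpaired (equal-variance)
# t-test are the same null hypothesis test: F = t**2 and the p values match.
# The data here are invented purely for illustration.
from scipy import stats

group_a = [12.1, 14.3, 11.8, 13.5, 12.9, 14.0]
group_b = [10.2, 11.1, 12.0, 10.8, 11.5, 10.4]

f_stat, p_anova = stats.f_oneway(group_a, group_b)
t_stat, p_ttest = stats.ttest_ind(group_a, group_b)  # pooled-variance t-test

print(f"F = {f_stat:.3f}, t^2 = {t_stat**2:.3f}")          # identical
print(f"ANOVA p = {p_anova:.6f}, t-test p = {p_ttest:.6f}")  # identical
```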
Details #
All ANOVA tests, regardless of the complexity or otherwise of the model, give an F value with two degrees of freedom, so you will see something like “F(1, 118) = 14.76, p = .0002”, which, for this arbitrary example, says that the null hypothesis should be rejected as that p value is small: 2 in 10,000. (The null hypothesis is, as usual, that the data came from a population in which there was no difference between the means of the groups.)
Or you might see for another set of data “F(1, 158) = .04, p = .842” where you wouldn’t reject the null hypothesis.
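The p value is simply the upper tail area of the F distribution with those two degrees of freedom beyond the observed F value. A sketch (assuming Python with scipy) reproducing the two arbitrary examples above:

```python
# Sketch: recover a reported p value from F and its two degrees of
# freedom using the survival function (upper tail) of the F distribution.
from scipy.stats import f

p1 = f.sf(14.76, 1, 118)  # first example: F(1, 118) = 14.76
p2 = f.sf(0.04, 1, 158)   # second example: F(1, 158) = .04

print(f"p = {p1:.4f}")  # about .0002: reject the null hypothesis
print(f"p = {p2:.3f}")  # about .842: don't reject it
```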
ANOVA models aren’t restricted to between group comparison but also include analyses of repeated data (e.g. “should we reject the null hypothesis given the mean scores of the same 40 clients at each of the five sessions they attended?”).
They also extend to mixed models, e.g. “should we reject the null hypothesis of no population differences given the mean scores of 40 clients at each of the five sessions they attended, where 25 clients saw one therapist and 15 another?” That is a so-called mixed model as it has one predictor that is repeated (the five sessions) and one that is between groups (the two therapists).
The key assumptions which, if they apply to the real-world data collection, make the p values pretty robust are:
- Observations within groups are independent of each other (i.e. there is nothing making one client’s score influence another client’s score other than their group membership). This may or may not be true for typical therapy data.
- The data arose by random sampling from infinite populations. (Essentially never true for typical therapy data.)
- The variances within any groups in the populations are equal. (Probably rarely true for therapy data but there are “corrections” to the basic ANOVA maths that can soften this requirement: see Welch’s test and Satterthwaite’s correction.)
- The population distributions of values are Gaussian. (Never really true for therapy data, but the impact of the deviations from Gaussian distributions can range from trivial to severe!)
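On the equal-variance assumption, a sketch of the two-group case (Python with scipy, invented data) comparing the basic pooled-variance test with Welch’s correction, which uses the Satterthwaite degrees of freedom and does not assume equal variances:

```python
# Sketch: when group variances (and sizes) are unequal, the pooled-variance
# test and Welch's corrected test can give noticeably different p values.
# Here the pooled test is the more anti-conservative of the two.
# The data are invented purely for illustration.
from scipy import stats

low_var = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.2]
high_var = [13.0, 22.0, 15.5, 21.0, 14.0]

_, p_pooled = stats.ttest_ind(low_var, high_var)                  # assumes equal variances
_, p_welch = stats.ttest_ind(low_var, high_var, equal_var=False)  # Welch/Satterthwaite

print(f"pooled p = {p_pooled:.4f}, Welch p = {p_welch:.4f}")
```

With these data the pooled p value is far smaller than Welch’s; respecting the very unequal variances gives the more cautious (and here more defensible) answer.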
You should see some appraisal of, or at least acknowledgement of, the likely impacts of the likely deviations from these requirements in any report of an ANOVA. I very rarely do!
Try also #
- Gaussian (“Normal”) distribution
- Independence of observations
- Mean (arithmetic mean)
- NHST paradigm
- Robustness of statistical tests
- Satterthwaite’s correction
- t-test (“Student’s t-test”)
- Welch’s test/correction
Chapters #
Not covered in the OMbook.
Online resources #
None likely from me I think.
Dates #
First created 25.ii.26.