Also known as Welch’s t-test, the unequal variances t-test or Welch’s correction. It is an adaptation of the between groups t-test which makes it more robust if the populations from which the two samples were taken don’t have the same variance (and hence don’t have the same SD). The equation is the same as the Satterthwaite correction (also called the Satterthwaite-Welch or Welch-Satterthwaite correction).
Details #
I am not sure much more detail is needed here as we are getting into quite technical bits of stats and are helplessly in the Null Hypothesis Significance Testing (NHST) paradigm. However, that’s what you will often see in reports so here we go.
If you see this in a report then the authors are working in the NHST paradigm. What you will probably see is something like this: t = -0.78878, df = 65.844, p = 0.4331. So, as that p value is greater than .05, within the NHST paradigm the authors would not reject the null hypothesis that the population means were the same, despite whatever difference between the means was seen in the data: the two datasets’ means, SDs and sizes (n) were not such as to make that difference statistically significant. (And you hope that the report notes the other assumptions that make that a sound decision and whether they are likely to be violated in the data and how serious the impact of the violations would be … and yes, you’re very lucky if the report does address those issues!)
The clue that this was a Welch’s test, not the basic between groups t-test, is that the degrees of freedom (d.f.) reported have a non-integer value. That is Welch’s “correction”. In the little simulation I did to get those findings both datasets were of size 50 and both came from populations with mean zero, but one population had SD 1 and the other had SD 3. The simple t-test would have shown: t = -0.78878, df = 98, p = 0.4321. That has the same t value as the Welch’s test because the two datasets were the same size: with equal n the two tests use the same standard error for the difference between the means, so Welch’s correction changes only the d.f. and hence the p value. (With unequal n Welch’s test also uses a different standard error, so the t value changes too.) However, the simple test has a very different, integer, d.f. of 98 (in the basic between groups t-test the d.f. will always be two fewer than the total n) and a slightly different p value. By using Welch’s t-test the authors tested a null model that didn’t assume equal population variances, so the test is robust to the populations having rather different variances. Here “robust” means that when there is no difference between the population means, about 5% (.05) of Welch’s tests will give p < .05, i.e. a false positive, even if the population variances are not the same. The simple t-test could not be trusted to do that.
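For anyone who wants to see where the two d.f. values come from, here is a minimal sketch in Python using only the standard library. It is not the code behind the numbers above: the function names `student_t` and `welch_t` are my own, and the example plugs in the population SDs (1 and 3) as if they were the sample SDs, which is an assumption made purely for illustration, so the results will not reproduce the reported t and d.f. exactly. It computes only t and d.f.; getting a p value would need a t distribution function from a statistics library.

```python
import math


def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Basic (equal variances) between groups t-test: returns (t, d.f.)."""
    df = n1 + n2 - 2  # always two fewer than the total n
    # Pool the two sample variances, weighting each by its own d.f.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's test: unpooled standard error, Welch-Satterthwaite d.f."""
    v1 = sd1**2 / n1  # squared standard error of the first mean
    v2 = sd2**2 / n2  # squared standard error of the second mean
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation: usually a non-integer value
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df


# Equal n (50 and 50): the t values agree, only the d.f. differ.
# The SDs of 1 and 3 are illustrative stand-ins for the sample SDs.
print(student_t(-0.013, 1.0, 50, 0.310, 3.0, 50))
print(welch_t(-0.013, 1.0, 50, 0.310, 3.0, 50))

# Unequal n (20 and 80, hypothetical): now the t values differ as well.
print(student_t(-0.013, 1.0, 20, 0.310, 3.0, 80))
print(welch_t(-0.013, 1.0, 20, 0.310, 3.0, 80))
```

With equal group sizes the pooled and unpooled standard errors are algebraically identical, which is why the two t values match in the first pair of calls. In the second pair the smaller group has the smaller SD, so the two standard errors, and therefore the two t values, come apart.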
Welch’s correction (as I prefer to think of it) applies equally to computing parametric confidence intervals for the difference between the means. Here the two means from the samples/datasets were -0.013 and 0.310, so the difference was -0.323. The 95% CI for that difference using Welch’s correction was from -1.142 to 0.495 whereas the 95% CI without the correction was from -1.137 to 0.490. The differences are tiny but in other datasets they could be bigger.
Getting into abstruse terminology, Welch’s test is robust to violation of the assumption of homoscedasticity, which is the lovely word for equal variances. In other words it is robust to heteroscedasticity, i.e. unequal variances. Links to those terms below!
Try also #
- Confidence intervals
- Degrees of freedom (d.f.)
- Heteroscedasticity
- Homoscedasticity
- Null Hypothesis Significance Testing (NHST) paradigm
- Robustness and robust tests
- Parametric tests
- Satterthwaite correction
- Student’s t-test
Chapters #
Not covered in the OMbook.
Online resources #
None from me currently.
Dates #
First created 10.ii.26, slight expansion 11.ii.26.