A key, if slightly slippery, idea that comes up in a lot of statistical analyses. I suspect that in or after 2026 you can assume that if a report gives a d.f. it is correct, though once or twice, decades ago, I did notice that a reported d.f. couldn't be correct. I am honestly not sure that quantitatively sympathetic people in the therapy and psychosocial world really need to know this, but I think what follows is a fair description if you want a bit more.
Details #
A lot of statistical analyses report a statistic and then some of them "test" it to see whether it makes some (null) model of a population unlikely: that's the Null Hypothesis Significance Test paradigm. So you might have 80 pairs of first/last scores on some measure from clients who went through some therapy with you and you want to test the hypothesis that in some infinite population of clients, of which your 80 were a random sample, the mean change is zero. That's your null model, and some statistic, classically here the paired t-test, might tell you something like t = -10.328, d.f. = 79, p = 2.566e-16. That p value tells you that you can reject the null model that if you saw an infinite number of clients their mean change would be zero (reassuring!). You can see the d.f. there is 79, one fewer than the number of clients in your data. Why one fewer? Well, this is where it gets a bit technical, but to get that t value the statistics package uses the observed mean change and the variance of the changes, and that variance is computed from how far the observations fall from the observed mean, so you have already used up one degree of freedom estimating the mean.
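If a line or two of code helps, here is a minimal sketch in Python (the change scores, seed and group sizes are all made up for illustration; nothing here reproduces the numbers above) showing why the paired t-test has d.f. = n - 1: the variance is computed around a mean that has already been estimated from the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 80
change = rng.normal(loc=-3, scale=2.5, size=n)   # hypothetical first-to-last change scores

mean_change = change.mean()
# The sample variance divides by (n - 1), not n, because the mean has
# already been estimated from these same data: that is the lost d.f.
var_change = ((change - mean_change) ** 2).sum() / (n - 1)
t_by_hand = mean_change / np.sqrt(var_change / n)
df = n - 1                                       # 79 here
p_by_hand = 2 * stats.t.sf(abs(t_by_hand), df)

t_scipy, p_scipy = stats.ttest_1samp(change, popmean=0)  # same test, packaged
print(f"by hand: t = {t_by_hand:.3f}, d.f. = {df}, p = {p_by_hand:.3g}")
print(f"scipy  : t = {t_scipy:.3f}, p = {p_scipy:.3g}")
```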
Another way of thinking of this, though it really does oversimplify things: if you only had one client's change score you have learned nothing really, as you have no variance; when you have two change scores you are just off the ground but still can't say much; by the time you have 80 change scores you are 79 better off than you were with just that first, essentially uninformative, score. (Don't show that explanation to a theoretical statistician, but perhaps a kindly one would agree it's not a bad way to think yourself into this!)
Other statistics have different d.f., so a simple between-groups t-test of the difference between baseline scores of referrals from one referrer versus those from another, where you have 80 referrals in total, has a d.f. of 78 as you are now estimating a separate mean for each group (the within-group variances are computed around those two means). Some complex models fit to data may have far fewer d.f. than data points as lots of different things are being estimated; for example, a multilevel model of the rates of change by session that allows different clients to have different starting scores and different rates of change is losing up to two degrees of freedom per client. (So complex models with free estimation per client really need lots of clients and lots of points per client to have precise estimates of the many parameters in such models.)
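Again purely as an illustrative sketch (hypothetical baseline scores, arbitrary group sizes summing to 80), this shows the classic pooled two-group t-test losing one degree of freedom per group mean, hence d.f. = 78.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=20, scale=6, size=45)   # hypothetical baselines, referrer A
group_b = rng.normal(loc=22, scale=6, size=35)   # hypothetical baselines, referrer B

df = len(group_a) + len(group_b) - 2             # 80 referrals - 2 group means = 78
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)  # classic pooled test
print(f"t = {t_stat:.3f}, d.f. = {df}, p = {p_value:.3g}")
```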
Some analyses, the simplest and original example being the Welch test, may have non-integer d.f. See the entry on the Welch test (and the one on Satterthwaite's correction) if you want to go into that!
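And a small sketch of where a non-integer d.f. can come from: the Welch-Satterthwaite approximation used by the Welch test, again with made-up data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=20, scale=4, size=45)   # hypothetical, smaller spread
group_b = rng.normal(loc=22, scale=9, size=35)   # hypothetical, larger spread

# Welch-Satterthwaite approximation: with unequal variances the d.f. is
# generally not a whole number.
va = group_a.var(ddof=1) / len(group_a)
vb = group_b.var(ddof=1) / len(group_b)
welch_df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's test
print(f"Welch t = {t_stat:.3f}, approximate d.f. = {welch_df:.2f}, p = {p_value:.3g}")
```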
Although d.f. really became well understood and widely used within the NHST paradigm, they are equally an issue for estimation and confidence intervals.
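For example (using the same hypothetical change scores as in the first sketch), a 95% confidence interval for a mean change uses the t distribution with the same d.f. = 79.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
change = rng.normal(loc=-3, scale=2.5, size=80)  # hypothetical change scores again

n = len(change)
se = change.std(ddof=1) / np.sqrt(n)             # standard error of the mean change
t_crit = stats.t.ppf(0.975, df=n - 1)            # critical value uses d.f. = 79
lo, hi = change.mean() - t_crit * se, change.mean() + t_crit * se
print(f"mean change = {change.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```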
Try also #
- Confidence intervals
- Estimation
- Null Hypothesis Significance Testing (NHST) paradigm
- t-test
- Satterthwaite’s correction
- Welch’s test
Chapters #
Not covered in the OMbook.
Online resources #
None likely from me.
Dates #
First created 12.ii.26.