You may see this written as “chi-squared”, “chi square”, “chi-square” or “chisquared”, or written using the rather beautiful Greek letter chi with a superscript 2 to show it has been squared: χ². You may also see it as χ²(d.f.) where “d.f.” will have been replaced by a number; more on that below.
Here I am talking about the simple chi squared test of count data; you may also see the term, and any of those names, used in relation to getting p values for complex statistical models. That’s a very different use of the underlying theory and I think it will be obvious which you are seeing.
Details #
This is one of the paradigmatic “tests” and it works with counts so it is a “non-parametric” test. That’s to say it’s not making any assumptions about the distribution of the variable or variables of interest in the population. It is a null-hypothesis significance test (NHST), which is to say it asks the question “how improbable is it that we would have found numbers as far from the null hypothesis about the population as those in our sample, or even further from it?” If that probability falls below some pre-agreed level (usually probability less than .05, one in twenty) we reject the idea that the sample came from such a population.
That’s the logic and I have spelled it out because so often that logic is overlooked: textbooks tend to jump straight into the maths, which exacerbates the problem. In the 21st century it’s great if you are a statistician and understand the maths, perhaps can even compute it by hand, but for the rest of us that is completely unnecessary: a computer will do it for us in a millisecond for most samples.
So what is actually being tested? For our purposes I think it’s sensible to consider just two chi squared tests: the one-way and the two-way.
The one-way chi squared test #
This tests whether a set of counts across some situations or other variables are so different from each other that it’s unlikely they came from a population where the rates are constant. Consider quarterly referral counts from five workers in a university health centre, each seeing the same number of students per quarter. Are they making similar numbers of referrals per quarter, or are the differences such that that null model is implausible?
| Worker | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 5 | 5 | 5 | 5 | 5 |
Hm, not much difference there, for what it’s worth the chi squared value is zero, the degrees of freedom (we’ll come back to that later) are 4 and the p value is 1.0: it is extremely plausible that these numbers came from a situation in which the five referrers have the same long term referral rate. What about this next?
| Worker | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 50 | 50 | 50 | 50 | 50 |
The chi squared test results are exactly the same: the chi squared value is zero, the degrees of freedom are 4 and the p value is 1.0. That tells you that the test is only interested in the proportions. Clearly, if the referrers had the same long run rate but referrals fluctuated across time, the probability of each referring exactly 50 clients by chance alone would be low (if, say, each were referring about 10% of the students they saw); all the test is interested in is the null model of equal rates.
But what about this?
| Worker | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 3 | 5 | 7 | 4 | 6 |
That might look like quite a difference across the five referrers but the chi squared is 2.0, the d.f. (degrees of freedom) are still 4 and the p value is .74. The plausibility that these numbers arose by random fluctuations, despite the five referrers in the long run having the same referral rates, is still way above the p < .05 threshold at which we (using our pre-agreed risk of a “false positive” rate of one in twenty) would reject the model of equal referral rates.
What about this?
| Worker | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 30 | 50 | 70 | 40 | 60 |
Now our chi squared value is 20.0, d.f., as ever, 4, but the p value has dropped to .0005: a five in ten thousand probability of seeing numbers as different from equal rates as these, or even more so. We clearly reject the null model.
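The arithmetic behind all four of those one-way results is simple enough to check in a few lines of code. The tests in this post were run in R; as a rough cross-check, here is a hand-rolled sketch in Python (the function names are mine, and the closed-form p value only covers even degrees of freedom, which is enough for these 4 d.f. examples):

```python
import math

def one_way_chi2(observed):
    """Pearson chi-squared statistic against a null model of equal counts."""
    expected = sum(observed) / len(observed)  # equal rates -> equal expected counts
    return sum((o - expected) ** 2 / expected for o in observed)

def chi2_sf(x, df):
    """P(X > x) for a chi-squared distribution; closed form for even df only:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    assert df % 2 == 0, "closed form only valid for even df"
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

for counts in ([5] * 5, [50] * 5, [3, 5, 7, 4, 6], [30, 50, 70, 40, 60]):
    stat = one_way_chi2(counts)
    df = len(counts) - 1  # five cells, one constraint from the null model
    print(counts, round(stat, 1), df, round(chi2_sf(stat, df), 4))
```

This reproduces the values above: chi squared of 0, 0, 2.0 and 20.0, all on 4 d.f., with p values of 1.0, 1.0, .74 and .0005. Note also that multiplying every count in the third table by ten multiplied the statistic by ten: the statistic reflects both the proportions and the total n.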
The two-way chi squared test #
In the above we only had counts per referrer; suppose we actually knew how many students each saw in that quarter and we had this.
| Referrer | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 5 | 5 | 5 | 5 | 5 |
| Students seen | 50 | 50 | 50 | 50 | 50 |
Here the test results are: the chi squared value is zero, the degrees of freedom are 4 and the p value is 1.0: exactly the same as in the first one-way test above. That’s because the data throw exactly the same light on the null model of equal rates, but now the test is of equal rates, not equal numbers. For this highly artificial example, as the numbers of students each referrer saw were the same, that comes to the same thing. But if we go to this next table we see the difference.
| Referrer | A | B | C | D | E |
|---|---|---|---|---|---|
| Referrals | 5 | 5 | 5 | 5 | 5 |
| Students seen | 10 | 80 | 50 | 40 | 60 |
Here the test results are chi squared = 11.7, d.f. = 4, p = .02. Despite the equal numbers being referred we can see that the referral rates are very different, ranging from 6.25% to 50%. As the two-way test is testing a null model of equal referral rates, with p = .02 we reject the model that in the long term the referrers have the same referral rates.
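That two-way result can be cross-checked the same way. Here is a minimal Python sketch of the two-way Pearson statistic, again my own hand-rolled version rather than the R code used for the post; the expected counts come from the row and column margins:

```python
import math

def two_way_chi2(table):
    """Pearson chi-squared for a two-way table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # margin-based expectation
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf(x, df):
    """P(X > x) for a chi-squared distribution; closed form, even df only."""
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

# the same 2 x 5 table as in the text: referrals over students seen
stat, df = two_way_chi2([[5, 5, 5, 5, 5],
                         [10, 80, 50, 40, 60]])
print(round(stat, 1), df, round(chi2_sf(stat, df), 2))  # 11.7 4 0.02
```

Note that some of the expected counts here are small (the smallest is about 1.4), which is what triggers the R warning discussed next.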
One little worry: for that last test, unlike the others, R spits out this warning to me:

```
Warning message:
In chisq.test(matrix(c(rep(5, 5), c(10, 80, 50, 40, 60)), nrow = 2, :
  Chi-squared approximation may be incorrect
```
I will come back to that!
What are degrees of freedom (d.f.)? #
They are pretty much what the term suggests: I find an easy way to think of them is as how much traction the shape of your data gives you to test the null model. This is about the shape of the data, not the total n (as ever, the larger the total n, the more statistical power); rather, the larger the number of cells, the more traction of a different sort. In the one-way test we had five cells; assuming, as we have to in order to have a null model, that the referral rates/counts are equal loses us one degree of freedom, so we are left with four degrees of freedom. For the two-way test the d.f. is always the number of rows minus one, multiplied by the number of columns minus one, so here (5 – 1) * (2 – 1) = 4. If you see a chi squared value quoted without a d.f. someone is cutting corners.
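Those two rules fit in one small helper (the function and its name are mine, just to make the arithmetic concrete):

```python
def degrees_of_freedom(n_rows, n_cols=1):
    """d.f. for a chi-squared test on a table of counts:
    k - 1 for a one-way table of k cells,
    (rows - 1) * (cols - 1) for a two-way table."""
    if n_rows == 1 or n_cols == 1:
        return max(n_rows, n_cols) - 1
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(1, 5))  # one-way, five referrers -> 4
print(degrees_of_freedom(2, 5))  # two-way, 2 x 5 table -> 4
```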
Cautions #
As with most NHST paradigm tests the robustness of the chi squared test rests on the idea of sampling from an infinite population. Here that becomes the idea that our referrers’ rates will remain constant over time, infinite time in theory, and only fluctuate from quarter to quarter (say) by random variation. It’s not as daft as it sounds spelled out, well it is, but it’s all we can do to model the referral process if we don’t have more data.
Another assumption underpinning most NHST is that each observation is independent of any other, i.e. one student being referred or not doesn’t make any other any more or less likely to be referred. However, if the referrers work with different schools in the university, or different halls of residence, or differ in their interests in particular student problems, then this goes out of the window and the differences we see, if significant, may be about the differences between the students each sees, not about their referral rates. (There are more esoteric aspects of this assumption but we can probably safely ignore them here.)
That warning comes up when “expected cell sizes” are small, so you won’t see it for one-way tests, nor for two-way tests where the rates are such that the numbers expected under the null model are higher (traditionally, expected counts greater than five in most of the cells, he says, sweeping under the carpet a lot of the small print that traditional statistics texts, particularly ones written for non-statisticians, love). If you are having to be your own statistician and you see this warning, find a statistician, or contact me giving me the table of numbers that gave you the warning and I can probably tell you whether it matters or not. Sometimes Fisher’s exact test can help there.
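For the simplest, 2 by 2, case Fisher’s exact test is easy to sketch: it sums exact hypergeometric probabilities over all tables with the same margins instead of relying on the chi-squared approximation. Here is a Python sketch of my own (the two-sided convention shown, summing all tables no more probable than the observed one, is the one R’s fisher.test() uses), demonstrated on Fisher’s classic “lady tasting tea” table rather than the referral data:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # hypergeometric probability that cell (1,1) equals x, all margins fixed
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # sum over every admissible table no more probable than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

Beyond 2 by 2 the exact test gets computationally heavier (R enumerates or simulates the tables for you), which is one more reason to find a statistician when the warning appears.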
A final caution is that, as for all NHST, this is testing improbabilities, not telling you about the strength of any association or difference. For two-way “2 by 2” tables of data there are simple and sensible confidence intervals (CIs) indexing the strength of the effect; the CI around your observed “odds ratio” is probably the best, though you can get the CIs for the difference in proportions too. For bigger tables things are more complex: there are effect size indices and CIs for them, but interpreting them does become complex.
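As an illustration of that odds ratio CI, take just referrers A and B from the last two-way table: A referred 5 of the 10 students seen, B referred 5 of 80. A sketch of the simple logit (Woolf) approximate 95% CI, my own code, and an approximation that assumes no zero cells:

```python
import math

# referrers A and B from the 2 x 5 table: referred vs. not referred
a, b = 5, 10 - 5   # A: 5 referred, 5 not
c, d = 5, 80 - 5   # B: 5 referred, 75 not

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's SE of log(OR)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(odds_ratio, round(ci_low, 2), round(ci_high, 1))  # 15.0 3.23 69.6
```

The CI excluding 1.0 (no association) agrees with the significant two-way test, and its enormous width shows how imprecise the effect estimate is with counts this small.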
Try also #
- Confidence intervals
- Effect size
- Fisher’s exact test
- Independence of observations
- Null hypothesis / null models
- Null hypothesis significance test (NHST) paradigm
- Odds ratios
- Parametric tests
Chapters #
Not covered in the OMbook.
Online resources #
- My Rblog post https://shiny.psyctc.org/apps/Forest_plot_rates/ which gives a forest plot of rates and shows chi squared and Fisher test results.
- I will probably put up some similar apps: one for one-way tests and one for more general two-way tests.
Dates #
First created 11.i.26, link to Fisher test added 13.i.26.