In statistical thinking we treat a statistic from a sample both as a description of the sample and, to the extent that the sample can be thought of as representative of a population, as an estimate of the population value. The confidence interval tells us about the likely precision of that estimate.
More concretely, if we have a dataset of initial questionnaire scores, then the mean of those scores is an estimate of the mean score for some population of interest. In classical research the sample really might be a random sample, e.g. small lumps of soil taken at random from across a whole field. In that situation the observed mean, say of available nitrogen, is the best possible guess at the mean available nitrogen for the whole field, and the larger the number of samples, the better, i.e. more precise, that estimate will be.
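That relationship between sample size and precision can be sketched in a few lines of Python. Everything here is illustrative: the "true" mean of 20 and SD of 4 for available nitrogen are arbitrary made-up values, and the interval uses the simple normal approximation (mean ± 1.96 × standard error) rather than the t-distribution, which would be a little more accurate for small samples.

```python
import math
import random
import statistics

random.seed(1)  # reproducible illustration


def mean_ci95(scores):
    """Mean and an approximate 95% CI (normal approximation: mean +/- 1.96 * SE)."""
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return m, (m - 1.96 * se, m + 1.96 * se)


# Hypothetical "field": samples of available nitrogen, true mean 20, SD 4
ci_widths = []
for n in (10, 100, 1000):
    sample = [random.gauss(20, 4) for _ in range(n)]
    m, (lo, hi) = mean_ci95(sample)
    ci_widths.append(hi - lo)
    print(f"n={n:5d}  mean={m:6.2f}  95% CI width={hi - lo:5.2f}")
```

Because the standard error shrinks with the square root of the sample size, the interval width drops by roughly a factor of √10 ≈ 3.2 at each step: the estimate gets more precise, not different in kind.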
This paradigm of sampling from a population of interest is neat and clear, and it turned out, through a series of dramatic developments in the 20th Century (though generally drawing on maths going back sometimes much earlier), to have effectively created the modern discipline of statistics. Through the second half of the 20th Century and into the 21st this was accelerated by the development of computers and the dramatic rate at which computing power grew and became cheaply available. Arguably, those changes rocketed ahead of any general recognition that real world data, and spectacularly so in routine change/outcome exploration, never really fits that sampling model. This doesn’t make estimation meaningless for us, but it does mean we should be cautious about assuming that findings based on this paradigm have automatic generalisability, precision and freedom from bias. Sadly it has none of those, but it remains invaluable in moving from describing our data to tentative ideas about generalisability.
Try also … #
Nothing here yet!