Internal vs. external analyses

Most thinking about change/outcome data concerns “internal” analyses: analyses within a single local dataset, generally presented without any “external” comparisons. This is a pity!

Details

There are many reasons why this is a pity.

Obscures selection bias and problems of comparability/generalisability

No sample or dataset is a truly random, unbiased, representative, universally generalisable guide to the wider world! There is a sad tendency in much published research either simply to ignore this or to give it a token acknowledgement under “limitations” in the discussion section. This is regrettable, because a very different approach would be possible if every report made at least some attempt to say how similar to, or different from, other reports its dataset is. Every report should summarise the socio-demographic characteristics of its clients and, equally, the therapy variables: durations, termination types, and rates of cancellations and DNAs (did not attend) are crucial, and they are not hard to collect and to collate against other reports.
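As a rough illustration of this kind of external comparison, the Python sketch below computes a local DNA rate with a Wilson 95% confidence interval and checks whether a rate reported elsewhere falls inside it. The counts (38 DNAs from 250 booked appointments) and the reference rate of 0.10 are hypothetical, and the function names are illustrative only. The sketch also treats the external figure as fixed, ignoring its own sampling uncertainty, so it is a first look rather than a formal test.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def compare_to_reference(local_events: int, local_n: int, reference_rate: float):
    """Return the local rate, its Wilson CI, and whether the external
    reference rate lies inside that CI (reference treated as exact)."""
    lo, hi = wilson_ci(local_events, local_n)
    rate = local_events / local_n
    return rate, (lo, hi), lo <= reference_rate <= hi


# Hypothetical figures, not real data: 38 DNAs out of 250 booked appointments
# locally, compared with a hypothetical rate of 0.10 reported by another service.
rate, (lo, hi), consistent = compare_to_reference(38, 250, 0.10)
print(f"local DNA rate {rate:.3f}, 95% CI {lo:.3f} to {hi:.3f}, "
      f"reference inside CI: {consistent}")
```

With these made-up numbers the local rate is about 0.15 and its interval does not include 0.10, which would prompt the question of how this service, or its clients, differ from the one that reported the lower rate.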

Restricts learning about rarer subpopulations

As we noted for non-binary gender, many clients fall into relatively uncommon categories: clients with particular disabilities, clients whose first language is not the dominant local one, and clients with unusual social or employment situations. These clients tend to end up in very small cells in most internal data reports. If most therapists and services contributed (well anonymised or pseudonymised) data to external, aggregating databases, it would become possible to share information about these small groups. Technically, those are analyses “internal” to an “external” dataset, and that takes us toward the issue of “big data”.

Inhibits collaboration

We believe that an expectation that all data analyses include some external comparisons would promote much more collaboration, between therapists and between services, in contributing their data to collaborative datasets.

Creates an unhelpful dichotomy between “ordinary” and “big” data

See “big data”!

Try also

Cell size
Referential data
“Big data”
Statistical power
Precision
Estimation
Gender

Chapters

Mainly in Chapters 7 and 8.

Dates

Created 14/11/21.


Updated on 17th November 2021