However, here, for those who may want to amuse themselves, or perhaps even
teach themselves some statistics, is the material from the two lectures I
give on the biennial Guildford revision course for psychiatrists about to sit
their Part II M.R.C.Psych. exams. I'd be amused to receive feedback: email me
*[if your HTML browser doesn't support "mailto:" then
use a mail package and send to: C.Evans@sghms.ac.uk]*

I'll work on Microsoft and the HiJaak people to see if I can improve on the quality of the graphics and will be looking into the feasibility of putting things up using the Adobe Acrobat .PDF format instead of HTML in the near future.

- Differential diagnoses:
- Phobic anxiety
- Overvalued ideas
- Frank delusional ideation

- Treatment
- Exposure and response prevention

Sadly this figure has converted very badly to GIF format. It shows the relationships between observations (the single woman on the left!), samples (the gaggle in the middle) made up of observations and, at least in theory,
drawn at random from a population (e.g. British adults!)

- Samples give you data
- Descriptive sample statistics summarise those data, e.g.
- rate (for binary variables)
- mean
- median, quartiles & centiles
- standard deviation & variance

- Two groups of statistics:
- "central tendency" or "location"
- scatter about that central tendency

- you have a sample parameter (typically a mean or a rate)
- this is a "best guess" estimate of the population parameter
- sample size determines how precisely you estimate the "true" value for the population
- larger sample => greater precision
- confidence intervals indicate precision

(Note that the s.d. is a description of the scatter of observations in
the *sample*, whereas the C.I. is an estimate of the location
of a parameter in the *population* from which the sample
was taken)

n | mean | s.d. | 95% C.I.
---|---|---|---
100 | 15.5 | 5.1 | 14.5 to 16.5
1,000 | 15.5 | 5.1 | 15.2 to 15.8
10,000 | 15.5 | 5.1 | 15.4 to 15.6
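The narrowing in those figures follows directly from the standard-error formula: the half-width of an approximate 95% C.I. for a mean is 1.96 × s.d./√n. A minimal sketch (the 1.96 multiplier is the usual large-sample approximation, not anything specific to this course):

```python
import math

def ci95(mean, sd, n):
    """Approximate large-sample 95% CI for a mean: mean +/- 1.96 * sd/sqrt(n)."""
    half_width = 1.96 * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

# the three sample sizes from the table above
for n in (100, 1000, 10000):
    lo, hi = ci95(15.5, 5.1, n)
    print(f"n={n:5d}  95% CI = {lo:.1f} to {hi:.1f}")
```

A tenfold increase in n narrows the interval by a factor of √10, roughly 3.2.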

- 95% confidence interval will span the "true" population value of the parameter in 95% of experiments provided ...
- ... that the assumptions underlying the calculation are correct
- it is *not* possible to say for any 95% C.I. that it contains the "true" population value

n | 95% C.I.
---|---
10 | .0025 to .45
20 | .012 to .32
40 | .028 to .24
50 | .033 to .22
70 | .041 to .20
100 | .049 to .18
200 | .062 to .15
500 | .075 to .13
1,000 | .082 to .12
5,000 | .092 to .11
10,000 | .094 to .11
100,000 | .098 to .102

This should show the last set of figures as a plot. The axes have been completely lost in translation but I hope it still gives
a flavour of the narrowing of the 95% confidence interval as the sample size increases (from left to right).
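The same narrowing can be sketched for a rate. This uses the crude Wald approximation with an assumed observed rate of 0.10, so it will not reproduce the exact intervals tabulated above, but the shrinking width with n is the same:

```python
import math

def wald_ci95(p, n):
    """Crude (Wald) approximate 95% CI for a proportion p observed in n subjects."""
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

for n in (10, 100, 1000, 100000):
    lo, hi = wald_ci95(0.10, n)
    print(f"n={n:6d}  95% CI ~ {lo:.3f} to {hi:.3f}")
```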

This converted even less satisfactorily than the last two graphics. It shows how the pool of depressed subjects is split into two samples
(by randomisation), their depression scores are noted, then, double-blind, one group gets the active compound and the other
placebo for a suitable period, after which their depression scores are again noted. The dependent variable to be analysed
will be the difference between the two scores, so a negative value will indicate success: a reduction in depression.

- measure of interest is four week drop in HRSD rating
- equal sized groups (not paired)
- independent observations
- advantage to active compound: -5
- both groups equal s.d. 7
- 200 such studies simulated for *n* = 10 and *n* = 100

This has converted so badly it seems scarcely worth the candle. It is supposed to show
two samples, each of n=10, one with a group of drops representing the
active compound, the other the placebo, where the samples were made on the
assumptions shown in the definition above. This particular
run of the simulation gives rise to the 95% confidence intervals below.

- Active compound: -26.0 to -18.0
- Placebo: -15.0 to -7.0
- Difference: -16.6 to -6.0
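A run like this can be sketched as follows. The placebo mean drop of -10 and the random seed are my own assumptions for illustration; the notes only fix the -5 advantage and the s.d. of 7, so the numbers will not match the intervals above:

```python
import math
import random
import statistics

random.seed(42)  # arbitrary seed, for reproducibility only

def simulate_group(n, mean_drop, sd=7.0):
    """Simulated HRSD score changes for one group (negative = improvement)."""
    return [random.gauss(mean_drop, sd) for _ in range(n)]

def mean_ci95(data):
    """Sample mean and approximate 95% CI using the normal multiplier 1.96."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return m, (m - 1.96 * se, m + 1.96 * se)

# assumed model: placebo drop -10, active -15 (the -5 advantage), s.d. 7
for name, mean_drop in (("active", -15.0), ("placebo", -10.0)):
    m, (lo, hi) = mean_ci95(simulate_group(10, mean_drop))
    print(f"{name}: mean {m:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

Re-running with n=100 per group shows the much tighter intervals of the second simulation.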

This shows the confidence intervals for a number of runs of the simulation.

Another terrible conversion. It shows the two samples now with n=100
but still with the same population model. This run of the simulation
gives rise to the 95% confidence intervals below.

- Active compound: -20.0 to -17.3
- Placebo: -14.3 to -11.8
- Difference: -7.5 to -3.8

- for rates these assumptions are:
- random sampling from large (infinite) population(s)
- independent observations
- constant probabilities within each group

- for means these assumptions are (for pt.II):
- random sampling from large (infinite) population(s)
- independent observations
- Normal or Gaussian distributions
- equal population variance within each group

(for C.I. of difference between independent groups)

- delusions are falsely derived, unshakeable beliefs
- the content of an erotic transference (and erotomania) is apparently libidinal, sexual love
- ... but is often the cover for envy and hatred
- psychiatry has just such a relationship with statistical tests and the *p* value
- ... not that they're not great in a reality-oriented relationship!

- this branch of statistics is:
- clear
- rigorous
- objective
- potentially enlightening
- dichotomous

- it is not:
- a means of transmuting uncertainty to certainty
- immune to abuse
- complete and balanced (as it's usually (ab)used)

- "null" hypothesis
- random sampling from large (infinite) population
- independent observations
- other assumptions if dealing with means not rates

... the result is declared "significant"

- the pre-chosen level is the *alpha* or "type I error rate"

... i.e. the probability of deciding that an effect is "significant"
*given that the null hypothesis were true for the population*

- this doesn't mean:
- it's clinically interesting
- it's managerially interesting
- it's theoretically or philosophically interesting
- you can say anything for certain about your next patient

- however it is:
- a completely logical way to decide significance in terms of likelihoods
- and other ways of doing this are even more complex

and contentious!

- Remember "statisticians do it backwards (but beautifully)"
- therefore the null hypothesis is (depending on the test) that:
- the population mean is zero
- the population change is zero
- the difference between the population means is zero
- there is no association (no correlation) in the population

- it is a model of the population not the sample
- it's part of a set of assumptions forming a model that can be tested against the sample data

- The alternative hypothesis is the logical complement to the null hypothesis
- ... that there is
- a population mean other than zero
- population change other than zero
- a difference between population means other than zero
- an association (correlation) in the population that is other than zero

- whereas there is only one *quantitative* null hypothesis ...
- ... there are many possible quantitative alternative hypotheses contained within the one logical alternative hypothesis

- There is one type I error rate for a study
- but there are an infinite no. of possible type II error rates
- ... the risk of deciding "non-significant"
- ... when the null hypothesis was *not* true
- ... for each of the infinite number of true population effects

Given a small study and/or a weak "true" population
effect, the type II error rate is often very high (.9 for many
published "NS" results)

- i.e. a non-significant result does not prove there is no population effect

- given a certain "true" population effect
- and all the other assumptions of the test
- that you would have found a significant result
- given your sample size(s)

- (1 - type II error rate)
- or (1 - *beta*)
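Power for a two-group comparison of means can be approximated with the normal distribution. A sketch under the trial model used earlier (advantage -5, s.d. 7); the formula is the standard large-sample approximation, not anything given in these notes:

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power(effect, sd, n_per_group, z_alpha=1.96):
    """Approximate power to detect a true difference in means between two
    equal-sized groups at two-sided alpha = .05 (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)
    return normal_cdf(abs(effect) / se - z_alpha)

print(f"n=10 per group:  power ~ {approx_power(5, 7, 10):.2f}")
print(f"n=100 per group: power ~ {approx_power(5, 7, 100):.2f}")
```

With only 10 per group the chance of a significant result is well under half, matching the scatter of intervals in the first simulation.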

Historically these were two very different approaches to the problem, and there was much acrimonious argument between protagonists of the two approaches.

- but generally mathematically complementary
- as a 95% confidence interval that doesn't embrace a zero effect is equivalent to a significant effect
- ... when the assumptions are the same (true for most tests on means)
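That equivalence can be seen directly: the CI for a difference excludes zero exactly when the test statistic exceeds its critical value, since both compare the difference with a critical multiple of its standard error. A large-sample sketch (using z = 1.96 rather than the exact t distribution):

```python
import math
import statistics

def two_group_summary(x, y, z_crit=1.96):
    """Large-sample 95% CI for the difference in means and the matching z test.
    The CI excludes zero exactly when the test comes out "significant"."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) +
                   statistics.variance(y) / len(y))
    ci = (diff - z_crit * se, diff + z_crit * se)
    significant = abs(diff / se) > z_crit
    excludes_zero = ci[0] > 0 or ci[1] < 0
    return ci, significant, excludes_zero
```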

- One study group (and one measure)
- how many?
- how much?

- One study group (and two measures)
- how much associated/correlated?

- Differences between groups (one measure)
- how many?
- how much?

- The remainder: "multivariate statistics"

- Prevalence
- always think of as a ratio:

number affected / number at risk

- point prevalence
- period prevalence

- Incidence
- always a ratio but now of onsets

number of new cases (or relapses) / number at risk

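Both quantities are therefore simple proportions of those at risk. A trivial sketch with made-up figures:

```python
def prevalence(affected, at_risk):
    """Point or period prevalence: proportion affected among those at risk."""
    return affected / at_risk

def incidence(new_cases, at_risk):
    """Incidence: proportion of onsets (or relapses) among those at risk."""
    return new_cases / at_risk

# hypothetical figures: 30 current cases and 5 new onsets in 1,000 at risk
print(prevalence(30, 1000))  # 0.03
print(incidence(5, 1000))    # 0.005
```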

- nominal scaling
- ordinal scaling
- interval scaling
- ratio scaling

(N.B. Binary measurements are not easy to classify)

- sensitivity
- specificity

- positive predictive value
- negative predictive value

- reliability
- validity

"True" status | Score on test: -ve | Score on test: +ve | total
---|---|---|---
non-case | a | b | a+b
case | c | d | c+d
total | a+c | b+d | a+b+c+d = n

- a = true negatives
- b = false positives
- c = false negatives
- d = true positives
- sensitivity = d/(c+d)
- specificity = a/(a+b)
- PPV = d/(b+d)
- NPV = a/(a+c)

PPV = "positive predictive value": the probability that a positive test result indicates a case. NPV = "negative predictive value": the probability that a negative test result indicates a non-case.
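The four measures fall straight out of the 2x2 table. A sketch using hypothetical counts (the a, b, c, d labels follow the table above):

```python
def screening_measures(a, b, c, d):
    """Screening-test measures from a 2x2 table:
    a = true -ve, b = false +ve, c = false -ve, d = true +ve."""
    return {
        "sensitivity": d / (c + d),
        "specificity": a / (a + b),
        "PPV": d / (b + d),
        "NPV": a / (a + c),
    }

# hypothetical study: 80 true -ve, 10 false +ve, 5 false -ve, 30 true +ve
for name, value in screening_measures(80, 10, 5, 30).items():
    print(f"{name}: {value:.2f}")
```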

- stability across time (if the latent variable is supposed to be stable): "test-retest reliability"
- stability across observers: "inter-rater reliability"
- stability across subunits of the measure: "internal reliability"

- Evidence of criterion related validity
- concurrent validity
- predictive validity

- Evidence of content validity
- assumes you have definite agreement about content
- includes "face validity"

- Evidence of construct validity
- is it located within a theoretical model ...
- ... aspects of which can be measured ...
- ... and do the measurements fit the theoretical model?

- If the null hypothesis is true and you do:
- one test: type I error rate is a = .05
- two tests: rate is 1 - (1 - a)^2 = .098
- three tests: rate is 1 - (1 - a)^3 = .143
- five tests: rate is 1 - (1 - a)^5 = .226
- ten tests: rate is 1 - (1 - a)^10 = .401
- twenty tests: rate is 1 - (1 - a)^20 = .642
- thirty tests: rate is 1 - (1 - a)^30 = .785
- fifty tests: rate is 1 - (1 - a)^50 = .923
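The figures above come from treating the k tests as independent, each with a 1 - a chance of (correctly) coming out non-significant:

```python
def inflated_alpha(k, alpha=0.05):
    """Chance of at least one "significant" result in k independent tests
    when the null hypothesis is true for all of them: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 3, 5, 10, 20, 30, 50):
    print(f"{k:2d} tests: overall type I error rate = {inflated_alpha(k):.3f}")
```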

- ... so just don't tell the reader about the non-significant tests you conducted!

- those involving distributional assumptions, a.k.a. "parametric" tests and based on Normal or Gaussian distributions
- those not involving distributional assumptions, a.k.a. "nonparametric", applying automatically to counts & rates, or based on converting scores to ranks

- Parametric tests give better statistical power than non-parametric for similar sample sizes
- but non-parametric tests not dependent on distributional assumptions
- choice between the two best taken after consultation with expert statistician

- If it's a count, apply chi-squared test
- paired tests:
- if it's roughly Gaussian, apply paired t-test
- if not, Wilcoxon test

- unpaired tests of difference between two groups
- Gaussian and similar variances in each group (or similar sample sizes) - unpaired t-test (or ANOVA)
- if not, Mann-Whitney test

- tests of differences between >2 groups
- Gaussian ... - ANOVA
- if not, Kruskal-Wallis test
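The decision rules for tests of difference can be summarised as a small lookup, purely as an aide-memoire (the names mirror the lists above; a real choice still deserves a statistician's advice):

```python
def choose_difference_test(is_count, roughly_gaussian, paired, n_groups=2):
    """Rule-of-thumb selection of a test of difference, following the
    scheme sketched in the notes above."""
    if is_count:
        return "chi-squared"
    if paired:
        return "paired t-test" if roughly_gaussian else "Wilcoxon"
    if n_groups > 2:
        return "ANOVA" if roughly_gaussian else "Kruskal-Wallis"
    return "unpaired t-test" if roughly_gaussian else "Mann-Whitney"

print(choose_difference_test(False, True, paired=False))  # unpaired t-test
```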

- if it's an association or correlation
- between counts - chi-squared and related phi
- ... or Kappa (particularly for looking at inter-rater agreement)
- between Gaussian variables (similar variances not necessary) - Pearson correlation coefficient
- between non-Gaussian variables - Spearman or Kendall correlation coefficients (latter better if many ties, which is often the case on short ordinal ratings)

- There are two main distributional assumptions:
- Gaussian shape
- with the same variance in each group (for >1 group)

Parametric tests are often robust (i.e. continue to give about the right type I and II error rates) for distributions that are only roughly Gaussian

- many things can be transformed to "roughly" Gaussian

e.g. by taking logs (pH is one example)
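For instance, hydrogen ion concentrations span orders of magnitude and are heavily right-skewed, but their negative log (pH) is far tamer. Hypothetical figures:

```python
import math
import statistics

# hypothetical hydrogen ion concentrations (mol/L), strongly right-skewed
concentrations = [1e-8, 2e-8, 4e-8, 1e-7, 3e-7, 1e-6]
ph = [-math.log10(c) for c in concentrations]  # pH = -log10 [H+]

# mean well above median flags the skew; the logged values are much closer together
print("mean", statistics.mean(concentrations), "median", statistics.median(concentrations))
print("pH values:", [round(p, 2) for p in ph])
```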

- parametric tests are also often robust to differing variances provided sample sizes in the groups are similar

The one fairly unproblematical one is internal reliability (coefficient alpha): the proportion of common variance in a set of items

- Others are:
- factor analysis
- cluster analysis
- multivariate ANOVA (MANOVA)

- problems are:
- need large sample sizes
- not as robust to distribution as univariate statistics
- they're complex, easy to "blind with science"!