Modelling the CTT RCI (Reliable Change Index)

Psychometrics Change scores Simulation

I am increasingly convinced that the way the RCI is used is suspect.

Chris Evans https://www.psyctc.org/R_blog/ (PSYCTC.org)https://www.psyctc.org/psyctc/
2025-05-27
Show code
### this is just to create a sensible graphic for this post
### otherwise the first huge facetted histogram comes up
library(latex2exp)
### create limits for the plot area
tibble(x = c(0, 0, 1, 1),
       y = c(.2, .3, .3, .2)) -> tibBox
### create equation
valLabel <- TeX("$RCI = 1.96 * SD_{1} * \\sqrt{2} * \\sqrt{1 - reliability}$")
### plot it
ggplot(data = tibBox,
       aes(x = x, y = y)) +
  annotate("text",
           x = 0,
           y = .25,
           label = valLabel,
           parse = TRUE,
           size = 9) +
  coord_cartesian(ylim = c(.2, .3)) +
  ### remove all distractions
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank())

Show code
### this is just the code that creates the "copy to clipboard" function in the code blocks
htmltools::tagList(
  xaringanExtra::use_clipboard(
    button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
    success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
    error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
  ),
  rmarkdown::html_dependency_font_awesome()
)

Introduction

Over many years now I have felt that the way we use the RCI isn’t congruent with its logic. My concern is that it is mostly treated as if it were a referential value which generally means that such values are estimates of a population value. Such values of course should have some exploration of the precision of their estimation through a confidence interval around the estimate and some discussion of likely biases: neither of which seem to happen for the RCI. However, that’s certainly not unique to the RCI!

Many comments on the original paper proposing the RCI and CSC (Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336–352) have focused on the CSC rather than the RCI and on the very real challenges of choosing what are appropriate samples and populations for the “clinical” and “non-clinical” used to get the means and SDs needed to compute a CSC. The original response from Christensen & Mendoza (Christensen, L., & Mendoza, J. L. (1986). A method of assessing change in a single subject: An alteration of the RC index. Behavior Therapy, 17, 305–308.) added the missing square root of two to the equation for the RCI but didn’t comment on whether the two values that determine the RCI, the baseline standard deviation of the scores and the reliability of the measure were to be seen as referential. From my skimpy following of more recent comments seem to be about regression to the mean and other subtleties not what I see as the fundamental issue: that the RCI isn’t an estimate of a population value.

Back in another century I co-authored paper about the RCI & CSC (Evans, C., Margison, F., & Barkham, M. (1998). The contribution of reliable and clinically significant change methods to evidence-based mental health. Evidence Based Mental Health, 1, 70–72. https://doi.org/10.1136/ebmh.1.3.70, Open Access at https://mentalhealth.bmj.com/content/1/3/70) that became quite popular. In it we did comment, I think fairly sensibly, on several issues with the RCSC but we didn’t comment on this issue about the RCI that has grown in importance in my mind since then.

Why do I say that the RCI is not an estimate of a population value? Surely any sample statistic can be regarded as such an estimate? Here’s my logic.

The formula for the RCI starts with the formula for the standard error of a difference between two measurements if there is no change in the true scores and hence any change in the observed scores is only down to unreliability of measurement. That SE is:

\[SE_{diff} = SD_{1}\sqrt{2}\sqrt{1 - reliability}\]

From that, assuming we want the RCI to be such that 95% of such meaningless change would be smaller than the RCI, we have the formula for the RCI:

\[RCI = 1.96 * SD_{1}\sqrt{2}\sqrt{1 - reliability}\]

That 1.96 is the standard Gaussian distribution value such that 95% of a cumulative Gaussian are smaller than it.

My concern is that this means that the RCI isn’t a referential value, it’s an algorithm, a heuristic. The reliability can sensibly be regarded as referential and can be given a confidence interval (CI) around it. However, the SD in that equation is a sample/dataset value, it’s not an estimate of a population value. The RCI tells you that, if the assumptions of Classical Test Theory (CTT) apply, i.e. that you have two observations each per person, that the people are independent of one another (“independence of observations”) and if the true and error variances in the population are Gaussian in distribution then in any situation in which there is no true score change, 95% of the absolute change values, i.e. change up or down as as simple number, not a signed number, will be lower than the RCI. That’s it: the SD is what anchors that assertion in the dataset in question and allows us to see if any change value is “reliable improvement”, “reliable deterioration” or “no reliable change”. It also allows us to see if the proportions of those categories in our dataset differ markedly from the 2.5%, 2.5% and 95% that we would expect if that model applies.

That makes using a referential RCI where SD may be different from that of the dataset you have in front of you incongruent with the logic of the RCI.

That’s the logic and to me it’s irrefutable. What follows explores the impact of this by:

That last is probably the best test of my concern.

Create the population model data

I have opted to simulate a measure with reliability of .8. That means that the population covariance matrix of interest is this.

Show code
### set the reliability you want to simulate
valReliability <- .8
### define the means, (all zero)
vecMeans <- rep(0,3)
### start to design the covariance matrix
covMat <- matrix(rep(0, 3^2),
                 nrow = 3,
                 byrow = TRUE)
### finish the covariance matrix by putting in the variances
diag(covMat) <- c(valReliability, 
                  1 - valReliability, 
                  1 - valReliability)
### put in column names to make it easy to convert to a tibble
colnames(covMat) <- c("True", "Err1", "Err2")
covMat %>%
  as_tibble() %>%
  flextable() %>%
  colformat_double(digits = 3)

True

Err1

Err2

0.800

0.000

0.000

0.000

0.200

0.000

0.000

0.000

0.200

Show code
### set population size for next text and code blocks
populationN <- 20000

The three variables are the true score (which doesn’t change) and the error components. All three are independent of each other hence the off-diagonal covariances are all zero and the variances of the true score of .8 and of the error components of .2 fix the reliability at .8.

This next block of code creates a large (n = 2^{4}) , but finite, population with three variables: True, Err1 and True2. They each have means set to zero, Gaussian distributions and a covariance matrix exactly matching that desired one. (Hence this is a finite population not a sample, if it were a sample the covariances wouldn’t exactly match the desired matrix owing to the randomness of the sampling.)

With those simulated variables I could compute the observed scores and the change values. Here are the first ten observations.

Show code
set.seed(12345) # to get replicable values
### use MASS::mvrnorm() to generate a sample with

MASS::mvrnorm(populationN, vecMeans, covMat, empirical = TRUE) -> tmpMat

### name the columns/variables so we can get them into a tibble
colnames(tmpMat) <- c("True", "Err1", "Err2")
tmpMat %>%
  as_tibble() %>%
  ### now generate the observed scores
  mutate(Obs1 = True + Err1,
         Obs2 = True + Err2,
         Change = Obs2 - Obs1,
         ID = row_number()) -> tibSimDat

tibSimDat %>%
  filter(row_number() < 11) %>%
  flextable() %>%
  colformat_double(digits = 3)

True

Err1

Err2

Obs1

Obs2

Change

ID

0.889

-0.471

-0.026

0.418

0.863

0.445

1

-1.199

-0.206

-0.189

-1.405

-1.388

0.017

2

-0.393

0.061

-0.185

-0.332

-0.578

-0.246

3

0.065

0.136

-0.227

0.200

-0.163

-0.363

4

-0.532

-0.201

0.022

-0.734

-0.511

0.223

5

0.720

0.746

-0.113

1.466

0.607

-0.859

6

-0.785

-0.046

0.359

-0.832

-0.426

0.405

7

-1.203

0.382

0.059

-0.820

-1.144

-0.323

8

0.512

0.026

-0.062

0.538

0.449

-0.089

9

2.082

0.020

-0.167

2.103

1.916

-0.187

10

Here are the means.

Show code
tibSimDat %>% 
  select(-ID) %>%
  summarise(across(everything(), mean)) %>%
  flextable() %>%
  colformat_double(digits = 7)

True

Err1

Err2

Obs1

Obs2

Change

-0.0000000

0.0000000

-0.0000000

-0.0000000

-0.0000000

-0.0000000

OK. All zero to the numeric precision of R on my machine (finite maths means that the values aren’t quite zero, hence the minus signs on some of the means). Now here are the variances.

Show code
tibSimDat %>%
  select(-ID) %>%
  summarise(across(everything(), var)) %>%
  flextable() %>%
  colformat_double(digits = 7)

True

Err1

Err2

Obs1

Obs2

Change

0.8000000

0.2000000

0.2000000

1.0000000

1.0000000

0.4000000

OK. All as it should be. Here is the correlation matrix across the entire population.

Show code
corrr::correlate(tibSimDat,
                 diagonal = 1) -> tmpTib

tmpTib %>%
  corrr::shave() %>%
  flextable() %>%
  colformat_double(digits = 2)

term

True

Err1

Err2

Obs1

Obs2

Change

ID

True

1.00

Err1

0.00

1.00

Err2

0.00

-0.00

1.00

Obs1

0.89

0.45

-0.00

1.00

Obs2

0.89

-0.00

0.45

0.80

1.00

Change

0.00

-0.71

0.71

-0.32

0.32

1.00

ID

0.01

-0.00

-0.00

0.00

0.01

0.00

1.00

Here are the distributions.

Show code
tibSimDat %>%
  pivot_longer(cols = -ID,
               names_to = "Variable") -> tibSimDatLong

tibSimDatLong %>%
  mutate(Variable = ordered(Variable,
                            levels = c("True",
                                       "Err1",
                                       "Err2",
                                       "Obs1",
                                       "Obs2",
                                       "Change"))) -> tibSimDatLong

### get statistics for plots
tibSimDatLong %>%
  group_by(Variable) %>%
  summarise(mean = mean(value),
            median = median(value),
            sd = sd(value)) %>%
  ungroup() -> tmpTibStats

### now plot
ggplot(data = tibSimDatLong,
       aes(x = value)) +
  facet_wrap(facets = vars(Variable),
             ncol = 1,
             scales = "free") +
  geom_histogram(bins = 80) +
  geom_vline(data = tmpTibStats,
             aes(xintercept = mean),
             colour = "green") +
  ggtitle("Histograms of the variables in the population dataset",
          subtitle = "Green reference lines mark the means")
Show code
### for the next text block
### work out the population RCI
tmpTibStats %>%
  filter(Variable == "Obs1") %>% 
  select(sd) %>% 
  pull() -> valBaselineSD

valRCI <- valBaselineSD * 1.96 * sqrt(2) * sqrt(1 - valReliability) 

### also for next text and code blocks
nSamples <- 20
sampleSize <- 2000 # 5000

Drawing samples

OK, the simulation seems OK, let’s use it.

The observed baseline (Obs1) SD is 1 so that gives us an RCI of 1.2396128.

Next I have drawn 20 random samples each of size 2000 from that pseudo population.

Show code
### get the expected counts by multiplying the expected proportions by the sample size
vecExpectedNs <- c(.95, .025, .025) * sampleSize

### now create the samples
tibble(sampleID = 1:nSamples) %>%
  rowwise() %>%
  ### this smples from all the IDs of the rows to determine the values (for those IDs)
  ### that will go into each sample
  ### this has to be done rowwise so you get different selections for each sample
  mutate(IDvals = list(sample(1:populationN, sampleSize, replace = TRUE))) %>%
  ungroup() %>% 
  ### OK now unnest those ID values
  unnest_longer(IDvals) %>%
  ### the rest can be done vectorised, i.e. not rowwise()
  mutate(Baseline = pull(tibSimDat[IDvals, "Obs1"]),
         Occ2 = pull(tibSimDat[IDvals, "Obs2"]),
         Change = pull(tibSimDat[IDvals, "Change"])) %>%
  ### get the RCI categorisations
  mutate(absChange = abs(Change),
         RCIcat3way = case_when(
           absChange < valRCI ~ "No reliable change",
           Change >= valRCI ~ "Reliable deterioration",
           Change <= valRCI ~ "Reliable improvement")) -> tibSamples

This is what we see if if we use that RCI based on the baseline SD from whole (finite) population, i.e. from a perfect estimate of the true SD.

Show code
tibSamples %>%
  tabyl(RCIcat3way) %>%
  ### this just deals with an oddity that janitor::tabyl() says its variable is "percent" but 
  ### actually gives proportions
  mutate(percent = 100 * percent, # convert to percentages
         percent = sprintf("%4.2f", percent), # get to 2 d.p. and then ... add "%"
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  ### as percent is now a character variable it would align left by default hence ...
  align(j = "percent", align = "right")

RCIcat3way

n

percent

No reliable change

38,048

95.12%

Reliable deterioration

1,023

2.56%

Reliable improvement

929

2.32%

This shows the numbers in each RCI category for each sample but using the population RCI. It also shows the chisquared p value testing the observed numbers against the expected values.

Show code
tibSamples %>%
  group_by(sampleID) %>%
  summarise(nNoRelChange = sum(RCIcat3way == "No reliable change"),
            nRelDet = sum(RCIcat3way == "Reliable deterioration"),
            nRelImp = sum(RCIcat3way == "Reliable improvement")) %>%
  rowwise() %>%
  ### run the chisquared test squashing warnings
  mutate(chisqP = suppressWarnings(chisq.test(as.table(rbind(c(nNoRelChange, nRelDet, nRelImp),
                                                              vecExpectedNs)))$p.value)) %>%
  ungroup() -> tmpTib1

tmpTib1 %>%
  flextable() %>%
  colformat_double(digits = 3) %>%
  ### next two lines do nice conditional background colour formatting
  bg(j = "chisqP", i = ~ chisqP >= .05, bg = "green") %>%
  bg(j = "chisqP", i = ~ chisqP < .05, bg = "red")

sampleID

nNoRelChange

nRelDet

nRelImp

chisqP

1

1,914

45

41

0.548

2

1,912

46

42

0.638

3

1,909

51

40

0.565

4

1,913

49

38

0.429

5

1,899

55

46

0.817

6

1,893

60

47

0.602

7

1,899

45

56

0.740

8

1,888

48

64

0.407

9

1,916

44

40

0.458

10

1,906

62

32

0.073

11

1,892

52

56

0.820

12

1,915

35

50

0.258

13

1,905

49

46

0.912

14

1,888

62

50

0.516

15

1,899

53

48

0.938

16

1,902

52

46

0.902

17

1,901

53

46

0.881

18

1,905

54

41

0.591

19

1,894

58

48

0.725

20

1,898

50

52

0.980

Show code
tmpTib1 %>%
  summarise(meanP = mean(chisqP)) %>%
  pull() -> valMeanP

So that shows that all the proportions were as you’d expect given random fluctuations across the samples, and that the chisquared test p value was above .05 for all the samples. The mean p value is 0.64.

What happens if we use the SD of the baseline scores for each sample to compute a per sample RCI? Here are the baseline SDs and the corresponding RCIs per sample just to see how much they vary across the samples.

Show code
tibSamples %>%
  group_by(sampleID) %>%
  summarise(sampleSD = sd(Baseline),
            RCI_PS = sampleSD * 1.96 * sqrt(2) * sqrt(1 - valReliability)) -> tibPerSampleStats

tibPerSampleStats %>%
  flextable() %>%
  autofit() %>%
  colformat_double(digits = 4)

sampleID

sampleSD

RCI_PS

1

0.9854

1.2215

2

1.0125

1.2551

3

1.0214

1.2662

4

1.0005

1.2402

5

0.9919

1.2296

6

1.0001

1.2397

7

1.0253

1.2710

8

1.0133

1.2561

9

1.0083

1.2499

10

1.0029

1.2432

11

0.9883

1.2251

12

0.9910

1.2285

13

1.0070

1.2483

14

1.0147

1.2578

15

0.9829

1.2185

16

1.0081

1.2496

17

0.9873

1.2239

18

0.9936

1.2317

19

0.9993

1.2388

20

1.0011

1.2410

And here are the mean baseline SD and RCI values pooling across the 20 samples.

Show code
tibPerSampleStats %>%
  summarise(meanBaselineSD = mean(sampleSD),
            meanRCI = mean(RCI_PS)) %>%
  flextable() %>%
  autofit() %>%
  colformat_double(digits = 4)

meanBaselineSD

meanRCI

1.0017

1.2418

As you’d expect: pretty close to the population values of 1.0 and 1.2396128

Show code
tibSamples %>% 
  ### pull in the per sample RCI so you can do this
  left_join(tibPerSampleStats, by = "sampleID") %>%
  mutate(RCIcat3wayPS = case_when(
    absChange < RCI_PS ~ "No reliable change",
    Change >= RCI_PS ~ "Reliable deterioration",
    Change <= RCI_PS ~ "Reliable improvement")) -> tibSamplesPS

tibSamplesPS %>%
  tabyl(RCIcat3wayPS) %>%
  mutate(percent = 100 * percent,
         percent = sprintf("%4.2f", percent),
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  align(j = "percent", align = "right")

RCIcat3wayPS

n

percent

No reliable change

38,048

95.12%

Reliable deterioration

1,027

2.57%

Reliable improvement

925

2.31%

So pretty much as you’d expect. What are the proportions and chisquared test p values?

Show code
tibSamplesPS %>%
  group_by(sampleID) %>%
  summarise(nNoRelChangePS = sum(RCIcat3wayPS == "No reliable change"),
            nRelDetPS = sum(RCIcat3wayPS == "Reliable deterioration"),
            nRelImpPS = sum(RCIcat3wayPS == "Reliable improvement")) %>%
  rowwise() %>%
  mutate(chisqPPS = suppressWarnings(chisq.test(as.table(rbind(c(nNoRelChangePS, nRelDetPS, nRelImpPS),
                                                              vecExpectedNs)))$p.value)) %>%
  ungroup() -> tmpTibPS

tmpTibPS %>%
  flextable() %>%
  colformat_double(digits = 3) %>%
  bg(j = "chisqPPS", i = ~ chisqPPS >= .05, bg = "green") %>%
  bg(j = "chisqPPS", i = ~ chisqPPS < .05, bg = "red")

sampleID

nNoRelChangePS

nRelDetPS

nRelImpPS

chisqPPS

1

1,902

53

45

0.839

2

1,918

43

39

0.373

3

1,920

49

31

0.102

4

1,913

49

38

0.429

5

1,891

59

50

0.682

6

1,894

60

46

0.581

7

1,913

37

50

0.370

8

1,894

47

59

0.655

9

1,919

43

38

0.323

10

1,907

62

31

0.056

11

1,884

53

63

0.438

12

1,911

38

51

0.432

13

1,906

49

45

0.868

14

1,894

58

48

0.725

15

1,890

57

53

0.751

16

1,904

50

46

0.918

17

1,894

56

50

0.840

18

1,902

56

42

0.596

19

1,894

58

48

0.725

20

1,898

50

52

0.980

Show code
tmpTibPS %>%
  summarise(meanP = mean(chisqPPS)) %>%
  pull() -> valMeanPPS

So, again as you’d expect, no significant deviations from the expected proportions and the mean p value is: 0.5843, fractionally lower than the 0.64 when using the population RCI across all samples.

Impact truncating baseline scores (creates non-Gaussian distributions)

But what if we censor some of the samples by removing some low and some high baseline scorers and we continue to use the population RCI? This next code block models removing the lowest 25% of baseline scores (perhaps like removing those scoring below the CSC or some other “too low” cut-off in a clinical service) and the highest 10% (referring those clients onward as too high?)

Show code
tibSamples %>%
  summarise(quantiles = list(quantile(Baseline, probs = c(.25, .90)))) %>%
  unnest_wider(quantiles) -> tibQuantiles

tibSamples %>%
  filter(Baseline > tibQuantiles$`25%` & Baseline < tibQuantiles$`90%`) -> tibSamplesCensored

tibSamplesCensored %>%
  group_by(sampleID) %>%
  summarise(SDcensoredPS = sd(Baseline),
            RCIcensoredPS = SDcensoredPS * sqrt(2) * 1.96 * sqrt(1 - valReliability)) %>%
  ungroup() -> tmpTib

tibSamplesCensored %>%
  left_join(tmpTib, by = "sampleID") %>%
  mutate(RCIcat3wayPSCensored = case_when(
  absChange < RCIcensoredPS ~ "No reliable change",
  Change >= RCIcensoredPS ~ "Reliable deterioration",
  Change <= RCIcensoredPS ~ "Reliable improvement")) -> tibSamplesCensored

That gives us overall distribution of baseline scores like this.

Show code
tibSamplesCensored %>%
  summarise(mean = mean(Baseline),
            median = median(Baseline)) -> tmpTibStats

ggplot(data = tibSamplesCensored,
       aes(x = Baseline)) +
  geom_histogram(alpha = .4,
                 bins = 50) +
  geom_vline(xintercept = tmpTibStats$mean,
             colour = "green") +
  geom_vline(xintercept = tmpTibStats$median,
             colour = "blue") +
  geom_vline(xintercept = 0,
             colour = "black",
             linetype = 3) +
  ggtitle("Histogram of baseline scores after truncation",
          subtitle = str_c("Dotted black line is pre-trunctation population mean value",
                           "\nBlue line marks median value after truncation and green marks the mean"))

That shows clearly how the mean has shifted and that the distribution is, of course, no longer Gaussian. The individual samples have these statistics (and density plots).

What about the change scores?

Show code
tibSamplesCensored %>%
  summarise(mean = mean(Change),
            median = median(Change)) -> tmpTibStats

ggplot(data = tibSamplesCensored,
       aes(x = Change)) +
  geom_histogram(alpha = .4) +
  geom_vline(xintercept = tmpTibStats$mean,
             colour = "green") +
  geom_vline(xintercept = tmpTibStats$median,
             colour = "blue") +
  geom_vline(xintercept = 0,
             colour = "black",
             linetype = 3) +
  ggtitle("Histogram of change scores after truncation",
          subtitle = str_c("Dotted black line is pre-trunctation population mean value",
                           "\nBlue line marks median value after truncation and green marks the mean"))

That shows that the change scores still have a distribution close to Gaussian but that its mean has shifted somewhat owing to the asymmetrical truncating.

Show code
tibSamplesCensored %>% 
  group_by(sampleID) %>% 
  summarise(n = n(),
            meanBaseline = mean(Baseline),
            SDBaseline = sd(Baseline),
            meanChange = mean(Change),
            SDChange = sd(Change),
            RCI_PS = first(RCIcensoredPS),
            Density = list(Baseline)) -> tmpTib

tmpTib %>%
  flextable() %>%
  mk_par(j = "Density", 
         value = as_paragraph(
           plot_chunk(value = Density, 
                      type = "dens", 
                      col = "red")
         )) %>%
  colformat_double(digits = 3)

sampleID

n

meanBaseline

SDBaseline

meanChange

SDChange

RCI_PS

Density

1

1,310

0.212

0.525

-0.011

0.600

0.651

2

1,316

0.216

0.534

-0.053

0.599

0.662

3

1,269

0.235

0.540

-0.068

0.597

0.670

4

1,326

0.214

0.514

-0.054

0.591

0.637

5

1,297

0.223

0.535

-0.081

0.596

0.664

6

1,311

0.240

0.532

-0.057

0.602

0.659

7

1,300

0.207

0.519

-0.088

0.618

0.643

8

1,280

0.213

0.529

-0.080

0.628

0.656

9

1,325

0.218

0.536

-0.051

0.591

0.664

10

1,294

0.226

0.522

-0.014

0.591

0.646

11

1,314

0.226

0.522

-0.059

0.625

0.647

12

1,321

0.202

0.507

-0.061

0.592

0.629

13

1,299

0.227

0.524

-0.029

0.588

0.650

14

1,299

0.215

0.521

-0.013

0.626

0.646

15

1,306

0.220

0.530

-0.080

0.589

0.657

16

1,301

0.196

0.530

-0.040

0.603

0.656

17

1,266

0.230

0.522

-0.019

0.595

0.647

18

1,281

0.204

0.511

-0.027

0.603

0.634

19

1,291

0.185

0.521

-0.034

0.607

0.646

20

1,293

0.234

0.534

-0.057

0.599

0.662

That shows how the samples have dropped in size from 2000 (though all fairly close to 1300, i.e.  2000 * .65) as you’d expect. That also shows that of course the mean has gone up though again with some variation across the samples as has the SD and so as a linear function of those SD values, do the RCI values now they are computed per sample. Those little density plots, again as you’d expect, don’t show marked differences between samples but, as for the pooled data, are clearly not Gaussian. The impact of the truncation on the change values is less than on baseline values but as it is an asymmetrical truncation it does shift the means a little as seen in the pooled data above. We can see how the RCI computed on the per sample basis, RCI_PS, varies across the samples.

Now using the fixed population RCI on these samples gives us this.

Show code
tibSamplesCensored %>%
  tabyl(RCIcat3way) %>%
  mutate(percent = 100 * percent,
         percent = sprintf("%4.2f", percent),
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  align(j = "percent", align = "right")

RCIcat3way

n

percent

No reliable change

24,931

95.89%

Reliable deterioration

473

1.82%

Reliable improvement

595

2.29%

So far fewer than 2.5% show reliable improvement or reliable deterioration as you’d expect given that the truncation has removed the tails of the Gaussian distribution.

Per sample RCI categorisation

However, using the per sample baseline SD values to get RCI values gives this.

Show code
tibSamplesCensored %>% 
  tabyl(RCIcat3wayPSCensored) %>%
  mutate(percent = 100 * percent,
         percent = sprintf("%4.2f", percent),
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  align(j = "percent", align = "right")

RCIcat3wayPSCensored

n

percent

No reliable change

18,629

71.65%

Reliable deterioration

3,258

12.53%

Reliable improvement

4,112

15.82%

That shows that the proportions achieving reliable change have jumped to way above 2.5% each way as the Gaussian distribution assumptions of the RCI, built into it from its roots in CTT, have markedly overestimated the correct RCI values because the baseline distribution is now far from Gaussian.

OK that underlines that distributions well off from Gaussian will me that RCI won’t give us the proportions we ask it to.

Keeping Gaussian distribution but changing the SD

Show code
vecMultiplier <- c(.7, 1.42)

tibSamples %>%
  mutate(tmpI = as.integer(1 + sampleID %% 2),
         multiplier = vecMultiplier[tmpI],
         newBaseline = Baseline * multiplier,
         newOcc2 = Occ2 * multiplier,
         newChange = newOcc2 - newBaseline,
         newAbsChange = abs(newChange)) %>%
  mutate(RCIcat3wayMultiplied = case_when(
    newAbsChange < valRCI ~ "No reliable change",
    newChange >= valRCI ~ "Reliable deterioration",
    newChange <= valRCI ~ "Reliable improvement")) -> tibSamplesMultiplied

So what happens if we adjust the SD across samples but keep Gaussian distributions? As the baseline scores are centred on zero we can change their SD just by multiplying them by any fixed, non-zero value. Here I have multiplied by one of 0.7 or 1.42 depending on the sample.

Show code
tibSamplesMultiplied %>% 
  group_by(sampleID) %>% 
  summarise(SDmultiplied = sd(newBaseline),
            RCImultipliedPS = SDmultiplied * 1.96 * sqrt(2) * sqrt(1 - valReliability)) %>%
  ungroup() -> tmpTib
  
tibSamplesMultiplied %>% 
  left_join(tmpTib, by = "sampleID") -> tibSamplesMultiplied
  
tibSamplesMultiplied %>%
  group_by(sampleID) %>%
  summarise(n = n(),
            multiplier = first(multiplier),
            meanBaseline = mean(newBaseline),
            SDBaseline = first(SDmultiplied),
            meanChange = mean(newChange),
            SDChange = sd(newChange),
            RCI_PS = first(RCImultipliedPS),
            Density = list(Baseline)) -> tmpTib

tmpTib %>%
  select(-c(meanChange, SDChange)) %>%
  flextable() %>%
  mk_par(j = "Density", 
         value = as_paragraph(
           plot_chunk(value = Density, 
                      type = "dens", 
                      col = "red",
                      free_scale = FALSE)
         )) %>%
  colformat_double(digits = 3)

sampleID

n

multiplier

meanBaseline

SDBaseline

RCI_PS

Density

1

2,000

1.420

0.007

1.399

1.735

2

2,000

0.700

0.000

0.709

0.879

3

2,000

1.420

-0.008

1.450

1.798

4

2,000

0.700

0.016

0.700

0.868

5

2,000

1.420

0.008

1.408

1.746

6

2,000

0.700

0.014

0.700

0.868

7

2,000

1.420

-0.006

1.456

1.805

8

2,000

0.700

-0.006

0.709

0.879

9

2,000

1.420

0.032

1.432

1.775

10

2,000

0.700

-0.004

0.702

0.870

11

2,000

1.420

-0.015

1.403

1.740

12

2,000

0.700

0.005

0.694

0.860

13

2,000

1.420

0.034

1.430

1.773

14

2,000

0.700

-0.014

0.710

0.880

15

2,000

1.420

-0.006

1.396

1.730

16

2,000

0.700

0.020

0.706

0.875

17

2,000

1.420

-0.039

1.402

1.738

18

2,000

0.700

-0.014

0.696

0.862

19

2,000

1.420

-0.055

1.419

1.759

20

2,000

0.700

0.001

0.701

0.869

That shows the marked change in the SD of the baseline scores, SDBaseline depending on the multiplier and the impact this has on the per sample RCI values (RCI_PS). I think it is almost impossible to see the difference in the SDs in those tiny density plots.

Pooling within the values of the multiplier shows the differences between the distributions of the baseline scores very clearly.

Show code
tibSamplesMultiplied %>%
  select(multiplier, newBaseline) %>%
  pivot_longer(cols = newBaseline) -> tmpTib

tmpTib %>% 
  group_by(multiplier) %>%
  summarise(mean = mean(value),
            sd = sd(value)) -> tmpTibMeans

tmpTib %>% 
  group_by(multiplier) %>%
  summarise(quantiles = list(quantile(value, probs = c(.05, .25, .75, .95)))) %>%
  unnest_wider(quantiles) %>%
  rename_with(~ str_remove(.x, fixed("%"))) %>%
  rename_with(~ str_c("p", .x)) %>%
  rename(multiplier = pmultiplier) %>%
  pivot_longer(cols = -multiplier,
               names_to = "quantile") -> tmpTibQuantiles

ggplot(data = tmpTib,
       aes(x = value)) +
  facet_wrap(facets = vars(multiplier),
             ncol = 1,
             scale = "fixed") +
  geom_histogram(bins = 50,
                 alpha = .4) +
  geom_vline(data = tmpTibMeans,
             aes(xintercept = mean),
             colour = "blue") +
  geom_vline(data = tmpTibQuantiles,
             aes(xintercept = value),
             colour = "red")+
  ggtitle("Facetted histogram of pooled transformed data facetted by multiplier",
          subtitle = "Blue reference line marks means and red lines mark .05, .25, .75, and .95 quantiles")

Here is the breakdown of the RCI categories using the population RCI of 1.2396128 which we know is now an incorrect RCI as the baseline SDs have been changed.

Show code
tibSamplesMultiplied %>% 
  group_by(sampleID) %>%
  summarise(multiplier = first(multiplier),
            SDBaseline = first(SDmultiplied),
            RCI = first(RCImultipliedPS),
            nNoRelChange = sum(RCIcat3wayMultiplied == "No reliable change"),
            nRelDet = sum(RCIcat3wayMultiplied == "Reliable deterioration"),
            nRelImp = sum(RCIcat3wayMultiplied == "Reliable improvement")) %>%
  rowwise() %>%
  mutate(chisqP = suppressWarnings(chisq.test(as.table(rbind(c(nNoRelChange, nRelDet, nRelImp),
                                                              vecExpectedNs)))$p.value)) %>%
  ungroup() -> tmpTib1

tmpTib1 %>%
  flextable() %>%
  colformat_double(digits = 3) %>%
  bg(j = "chisqP", i = ~ chisqP >= .05, bg = "green") %>%
  bg(j = "chisqP", i = ~ chisqP < .05, bg = "red")

sampleID

multiplier

SDBaseline

RCI

nNoRelChange

nRelDet

nRelImp

chisqP

1

1.420

1.399

1.735

1,704

149

147

0.000

2

0.700

0.709

0.879

1,993

3

4

0.000

3

1.420

1.450

1.798

1,665

176

159

0.000

4

0.700

0.700

0.868

1,994

2

4

0.000

5

1.420

1.408

1.746

1,679

153

168

0.000

6

0.700

0.700

0.868

1,994

4

2

0.000

7

1.420

1.456

1.805

1,643

162

195

0.000

8

0.700

0.709

0.879

1,992

3

5

0.000

9

1.420

1.432

1.775

1,694

155

151

0.000

10

0.700

0.702

0.870

1,991

4

5

0.000

11

1.420

1.403

1.740

1,657

177

166

0.000

12

0.700

0.694

0.860

1,991

5

4

0.000

13

1.420

1.430

1.773

1,704

157

139

0.000

14

0.700

0.710

0.880

1,985

9

6

0.000

15

1.420

1.396

1.730

1,659

156

185

0.000

16

0.700

0.706

0.875

1,991

3

6

0.000

17

1.420

1.402

1.738

1,680

166

154

0.000

18

0.700

0.696

0.862

1,993

6

1

0.000

19

1.420

1.419

1.759

1,671

176

153

0.000

20

0.700

0.701

0.869

1,993

2

5

0.000

So all way off from 95% : 2.5% : 2.5%. Here are the mean baseline SD, RCI and p values from the samples pooling within the multiplier.

Show code
tmpTib1 %>%
  group_by(multiplier) %>%
  summarise(meanSD = mean(SDBaseline),
            meanRCI = mean(RCI),
            meanP = mean(chisqP),
            meanP = format.pval(meanP, digits = 2)) %>%
  flextable() %>%
  colformat_double(digits = 2) %>%
  autofit()

multiplier

meanSD

meanRCI

meanP

0.70

0.70

0.87

8e-16

1.42

1.42

1.76

<2e-16

Here is the breakdown where the multiplier for the baseline scores was .7.

Show code
tibSamplesMultiplied %>%
  filter(multiplier == .7) -> tmpTib

tibSamplesMultiplied %>%
  tabyl(RCIcat3wayMultiplied) %>%
  mutate(percent = 100 * percent,
         percent = sprintf("%4.2f", percent),
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  align(j = "percent", align = "right")

RCIcat3wayMultiplied

n

percent

No reliable change

36,673

91.68%

Reliable deterioration

1,668

4.17%

Reliable improvement

1,659

4.15%

Those proportions of reliable change are markedly higher than they should be.

And this shows the same where the multiplier was 1.42.

Show code
tibSamplesMultiplied %>%
  filter(multiplier == 1.42) -> tmpTib

tmpTib %>%
  tabyl(RCIcat3wayMultiplied) %>%
  mutate(percent = 100 * percent,
         percent = sprintf("%4.2f", percent),
         percent = str_c(percent, "%")) %>%
  flextable() %>%
  autofit() %>%
  align(j = "percent", align = "right")

RCIcat3wayMultiplied

n

percent

No reliable change

16,756

83.78%

Reliable deterioration

1,627

8.13%

Reliable improvement

1,617

8.09%

Again the proportions are well off from 95% : 2.5% : 2.5%.

Per sample RCI categorisation

Here are the per sample RCIs?

Show code
tibSamplesMultiplied %>%
  mutate(RCIcat3wayMultipliedPS = case_when(
    newAbsChange < RCImultipliedPS ~ "No reliable change",
    newChange >= RCImultipliedPS ~ "Reliable deterioration",
    newChange <= RCImultipliedPS ~ "Reliable improvement")) -> tibSamplesMultiplied

tibSamplesMultiplied %>% 
  group_by(sampleID) %>%
  summarise(multiplier = first(multiplier),
            SDBaseline = first(SDmultiplied),
            RCI = first(RCImultipliedPS),
            nNoRelChange = sum(RCIcat3wayMultipliedPS == "No reliable change"),
            nRelDet = sum(RCIcat3wayMultipliedPS == "Reliable deterioration"),
            nRelImp = sum(RCIcat3wayMultipliedPS == "Reliable improvement")) %>%
  rowwise() %>%
  mutate(chisqP = suppressWarnings(chisq.test(as.table(rbind(c(nNoRelChange, nRelDet, nRelImp),
                                                              vecExpectedNs)))$p.value)) %>%
  ungroup() -> tmpTib1

tmpTib1 %>%
  flextable() %>%
  colformat_double(digits = 3) %>%
  bg(j = "chisqP", i = ~ chisqP >= .05, bg = "green") %>%
  bg(j = "chisqP", i = ~ chisqP < .05, bg = "red")

sampleID

multiplier

SDBaseline

RCI

nNoRelChange

nRelDet

nRelImp

chisqP

1

1.420

1.399

1.735

1,902

53

45

0.839

2

0.700

0.709

0.879

1,918

43

39

0.373

3

1.420

1.450

1.798

1,920

49

31

0.102

4

0.700

0.700

0.868

1,913

49

38

0.429

5

1.420

1.408

1.746

1,891

59

50

0.682

6

0.700

0.700

0.868

1,894

60

46

0.581

7

1.420

1.456

1.805

1,913

37

50

0.370

8

0.700

0.709

0.879

1,894

47

59

0.655

9

1.420

1.432

1.775

1,919

43

38

0.323

10

0.700

0.702

0.870

1,907

62

31

0.056

11

1.420

1.403

1.740

1,884

53

63

0.438

12

0.700

0.694

0.860

1,911

38

51

0.432

13

1.420

1.430

1.773

1,906

49

45

0.868

14

0.700

0.710

0.880

1,894

58

48

0.725

15

1.420

1.396

1.730

1,890

57

53

0.751

16

0.700

0.706

0.875

1,904

50

46

0.918

17

1.420

1.402

1.738

1,894

56

50

0.840

18

0.700

0.696

0.862

1,902

56

42

0.596

19

1.420

1.419

1.759

1,894

58

48

0.725

20

0.700

0.701

0.869

1,898

50

52

0.980

Aha, as I thought, all good.

Summary/moral

History

Visit count

web counter

Last updated

Show code
cat(paste(format(Sys.time(), "%d/%m/%Y"), "at", format(Sys.time(), "%H:%M")))
28/05/2025 at 13:30

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Evans (2025, May 27). Chris (Evans) R SAFAQ: Modelling the CTT RCI (Reliable Change Index). Retrieved from https://www.psyctc.org/R_blog/posts/2025-05-27-rcicttmodel/

BibTeX citation

@misc{evans2025modelling,
  author = {Evans, Chris},
  title = {Chris (Evans) R SAFAQ: Modelling the CTT RCI (Reliable Change Index)},
  url = {https://www.psyctc.org/R_blog/posts/2025-05-27-rcicttmodel/},
  year = {2025}
}