Derangements #2

Follows on from ‘Scores from matching things’

Chris Evans https://www.psyctc.org/R_blog/ (PSYCTC.org)https://www.psyctc.org/psyctc/
2022-07-23
Show code
as_tibble(list(x = 1,
               y = 1)) -> tibDat

ggplot(data = tibDat,
       aes(x = x, y = y)) +
  geom_text(label = "Derangements #2",
            size = 20,
            colour = "red",
            angle = 30,
            lineheight = 1) +
  xlab("") +
  ylab("") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) 

Updated to add contact me 11.ix.22

In my last post here, Scores from matching things I gave the background to the probabilities of achieving certain scores on matching tasks by chance alone to help explain the perhaps counter-intuitive finding that matching four or more things correctly is unlikely by chance alone at p < .05 regardless of the number of objects to be matched.

This just adds a bit more to that, mostly as plots and complements both that Rblog post and an “ordinary” blog post, Sometimes n=4 is enough.

What I wanted was to show how rapidly the probabilities of achieving any particular score stabilise to an asymptotic value as n increases. Here we are for n from 4 to 15 and scores from 4 to 10.

Show code
### create some functions (as in previous post)
all.derangements <- function(n){
  cumprob <- prob <- number <- term <- score <- rev(0:n)
  for (m in 1:n) {
    i <- m+1
    s <- n-m
    term[i] <- ((-1)^(m))/(factorial(m))
  }  
  term[1] <- 1
  for (i in 0:n) {
    s <- i+1
    prob[s] <- (sum(term[1:s]))/factorial(n-i)
  }
  number <- factorial(n)*prob
  for (s in 0:n) {
    m <- n-s
    i <- m+1
    cumprob[i] <- sum(prob[1:i])
  }
  tmp <- cbind(n, score,number,prob,cumprob)
  tmp
}

p.derange.score <- function(score,n){
  if (score > n) stop("Score cannot be greater than n")
  if (score == (n-1)) stop ("Score cannot be n-1")
  cumprob <- prob <- term <- rev(0:n)
  for (m in 1:n) {
    i <- m+1
    s <- n-m
    term[i] <- ((-1)^(m))/(factorial(m))
  }  
  term[1] <- 1
  for (i in 0:n) {
    s <- i+1
    prob[s] <- (sum(term[1:s]))/factorial(n-i)
  }
  for (s in 0:n) {
    m <- n-s
    i <- m+1    
    cumprob[i] <- sum(prob[1:i])
  }
  cumprob[n+1-score]
}

### now let's go a bit further
### get all the possible scores for n from 4 to 30
lapply(4:30, FUN = all.derangements) -> tmpList
### I always forget this nice little bit of base R and I'm a bit surprised that there doesn't seem to be a nice tidyverse alternative
do.call(rbind.data.frame, tmpList) %>%
  as_tibble() -> tmpTib
### this was just to produce some tables for my blog post at https://www.psyctc.org/psyctc/2022/07/23/sometimes-n4-is-enough/
# tmpTib %>% 
#   write_csv(file = "derangements.csv")

### ditto
# 1:14 %>% 
#   as_tibble() %>%
#   rename(n = value) %>%
#   mutate(PossibleWays = factorial(n),
#          PossibleWays = prettyNum(PossibleWays, big.mark = ",")) %>%
#   write_csv(file = "numbers.csv")

### but Vectorizing the function seemed cleaner so ...
Vectorize(FUN = all.derangements) -> All.derangements
All.derangements(4:14) -> tmpList
### back to the do.call() just for the tables 
# do.call(rbind, tmpList) %>%
#   as_tibble() %>%
#   filter(score == 4) %>%
#   select(n, number) %>%
#   mutate(number = prettyNum(number, big.mark = ",")) %>%
#   write_csv("correct.csv")

# do.call(rbind, tmpList) %>%
#   as_tibble() %>%
#   filter(score == 4) %>%
#   mutate(totalPerms = factorial(n)) %>%
#   select(-prob) %>%
#   select(n, totalPerms, everything()) %>%
#   write_csv("final.csv")
  

### OK, now for this Rblog post!
All.derangements(4:15) -> tmpList
do.call(rbind, tmpList) %>%
  as_tibble() %>%
  filter(score > 3 & score < 11) %>%
  mutate(score = ordered(score,
                         levels = 4:10)) -> tmpTib

ggplot(data = tmpTib,
       aes(x = n, y = cumprob, colour = score, group = score)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 3:15,
                     minor_breaks = 3:15,
                     limits = c(3, 15)) +
  ylab("p") +
  theme_bw() 

Here’s the same on a log10 y axis to separate the p values for the higher scores.

Show code
ggplot(data = tmpTib,
       aes(x = n, y = cumprob, colour = score, group = score)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 3:15,
                     minor_breaks = 3:15,
                     limits = c(3, 15)) +
  scale_y_continuous(trans = "log10") +
  ylab("p") +
  theme_bw() 

This next table shows how rapidly p values of real interest stabilise. The table is ordered by number of objects (n) within score. The column p is the probability of getting that score or better by chance alone, diffProb is the absolute change in that p value from the one for the previous n, diffPerc is the difference as a percentage of the previous p value. diffProbLT001 flags when the change in absolute p vaue is below .001 at which point I think in my realm any further precision is spurious. However, diffLT1pct flags when the change in p value is below 1% of the previous p value just in case someone wants that sort precise convergence.

Show code
All.derangements(4:20) -> tmpList
do.call(rbind, tmpList) %>%
  as_tibble() %>%
  filter(score > 3 & score < 11)  -> tmpTib


### just working out how stable the p values get how soon
tmpTib %>%
  arrange(score) %>%
  group_by(score) %>%
  mutate(diffProb = abs(cumprob - lag(cumprob)),
         diffProbLT001 = if_else(diffProb < .001, "Y", "N"),
         diffPerc = 100 * diffProb /lag(cumprob),
         diffLT1pct = if_else(diffPerc < 1, "Y", "N")) %>% 
  ungroup() %>%
  select(-c(number, prob)) %>%
  rename(p = cumprob) %>%
  select(score, everything()) %>%
  as_hux() %>%
  set_position("left") %>% # left align the whole table
  set_bold(row = everywhere, col = everywhere) %>% # everything into bold
  set_align(everywhere, everywhere, "center") %>% # everything centred
  set_align(everywhere, 1:2, "right") %>% # but now right justify the first two columns
  map_text_color(by_values("Y" = "green")) %>% # colour matches by text recognition
  map_text_color(by_values("N" = "red"))
scorenpdiffProbdiffProbLT001diffPercdiffLT1pct
440.0417
450.008330.0333N80N
460.02220.0139N167N
470.01830.00397N17.9N
480.01910.000868Y4.76N
490.0190.000154Y0.807Y
4100.0192.31e-05Y0.122Y
4110.0193.01e-06Y0.0158Y
4120.0193.44e-07Y0.00181Y
4130.0193.53e-08Y0.000186Y
4140.0193.28e-09Y1.73e-05Y
4150.0192.78e-10Y1.47e-06Y
4160.0192.17e-11Y1.15e-07Y
4170.0191.57e-12Y8.29e-09Y
4180.0191.06e-13Y5.59e-10Y
4190.0196.71e-15Y3.53e-11Y
4200.0193.99e-16Y2.1e-12Y
550.00833
560.001390.00694N83.3N
570.004370.00298N214N
580.00350.000868Y19.9N
590.003690.000193Y5.52N
5100.003663.47e-05Y0.941Y
5110.003665.26e-06Y0.144Y
5120.003666.89e-07Y0.0188Y
5130.003667.95e-08Y0.00217Y
5140.003668.2e-09Y0.000224Y
5150.003667.65e-10Y2.09e-05Y
5160.003666.52e-11Y1.78e-06Y
5170.003665.12e-12Y1.4e-07Y
5180.003663.72e-13Y1.02e-08Y
5190.003662.52e-14Y6.87e-10Y
5200.003661.59e-15Y4.35e-11Y
660.00139
670.0001980.00119N85.7N
680.0007190.000521Y262N
690.0005650.000154Y21.5N
6100.00063.47e-05Y6.15N
6110.0005936.31e-06Y1.05N
6120.0005949.65e-07Y0.163Y
6130.0005941.27e-07Y0.0214Y
6140.0005941.48e-08Y0.00248Y
6150.0005941.53e-09Y0.000258Y
6160.0005941.44e-10Y2.42e-05Y
6170.0005941.23e-11Y2.07e-06Y
6180.0005949.67e-13Y1.63e-07Y
6190.0005947.04e-14Y1.19e-08Y
6200.0005944.78e-15Y8.04e-10Y
770.000198
782.48e-050.000174Y87.5N
790.0001027.72e-05Y311N
7107.88e-052.31e-05Y22.7N
7118.41e-055.26e-06Y6.68N
7128.31e-059.65e-07Y1.15N
7138.33e-051.48e-07Y0.179Y
7148.32e-051.97e-08Y0.0236Y
7158.32e-052.3e-09Y0.00276Y
7168.32e-052.39e-10Y0.000287Y
7178.32e-052.25e-11Y2.7e-05Y
7188.32e-051.93e-12Y2.32e-06Y
7198.32e-051.53e-13Y1.83e-07Y
7208.32e-051.12e-14Y1.34e-08Y
882.48e-05
892.76e-062.2e-05Y88.9N
8101.27e-059.92e-06Y360N
8119.67e-063.01e-06Y23.7N
8121.04e-056.89e-07Y7.12N
8131.02e-051.27e-07Y1.23N
8141.03e-051.97e-08Y0.192Y
8151.02e-052.62e-09Y0.0256Y
8161.02e-053.08e-10Y0.003Y
8171.02e-053.22e-11Y0.000314Y
8181.02e-053.04e-12Y2.96e-05Y
8191.02e-052.62e-13Y2.55e-06Y
8201.02e-052.07e-14Y2.02e-07Y
992.76e-06
9102.76e-072.48e-06Y90N
9111.4e-061.13e-06Y409N
9121.06e-063.44e-07Y24.6N
9131.14e-067.95e-08Y7.51N
9141.12e-061.48e-08Y1.3N
9151.13e-062.3e-09Y0.204Y
9161.13e-063.08e-10Y0.0273Y
9171.13e-063.62e-11Y0.00322Y
9181.13e-063.8e-12Y0.000337Y
9191.13e-063.6e-13Y3.2e-05Y
9201.13e-063.11e-14Y2.76e-06Y
10102.76e-07
10112.51e-082.51e-07Y90.9N
10121.4e-071.15e-07Y458N
10131.05e-073.53e-08Y25.3N
10141.13e-078.2e-09Y7.85N
10151.11e-071.53e-09Y1.36N
10161.11e-072.39e-10Y0.215Y
10171.11e-073.22e-11Y0.0289Y
10181.11e-073.8e-12Y0.00341Y
10191.11e-074e-13Y0.000359Y
10201.11e-073.8e-14Y3.41e-05Y

Do contact me if this interests you and if you might want to use the method with real data.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Evans (2022, July 23). Chris (Evans) R SAFAQ: Derangements #2. Retrieved from https://www.psyctc.org/R_blog/posts/2022-07-23-derangements-2/

BibTeX citation

@misc{evans2022derangements,
  author = {Evans, Chris},
  title = {Chris (Evans) R SAFAQ: Derangements #2},
  url = {https://www.psyctc.org/R_blog/posts/2022-07-23-derangements-2/},
  year = {2022}
}