Scores from matching things

Mostly about the method of derangements but some huxtable!

Chris Evans https://www.psyctc.org/R_blog/ (PSYCTC.org)https://www.psyctc.org/psyctc/
2022-07-15
Show code
as_tibble(list(x = 1,
               y = 1)) -> tibDat

ggplot(data = tibDat,
       aes(x = x, y = y)) +
  geom_text(label = "Derangements #1",
            size = 20,
            colour = "red",
            angle = 30,
            lineheight = 1) +
  xlab("") +
  ylab("") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) 

[Created 15.vii.22, tweaked 23.vii.22 to add cross link to next Rblog post and to general blog post] The theory behind this is fully described in:

Evans, C., Hughes, J., & Houston, J. (2002). Significance testing the validity of ideographic methods: A little derangement goes a long way. British Journal of Mathematical and Statistical Psychology, 55(2), 385–390. https://doi.org/10.1348/000711002760554525

Do contact me through my work site if you would like a copy of that.

The idea is of matching things which might be purely idiographic. For example in that original paper the matching task presented to therapists from a prison therapy group was to see if they could match the two dimensional principal component plots from person repertory grids created with elicited contructs and varying elements by each of the six members of the group. Both therapists matched four of the six pre-therapy grids successfully; one therapist matched all six post-therapy grids and the other matched three of the six.

The paper showed that the probability of matching four or more objects correctly is always unlikely to happen by chance alone with p < .05 regardless of the number of objects.

All I am doing here is using a bit of R, specifically the function permutations() from the admisc package to get all the possible permutations (i.e. ways of chosing) n objects and using a bit of tidyverse to feed this into a huxtable … i.e. into one of R’s various ways of prettifying and managing tables.

n(objects) = 3

Let’s start with the situation where you only have three objects (as it makes things small and simple). There are six ways of rearranging three objects, three ways to pick the first, two ways to pick the second and then of course the third one is picked for you.

This table shows the six possible permutations of three objects in columns 2 to 4. Then in columns 5 to 7 it shows the matching scores as “Y” or “N” depending on whether each chosen object has been put in the correct place. (Imagine that you had been given three repertory grid plots created from grids from people you knew well and you are trying to match each grid to the person who created it with no other clues.) Finally it shows the matching score.

Show code
options(width = 160)
options(huxtable.knitr_output_format = "html") 
getMatches <- function(vec) {
  ### litle function that returns a vector of zero or one
  ### depending whether the number in the vector matches its
  ### position in the vector
  ### I could have put some input error trapping 
  ### but no need given that I'm only using this here
  return(as.numeric(vec == 1:length(vec)))
}
# getMatches(1:3)
# getMatches(1:59)
# getMatches(c(3, 2, 1))
# getMatches(c(3, 1, 2))


matchScore <- function(vec) {
  ### similar function to getMatches but this time returns
  ### total score of matches
  return(sum(vec == 1:length(vec)))
}
# matchScore(1:3)
# matchScore(1:59)
# matchScore(c(3, 2, 1))
# matchScore(c(3, 1, 2))

### I've wrapped this in suppressMessages to get rid of the irritating renaming messages from dplyr
suppressMessages(admisc::permutations(1:3) %>%
                   ### that got me all the permtations of 1:3 
                   ### but as a matrix
                   as.data.frame() %>% # go to df (avoids warning from dplyr)
                   as_tibble() %>% # and then to tibble!
                   rowwise() %>% # go to rowwise mode
                   ### and compute the matches as a list/vector
                   mutate(matches = list(getMatches(across(everything())))) %>%
                   ungroup() %>% # come out of rowwise (not strictly necessary)
                   ### unnest that to separate columns
                   unnest_wider(matches) %>%
                   ### do some renaming to make things clearer
                   rename_with( ~ gsub("V", "Choice", .x, fixed = TRUE)) %>%
                   rename_with( ~ gsub("...", "Match", .x, fixed = TRUE)) %>%
                   mutate(across(starts_with("Match"), ~ if_else(.x == 1, "Y", "N"))) %>%
                   ### back into rowwise mode
                   rowwise() %>%
                   ### to get the score
                   mutate(score = matchScore(c_across(starts_with("Choice")))) %>%
                   ungroup() %>% 
                   ### create permutation number
                   mutate(permutationN = row_number()) %>%
                   ### rearrange order of columns
                   select(permutationN, everything()) -> tmpTib3)


tmpTib3 %>%
  as_hux() %>%
  set_position("left") %>% # left align the whole table
  set_bold(row = everywhere, col = everywhere) %>% # everything into bold
  set_align(everywhere, everywhere, "center") %>% # everything centred
  set_align(everywhere, 1, "right") %>% # but now right justify the first column
  map_text_color(by_values("Y" = "green")) %>% # colour matches by text recognition
  map_text_color(by_values("N" = "red"))
permutationNChoice1Choice2Choice3Match1Match2Match3score
1123YYY3
2132YNN1
3213NNY1
4231NNN0
5312NNN0
6321NYN1

(Sorry: the colour scheme isn’t great on the yellow I’ve used for this blog/site.) We can see that there is, as there will be for any number of objects, only one way of getting all of them matched correctly. There are three ways to get one matched correctly and that leaves two ways of scoring zero correct matches. There are no ways of scoring two correct matches: if you match the first two correctly then you are left with the last one which you then have to put in the correct place.

So nothing very impressive even about getting all three correct: you had a one in six probability of doing that by chance. Let’s go up to n = 4.

n(objects) = 4

Show code
suppressMessages(admisc::permutations(1:4) %>%
                   as.data.frame() %>%
                   as_tibble() %>%
                   rowwise() %>%
                   mutate(matches = list(getMatches(across(everything())))) %>%
                   unnest_wider(matches) %>%
                   rename_with( ~ gsub("V", "Choice", .x, fixed = TRUE)) %>%
                   rename_with( ~ gsub("...", "Match", .x, fixed = TRUE)) %>%
                   mutate(across(starts_with("Match"), ~ if_else(.x == 1, "Y", "N"))) %>%
                   rowwise() %>%
                   mutate(score = matchScore(c_across(starts_with("Choice")))) %>%
                   ungroup() %>% 
                   mutate(permutationN = row_number()) %>%
                   select(permutationN, everything()) -> tmpTib4)


tmpTib4 %>%
  as_hux() %>%
  set_position("left") %>%
  set_bold(row = everywhere, col = everywhere) %>%
  set_align(everywhere, everywhere, "center") %>%
  set_align(everywhere, 1, "right") %>%
  map_text_color(by_values("Y" = "green")) %>%
  map_text_color(by_values("N" = "red"))
permutationNChoice1Choice2Choice3Choice4Match1Match2Match3Match4score
11234YYYY4
21243YYNN2
31324YNNY2
41342YNNN1
51423YNNN1
61432YNYN2
72134NNYY2
82143NNNN0
92314NNNY1
102341NNNN0
112413NNNN0
122431NNYN1
133124NNNY1
143142NNNN0
153214NYNY2
163241NYNN1
173412NNNN0
183421NNNN0
194123NNNN0
204132NNYN1
214213NYNN1
224231NYYN2
234312NNNN0
244321NNNN0

Now we have 24 ways of permuting the objects and still just the one correct matching of all four. As ever it’s impossible to score n - 1, i.e. three here. There are six ways of scoring two correct matches and eight ways of scoring one correct match leaving nine ways of scoring zero correct matches.

Here’s that score breakdown.

Show code
tmpTib4 %>%
  tabyl(score) %>%
  adorn_pct_formatting(digits = 2) %>%
  arrange(desc(score))
scorenpercent
414.17%
2625.00%
1833.33%
0937.50%

So the chances of getting all four correct by chance alone was p = 1/24 = 0.04, below the conventional p < .05 criterion.

n(objects) = 5

Show code
suppressMessages(admisc::permutations(1:5) %>%
                   as.data.frame() %>%
                   as_tibble() %>%
                   rowwise() %>%
                   mutate(matches = list(getMatches(across(everything())))) %>%
                   unnest_wider(matches) %>%
                   rename_with( ~ gsub("V", "Choice", .x, fixed = TRUE)) %>%
                   rename_with( ~ gsub("...", "Match", .x, fixed = TRUE)) %>%
                   mutate(across(starts_with("Match"), ~ if_else(.x == 1, "Y", "N"))) %>%
                   rowwise() %>%
                   mutate(score = matchScore(c_across(starts_with("Choice")))) %>%
                   ungroup() %>% 
                   mutate(permutationN = row_number()) %>%
                   select(permutationN, everything()) -> tmpTib5)


tmpTib5 %>%
  as_hux() %>%
  set_position("left") %>%
  set_bold(row = everywhere, col = everywhere) %>%
  set_align(everywhere, everywhere, "center") %>%
  set_align(everywhere, 1, "right") %>%
  map_text_color(by_values("Y" = "green")) %>%
  map_text_color(by_values("N" = "red"))
permutationNChoice1Choice2Choice3Choice4Choice5Match1Match2Match3Match4Match5score
112345YYYYY5
212354YYYNN3
312435YYNNY3
412453YYNNN2
512534YYNNN2
612543YYNYN3
713245YNNYY3
813254YNNNN1
913425YNNNY2
1013452YNNNN1
1113524YNNNN1
1213542YNNYN2
1314235YNNNY2
1414253YNNNN1
1514325YNYNY3
1614352YNYNN2
1714523YNNNN1
1814532YNNNN1
1915234YNNNN1
2015243YNNYN2
2115324YNYNN2
2215342YNYYN3
2315423YNNNN1
2415432YNNNN1
2521345NNYYY3
2621354NNYNN1
2721435NNNNY1
2821453NNNNN0
2921534NNNNN0
3021543NNNYN1
3123145NNNYY2
3223154NNNNN0
3323415NNNNY1
3423451NNNNN0
3523514NNNNN0
3623541NNNYN1
3724135NNNNY1
3824153NNNNN0
3924315NNYNY2
4024351NNYNN1
4124513NNNNN0
4224531NNNNN0
4325134NNNNN0
4425143NNNYN1
4525314NNYNN1
4625341NNYYN2
4725413NNNNN0
4825431NNNNN0
4931245NNNYY2
5031254NNNNN0
5131425NNNNY1
5231452NNNNN0
5331524NNNNN0
5431542NNNYN1
5532145NYNYY3
5632154NYNNN1
5732415NYNNY2
5832451NYNNN1
5932514NYNNN1
6032541NYNYN2
6134125NNNNY1
6234152NNNNN0
6334215NNNNY1
6434251NNNNN0
6534512NNNNN0
6634521NNNNN0
6735124NNNNN0
6835142NNNYN1
6935214NNNNN0
7035241NNNYN1
7135412NNNNN0
7235421NNNNN0
7341235NNNNY1
7441253NNNNN0
7541325NNYNY2
7641352NNYNN1
7741523NNNNN0
7841532NNNNN0
7942135NYNNY2
8042153NYNNN1
8142315NYYNY3
8242351NYYNN2
8342513NYNNN1
8442531NYNNN1
8543125NNNNY1
8643152NNNNN0
8743215NNNNY1
8843251NNNNN0
8943512NNNNN0
9043521NNNNN0
9145123NNNNN0
9245132NNNNN0
9345213NNNNN0
9445231NNNNN0
9545312NNYNN1
9645321NNYNN1
9751234NNNNN0
9851243NNNYN1
9951324NNYNN1
10051342NNYYN2
10151423NNNNN0
10251432NNNNN0
10352134NYNNN1
10452143NYNYN2
10552314NYYNN2
10652341NYYYN3
10752413NYNNN1
10852431NYNNN1
10953124NNNNN0
11053142NNNYN1
11153214NNNNN0
11253241NNNYN1
11353412NNNNN0
11453421NNNNN0
11554123NNNNN0
11654132NNNNN0
11754213NNNNN0
11854231NNNNN0
11954312NNYNN1
12054321NNYNN1

So now we have 120 ways of permuting the objects and still just the one correct matching of all of them. Here’s the score breakdown.

Show code
tmpTib5 %>%
  tabyl(score) %>%
  adorn_pct_formatting(digits = 2) %>%
  arrange(desc(score))
scorenpercent
510.83%
3108.33%
22016.67%
14537.50%
04436.67%

It was impossible to score four matches but getting all five correct was unlikely by chance alone at p = 1/120 = 0.008

Summary

It can be seen that the number of possible ways to permute n objects goes up rapidly as n increases. That increasing number of ways of permuting things means that getting four or more correctly matched is always unlikely at p < .05 regardless of n. There’s a lookup table at https://link.psyctc.org/derangements where you can look up the scores and their probabilities for n <= 30.

Historical footnote

This was in my ancient derangements.R file that I clearly created while I still had access to S+:

This program differs from a program for S+ only in having to declare a function, factorial() which comes with S+ but not the version of R on which I’m testing this (1.7.1) and in explicitly declaring tmp at the end of all.derangements() since R won’t return it to the console (does return it for assignment) if you just end the function with the assignment to tmp

# factorial <- function(n) {
#   gamma(n+1)
# }

I’ve often wondered which was my first R release, so it was 1.7.1 or earlier. R has long since acquired a factorial() function in the base functions.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Evans (2022, July 15). Chris (Evans) R SAFAQ: Scores from matching things. Retrieved from https://www.psyctc.org/R_blog/posts/2022-07-15-matching-scores/

BibTeX citation

@misc{evans2022scores,
  author = {Evans, Chris},
  title = {Chris (Evans) R SAFAQ: Scores from matching things},
  url = {https://www.psyctc.org/R_blog/posts/2022-07-15-matching-scores/},
  year = {2022}
}