Here's something to kick things off over here: in a new paper, we found that GPT-3 matches or exceeds human performance on zero-shot analogical reasoning, including on a text-based version of Raven's Progressive Matrices.

https://arxiv.org/abs/2212.09196v1

Emergent Analogical Reasoning in Large Language Models

The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven's Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
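For readers unfamiliar with the task format: a "digit matrix" problem presents a 3x3 grid of numbers as plain text, with the final cell blank, and asks the solver to infer the missing value from the pattern governing rows and columns. The sketch below is a hypothetical illustration of that flavor of problem (the function names and the specific item are my own invention, not taken from the paper's problem set):

```python
# Hypothetical sketch of a text-based "digit matrix" problem in the spirit
# of the paper's task -- NOT an actual item from the study.

def digit_matrix(start=1, step=1, row_step=3):
    """Build a 3x3 matrix where each column adds `step` and each row adds `row_step`."""
    return [[start + r * row_step + c * step for c in range(3)] for r in range(3)]

def format_prompt(m):
    """Render the matrix as text with the last cell blanked, as a model would see it."""
    rows = [" ".join(str(x) for x in row) for row in m]
    rows[2] = " ".join(str(x) for x in m[2][:2]) + " ?"
    return "\n".join("[" + r + "]" for r in rows)

def solve(m):
    """Infer the column step from the first row and extrapolate the missing cell."""
    step = m[0][1] - m[0][0]
    return m[2][1] + step

m = digit_matrix()
print(format_prompt(m))  # [1 2 3] / [4 5 6] / [7 8 ?]
print(solve(m))          # 9
```

The interesting part of the paper is that GPT-3 solves problems like this zero-shot from the text prompt alone, with no rule-inference code of the kind sketched in `solve` above.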

@taylorwwebb @achterbrain Thought this might pique your interest.
@adel @taylorwwebb This is exactly what I was looking for! Thanks so much for tagging me here!

@taylorwwebb Did you ever test how closely human performance on the text-based presentation matches performance on the image-based presentation?
In humans, the data show that the representation can matter quite a lot (e.g. see https://www.pnas.org/doi/10.1073/pnas.1621147114), and I wonder whether the number representation already (partially) solves the compositionality problem for the model?

I like your discussion about the potential for a holistic solve in LLMs - will need to think more about this!

@achterbrain @adel Yes, we found that human error rates on our digit matrix problems were extremely similar to error rates on the standard visual RPM problems (figure 4 of the paper). I like this study from Duncan et al., but I think it's not entirely conclusive about the key source of difficulty in RPM. In particular, the separated problems remove two sources of difficulty: object segmentation and correspondence finding. I believe the latter is more important.
@achterbrain @adel By 'correspondence finding', I mean the process of determining which elements go together to form a sub-problem. Our task doesn't require object segmentation, but it arguably does still require correspondence finding.
@taylorwwebb @adel Right, I see! That is a very interesting additional perspective, thank you for elaborating. I will go through your writing in more detail next week, as I find this very intriguing.
If there is anything related that came out after you put your work online and has hence not been referenced, I would be very interested to hear about it - though I reckon that's unlikely since it is very recent!
@achterbrain @adel Nothing more for now; we'll probably post an update to the paper soon, as we've also run a bunch of additional tests. Will post here when that's ready.