Three years in the making - our big review/opinion piece on the capabilities of large language models (LLMs) from the cognitive science perspective.
Thread below! 1/
#AI #cogneuro #NLP #LLMs #languageandthought
https://arxiv.org/abs/2301.06627

Dissociating language and thought in large language models
Large Language Models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence -- knowledge of linguistic rules and patterns -- and functional linguistic competence -- understanding and using language in the world. We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of mechanisms specialized for formal linguistic competence, distinct from functional competence.
arXiv.orgCheck out the paper for an interpretation of these results, including a discussion of selectional restrictions, reporter bias, and more!
#LLMs #languagemodels #NLP #eventknowledge #commonsense #interpretability #languageandthought
(here's the paper link again) https://arxiv.org/abs/2212.01488
7/

Event knowledge in large language models: the gap between the impossible and the unlikely
Word co-occurrence patterns in language corpora contain a surprising amount
of conceptual knowledge. Large language models (LLMs), trained to predict words
in context, leverage these patterns to achieve impressive performance on
diverse semantic tasks requiring world knowledge. An important but understudied
question about LLMs' semantic abilities is whether they acquire generalized
knowledge of common events. Here, we test whether five pre-trained LLMs (from
2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions
of agent-patient interactions than to minimally different implausible versions
of the same event. Using three curated sets of minimal sentence pairs (total
n=1,215), we found that pre-trained LLMs possess substantial event knowledge,
outperforming other distributional language models. In particular, they almost
always assign higher likelihood to possible vs. impossible events (The teacher
bought the laptop vs. The laptop bought the teacher). However, LLMs show less
consistent preferences for likely vs. unlikely events (The nanny tutored the
boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM
scores are driven by both plausibility and surface-level sentence features,
(ii) LLM scores generalize well across syntactic variants (active vs. passive
constructions) but less well across semantic variants (synonymous sentences),
(iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence
plausibility serves as an organizing dimension in internal LLM representations.
Overall, our results show that important aspects of event knowledge naturally
emerge from distributional linguistic patterns, but also highlight a gap
between representations of possible/impossible and likely/unlikely events.
arXiv.org