In-context learning in #transformers is one of those mysterious #ML phenomena that needs more attention (no pun intended) from #neuroscientists.

In-context learning is a phenomenon in large language models where the model "learns" a task just by observing some input-output examples, without updating any parameters.
"Simply by adjusting a “prompt”, transformers can be adapted to do many useful things without re-training, such as translation, question-answering, arithmetic, and many other tasks. Using “prompt engineering” to leverage in-context learning became a popular topic of study and discussion." (https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
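To make this concrete, here is a minimal sketch (hypothetical helper, not any particular API) of what "learning from input-output examples" looks like: the entire "training set" lives in the prompt, and the model's weights are never touched.

```python
# Hypothetical sketch of in-context learning: the "training set" lives
# entirely in the prompt, and no parameters are updated.
def make_few_shot_prompt(examples, query):
    """Format (input, output) example pairs plus a new query as one prompt."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = make_few_shot_prompt(
    [("cheese", "fromage"), ("dog", "chien")],  # translation examples
    "cat",
)
# A pretrained LLM completing this prompt would typically answer "chat",
# despite never being fine-tuned on translation.
print(prompt)
```

Swapping the example pairs is all it takes to "adapt" the model to a different task, which is what prompt engineering exploits.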

Interestingly, two recent works (H/T @roydanroy) showed that in-context learning (at least under certain conditions) matches solutions found by gradient descent:
1) Transformers learn in-context by gradient descent: https://arxiv.org/abs/2212.07677
2) What learning algorithm is in-context learning? Investigations with linear models: https://arxiv.org/abs/2211.15661
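A sketch of the reference algorithm these papers compare against (this is not their experimental setup, just the baseline they argue in-context predictions match for linear models): plain gradient descent on in-context examples converging to the least-squares solution.

```python
import numpy as np

# Sketch under simplifying assumptions (noiseless linear data): the papers
# argue that a trained transformer's in-context predictions on linear
# regression tasks match those of gradient descent on the in-context
# examples. Here we just run that reference algorithm and check it reaches
# the closed-form least-squares solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # in-context inputs
w_true = rng.normal(size=3)
y = X @ w_true                 # in-context targets

w = np.zeros(3)
lr = 0.05
for _ in range(2000):          # gradient descent on mean squared error
    w -= lr * X.T @ (X @ w - y) / len(y)

w_closed = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(w, w_closed, atol=1e-4))  # GD matches least squares
```

The papers' point is that the transformer's forward pass, conditioned on the example pairs, produces predictions equivalent to running a loop like this internally.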

In #neuroscience, synaptic plasticity is generally thought to be the mechanism underlying many of the behavioral improvements that are loosely referred to as learning.

Could in-context #learning be an alternative mechanism underlying at least some behavioral improvements? Given the suggested similarities between representation learning in the #hippocampus and in transformers (https://arxiv.org/abs/2112.04035), it would be interesting to explore the implications of in-context learning for our understanding of #memory formation in the hippocampus. #NeuroAI

@ShahabBakht @roydanroy

Jane Wang did some work related to this, actually:

https://www.nature.com/articles/s41593-018-0147-8

Prefrontal cortex as a meta-reinforcement learning system - Nature Neuroscience

Humans and other mammals are prodigious learners, partly because they also ‘learn how to learn’. Wang and colleagues present a new theory showing how learning to learn may arise from interactions between prefrontal cortex and the dopamine system.


@tyrell_turing @roydanroy

Very cool. The "emerged prefrontal-based learning algorithm" they describe is probably the closest analogue to the in-context learning of LLMs.

Also, this shows that RNNs (LSTMs in this case) can exhibit the same behavior; it's not specific to transformers. @introspection If I remember correctly, you were also curious about this.