🧠 New paper by Pedamonti et al. (2025, Nature Comm.) shows that the #hippocampus supports multi-task #ReinforcementLearning under partial observability. Mice flexibly inferred hidden task states 🐁, and only models with recurrent memory reproduced their behavior, linking #hippocampal dynamics to #POMDP (Partially Observable Markov Decision Process) inference.
🌍 https://doi.org/10.1038/s41467-025-64591-9
#Neuroscience #CompNeuro
🎉 new preprint day
Wrote some multi-hop reasoning work recently, formalizing #llm inference as a #pomdp
achieved #sota results on the Game of 24 problem from Tree of Thoughts
https://arxiv.org/abs/2404.19055

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily in problems that require multi-step reasoning. Previous approaches to mitigating this involve breaking a larger, multi-step task into sub-tasks, asking the language model to generate proposals ("thoughts") for each sub-task, and using exhaustive planning approaches such as DFS to compose a solution. In this work, we leverage this idea to introduce two new contributions: first, we formalize a planning-based approach to perform multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches, while also offering better anytime performance characteristics than the fixed tree search used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.
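For readers unfamiliar with the benchmark, here is a minimal sketch of the Game of 24 solved by plain exhaustive DFS, the kind of fixed search the abstract contrasts against. It is not the paper's POMCP method and involves no language model. The function names `solve_24` and `_dfs` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return one expression string that evaluates to target, or None."""
    # Each search state is a list of (exact value, expression text) pairs.
    start = [(Fraction(n), str(n)) for n in numbers]
    return _dfs(start, Fraction(target))


def _dfs(items, target):
    # Base case: one value left, so check it against the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two remaining values, combine them, and recurse.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        # Guard both division orders against a zero divisor.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _dfs(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None
```

Calling `solve_24([4, 9, 10, 13])` returns a string expression, and an unsolvable input such as `[1, 1, 1, 1]` returns `None`. Exact `Fraction` arithmetic avoids floating-point misses on puzzles that need fractional intermediate values, such as `[3, 3, 8, 8]`.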
arXiv.org
A Programming Language With a POMDP Inside
We present POAPS, a novel planning system for defining Partially Observable
Markov Decision Processes (POMDPs) that abstracts away from POMDP details for
the benefit of non-expert practitioners. POAPS includes an expressive adaptive
programming language based on Lisp that has constructs for choice points that
can be dynamically optimized. Non-experts can use our language to write
adaptive programs that have partially observable components without needing to
specify belief/hidden states or reason about probabilities. POAPS is also a
compiler that defines and performs the transformation of any program written in
our language into a POMDP with control knowledge. We demonstrate the generality
and power of POAPS in the rapidly growing domain of human computation by
describing its expressiveness and simplicity by writing several POAPS programs
for common crowdsourcing tasks.
arXiv.org