🧠 New paper by Pedamonti et al. (2025, Nature Comm.) shows that the #hippocampus supports multi-task #ReinforcementLearning under partial observability. Mice flexibly inferred hidden task states 🐁, and only models with recurrent memory reproduced the behavior, linking #hippocampal dynamics to #POMDP (Partially Observable Markov Decision Process) inference.

🌍 https://doi.org/10.1038/s41467-025-64591-9

#Neuroscience #CompNeuro
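For intuition on why partial observability demands memory: inferring a hidden task state is a Bayes-filter computation, where the belief carried from step to step is exactly what a purely feedforward model lacks. A toy discrete sketch (my own illustration, not from the paper):

```python
def belief_update(belief, obs, trans, emit):
    """One step of a discrete Bayes filter over hidden states:
    predict with the transition model, then weight by the
    observation likelihood and renormalize."""
    n = len(belief)
    pred = [sum(belief[j] * trans[j][i] for j in range(n)) for i in range(n)]
    post = [pred[i] * emit[i][obs] for i in range(n)]
    z = sum(post)
    return [p / z for p in post]

# two hidden task states, sticky transitions, noisy observations
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]  # P(obs | state)
b = [0.5, 0.5]
for obs in [0, 0, 1]:
    b = belief_update(b, obs, trans, emit)
```

The belief `b` is the recurrent state an agent must carry; without it, identical observations in different hidden task states are indistinguishable.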

🎉 new preprint day

Wrote some multi-hop reasoning work recently, formalizing #llm inference as a #pomdp

achieved #sota results on the Game of 24 problem from Tree of Thoughts

https://arxiv.org/abs/2404.19055

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models

While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily in problems that require multi-step reasoning. Previous approaches to mitigate this involve breaking a larger, multi-step task into sub-tasks, asking the language model to generate proposals ("thoughts") for each sub-task, and using exhaustive planning approaches such as DFS to compose a solution. In this work, we leverage this idea to introduce two new contributions: first, we formalize a planning-based approach to perform multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches, while also offering better anytime performance characteristics than the fixed tree-search used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.
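The Game of 24 search space the paper plans over can be sketched with plain exhaustive DFS — no LM proposals, no POMCP, just the underlying combinatorics (`solve24` and its expression strings are my own illustrative construction, not the paper's solver):

```python
from fractions import Fraction

def solve24(nums, target=24):
    """Exhaustive DFS over the Game of 24: repeatedly replace two
    values with the result of one arithmetic op until one remains.
    Fractions keep division exact."""
    def search(vals):
        if len(vals) == 1:
            val, expr = vals[0]
            return expr if val == target else None
        for i in range(len(vals)):
            for j in range(len(vals)):
                if i == j:
                    continue
                (a, ea), (b, eb) = vals[i], vals[j]
                rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
                cands = [(a + b, f"({ea}+{eb})"),
                         (a - b, f"({ea}-{eb})"),
                         (a * b, f"({ea}*{eb})")]
                if b != 0:
                    cands.append((a / b, f"({ea}/{eb})"))
                for v, e in cands:
                    hit = search(rest + [(v, e)])
                    if hit:
                        return hit
        return None
    return search([(Fraction(n), str(n)) for n in nums])
```

A planning-based approach replaces this blind enumeration with a heuristic (here, the LM's own value estimates) to decide which combination to expand next, which is what gives the anytime behavior.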

A Programming Language With a POMDP Inside
(2016): Lin, Christopher H.; Mausam; Weld, Daniel S.
url: http://arxiv.org/abs/1608.08724
#DSL #POMDP #__printed #adaptive_programming #crowd_source #decisions #monte_carlo #programming #semantics
#my_bibtex

We present POAPS, a novel planning system for defining Partially Observable Markov Decision Processes (POMDPs) that abstracts away from POMDP details for the benefit of non-expert practitioners. POAPS includes an expressive adaptive programming language based on Lisp that has constructs for choice points that can be dynamically optimized. Non-experts can use our language to write adaptive programs that have partially observable components without needing to specify belief/hidden states or reason about probabilities. POAPS is also a compiler that defines and performs the transformation of any program written in our language into a POMDP with control knowledge. We demonstrate the generality and power of POAPS in the rapidly growing domain of human computation by writing several POAPS programs for common crowdsourcing tasks that showcase its expressiveness and simplicity.
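POAPS itself is a Lisp-based language compiled to POMDPs; purely as a loose illustration of the "choice point" idea, here is a toy Python stand-in where a named decision learns its best option from reward feedback (the class, the epsilon-greedy rule, and the reward model are all invented here, not POAPS semantics):

```python
import random

class ChoicePoint:
    """A named decision whose best option is learned from reward
    feedback -- an illustrative stand-in for a dynamically
    optimized choice point."""
    def __init__(self, name, options, epsilon=0.1):
        self.name, self.options, self.epsilon = name, list(options), epsilon
        self.value = {o: 0.0 for o in self.options}  # running mean reward
        self.count = {o: 0 for o in self.options}
        self.last = None

    def choose(self):
        # epsilon-greedy: mostly exploit the best option, sometimes explore
        if random.random() < self.epsilon:
            self.last = random.choice(self.options)
        else:
            self.last = max(self.options, key=lambda o: self.value[o])
        return self.last

    def reward(self, r):
        o = self.last
        self.count[o] += 1
        self.value[o] += (r - self.value[o]) / self.count[o]

# toy "crowdsourcing" program: how many workers should we ask?
random.seed(0)  # fixed seed so the demo is reproducible
workers = ChoicePoint("num_workers", [1, 2])
for _ in range(500):
    n = workers.choose()
    workers.reward(1.0 if n == 2 else 0.3)  # invented reward model
```

The point of the POAPS design is that the programmer only writes the choice point; belief tracking and probabilistic reasoning happen in the compiled POMDP, not in user code.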
