Mastodawn

#AAMAS gave me a statue of Aphrodite to thank me for my keynote! I've always been more of an Athena person (because Chicago, not MIT). Hope Athena doesn't get jealous -- or angry that I'd think the goddess of wisdom gets jealous over something like that. But you know how those myths quite often go...

Joanna Bryson 6d ago

Unfortunately I can't figure out how to retrieve @[email protected] papers from particular sessions, but both GAAI panels look to have super interesting papers. If there's no way to dynamically build that page (#AAMAS?), then someone should make a webpage! @[email protected]

RE: https://bsky.app/profile/did:plc:4coer6ziqe3pbur5xh6gfkdk/post/3mmy53nzt6c2x


Joanna Bryson 6d ago

Also super! arxiv.org/abs/2511.07568 ProcLLM: guiding agentic LLMs with procedural knowledge. Vincent Hsiao, Mark Roberts, Leslie N. Smith. ProcLLM greatly increases the success rate of agentic LLMs on multi-step tasks; a very small LLM with it can exceed the performance of a big, famous LLM without it. #aamas


Procedural Knowledge Improves Agentic LLM Workflows

Large language models (LLMs) often struggle when performing agentic tasks without substantial tool support, prompt engineering, or fine tuning. Despite research showing that domain-dependent, procedural knowledge can dramatically increase planning efficiency, little work evaluates its potential for improving LLM performance on agentic tasks that may require implicit planning. We formalize, implement, and evaluate an agentic LLM workflow that leverages procedural knowledge in the form of a hierarchical task network (HTN). Empirical results of our implementation show that hand-coded HTNs can dramatically improve LLM performance on agentic tasks, and using HTNs can boost a 20b or 70b parameter LLM to outperform a much larger 120b parameter LLM baseline. Furthermore, LLM-created HTNs improve overall performance, though less so. The results suggest that leveraging expertise--from humans, documents, or LLMs--to curate procedural knowledge will become another important tool for improving LLM workflows.
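For readers who haven't met HTNs: a hierarchical task network encodes procedural knowledge as methods that decompose a compound task into ordered subtasks, recursing until only primitive actions remain. Below is a minimal sketch of that decomposition, assuming a toy domain; the task names (`make_tea`, `boil_water`, `add_tea`) and the `plan` helper are hypothetical illustrations, not the ProcLLM implementation, and how the paper couples an HTN to an LLM is not shown here.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, bool]

# Hypothetical primitive actions: each returns the updated state,
# or None when its precondition does not hold.
def boil_water(s: State) -> Optional[State]:
    return {**s, "hot_water": True} if s.get("has_kettle") else None

def add_tea(s: State) -> Optional[State]:
    return {**s, "tea": True} if s.get("hot_water") else None

PRIMITIVES: Dict[str, Callable[[State], Optional[State]]] = {
    "boil_water": boil_water,
    "add_tea": add_tea,
}

# Hypothetical methods: each compound task maps to candidate
# ordered lists of subtasks (the procedural knowledge).
METHODS: Dict[str, List[List[str]]] = {
    "make_tea": [["boil_water", "add_tea"]],
}

def plan(tasks: List[str], state: State) -> Optional[List[str]]:
    """Depth-first HTN decomposition into primitive actions, or None if no plan exists."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in PRIMITIVES:
        new_state = PRIMITIVES[head](state)
        if new_state is None:
            return None  # precondition failed
        tail = plan(rest, new_state)
        return None if tail is None else [head] + tail
    # Compound task: try each method's decomposition in order.
    for subtasks in METHODS.get(head, []):
        result = plan(subtasks + rest, state)
        if result is not None:
            return result
    return None

print(plan(["make_tea"], {"has_kettle": True}))   # ['boil_water', 'add_tea']
print(plan(["make_tea"], {"has_kettle": False}))  # None
```

The point of the structure is that the hard search is replaced by following known recipes, which is the kind of expertise the abstract says can be hand-coded or LLM-created.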

arXiv.org

Joanna Bryson 6d ago

Awesome talk! roger0426.github.io/MENSA/ is online in real time. Two things I love: knowing when to "think" and getting massive improvement from it, AND #NetHack – I haven't seen that as a test domain before. #LLMWin #AAMAS #AAMAS2026


MENSA: Leveraging Mental Simulation for In-Context Policy Improvement in LLM Agents

MENSA is a novel model-based approach that enhances LLM agents via mental simulation, outperforming state-of-the-art in ScienceWorld and NetHack.

MENSA Project


Joanna Bryson, blathering May 28

Wooldridge keynote #aamas


Joanna Bryson, blathering May 28

Robust Counterfactual Inference in Markov Decision Processes
Jessica Lally, Milad Kazemi, Nicola Paoletti. A prizewinning (or at least nominated) paper: https://arxiv.org/abs/2502.13731 #aamas

Robust Counterfactual Inference in Markov Decision Processes

This paper addresses a key limitation in existing counterfactual inference methods for Markov Decision Processes (MDPs). Current approaches assume a specific causal model to make counterfactuals identifiable. However, there are usually many causal models that align with the observational and interventional distributions of an MDP, each yielding different counterfactual distributions, so fixing a particular causal model limits the validity (and usefulness) of counterfactual inference. We propose a novel non-parametric approach that computes tight bounds on counterfactual transition probabilities across all compatible causal models. Unlike previous methods that require solving prohibitively large optimisation problems (with variables that grow exponentially in the size of the MDP), our approach provides closed-form expressions for these bounds, making computation highly efficient and scalable for non-trivial MDPs. Once such an interval counterfactual MDP is constructed, our method identifies robust counterfactual policies that optimise the worst-case reward w.r.t. the uncertain interval MDP probabilities. We evaluate our method on various case studies, demonstrating improved robustness over existing methods.

arXiv.org


Joanna Bryson, blathering May 28

TBH I went to an archaeological site during the coffee break after a quick look at posters, so I missed the first half of the morning session. But I'm enjoying an afternoon session now: https://mastodon.social/@j2bryson.bsky[email protected]/116651971743987150n #AAMAS. The session chairs are being more agile and fighting to keep the audience, so you may not see talks in the expected order, but IMO building a community understanding is a higher priority, so I think this is good.

Joanna Bryson May 28

Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour. Bálint Gyevnár, Christopher G. Lucas, Stefano V. Albrecht, Shay B. Cohen. arxiv.org/abs/2505.17801 #AAMAS


Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour

Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks such as miscoordination or goal misalignment. Explainability is vital for users' trust calibration, but explainable MAS face challenges due to complex environments, the human factor, and non-standardised evaluation. Leveraging the counterfactual effect size model and LLMs, we propose Agentic eXplanations via Interrogative Simulation (AXIS). AXIS generates human-centred action explanations for multi-agent policies by having an LLM interrogate an environment simulator using prompts like 'whatif' and 'remove' to observe and synthesise counterfactual information over multiple rounds. We evaluate AXIS on autonomous driving across ten scenarios for five LLMs with a comprehensive methodology combining robustness, subjective preference, correctness, and goal/action prediction with an external LLM as evaluator. Compared to baselines, AXIS improves perceived explanation correctness by at least 7.7% across all models and goal prediction accuracy by 23% for four models, with comparable action prediction accuracy, achieving the highest scores overall. Our code is open-sourced at https://github.com/gyevnarb/axis.

arXiv.org


Joanna Bryson, blathering May 28

Actually no demo, but a great excuse to enjoy the weather. Oh, one’s flying now.

#humanCentring #aamas


Joanna Bryson, blathering May 28

Sven Koenig:
"I only work on cooperative robots because I build them. They have the team in mind. But sometimes we do decentralised."

Exactly :-) #humanCentring #aamas