📜 Paper: https://arxiv.org/abs/2506.15498
🤖 Models: https://huggingface.co/collections/UKPLab/spare-prm
💻 Code: https://github.com/UKPLab/aaai2026-spare-prm

Follow the authors: Imbesat Hassan Rizvi and Iryna Gurevych from the Ubiquitous Knowledge Processing Lab (UKP Lab), Technische Universität Darmstadt, and Xiaodan Zhu from the Department of Electrical and Computer Engineering, Smith Engineering, and Ingenuity Labs Research Institute at Queen's University.

#AAAI2026 #ProcessSupervision #Reasoning #RewardModelling #ReferenceGuidedEvaluation

SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling

Process supervision, or step-wise supervision, has played a crucial role in advancing the complex multi-step reasoning capabilities of Large Language Models (LLMs). However, efficient, high-quality automated process annotation remains a significant challenge. To address this, we introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation by jointly aligning solution steps to reference solutions and determining their accuracy with explicit reasoning in a single generation. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP), showing consistent improvements in two applications: (1) training Process Reward Models (PRMs) for ranking and aggregating multiple generations, and (2) fine-tuning models via offline reinforcement learning for greedy decoding. On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of the training samples compared to human-labeled and other synthetically trained baselines. Additionally, it achieves competitive performance with MCTS-based methods while offering a 2.3× speedup in terms of total token count. Manual analysis reveals complementary precision-recall characteristics with MCTS approaches, suggesting potential for ensemble methods. These results establish SPARE as a practical and scalable solution for automatic process supervision in LLM reasoning.
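For readers new to the first application, here is a minimal sketch of how per-step PRM scores might be used to rank and aggregate several sampled generations. It is an illustration under stated assumptions, not the SPARE implementation: the helper names (`solution_score`, `best_of_n`, `weighted_vote`), the candidate format, the hard-coded scores, and the choice of `min` or product to collapse step scores are all hypothetical.

```python
from collections import defaultdict
from math import prod


def solution_score(step_scores, how="min"):
    """Collapse per-step PRM scores into one solution-level score.

    Assumption: each score is a probability-like value in [0, 1] that the
    step is correct. "min" and "prod" are common choices, not the only ones.
    """
    if not step_scores:
        return 0.0
    if how == "min":
        return min(step_scores)
    if how == "prod":
        return prod(step_scores)
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(candidates, how="min"):
    """Ranking: return the answer of the single highest-scoring candidate."""
    best = max(candidates, key=lambda c: solution_score(c["step_scores"], how))
    return best["answer"]


def weighted_vote(candidates, how="min"):
    """Aggregation: sum solution scores per distinct final answer."""
    totals = defaultdict(float)
    for c in candidates:
        totals[c["answer"]] += solution_score(c["step_scores"], how)
    return max(totals, key=totals.get)


# Hypothetical candidates; in practice the scores would come from a trained PRM.
candidates = [
    {"answer": "18", "step_scores": [0.9, 0.8, 0.7]},
    {"answer": "20", "step_scores": [0.95, 0.9, 0.85]},
    {"answer": "18", "step_scores": [0.9, 0.6, 0.9]},
]

print(best_of_n(candidates))      # picks the single best-scored solution
print(weighted_vote(candidates))  # picks the answer with the most total score
```

In this toy data the two strategies disagree: ranking selects the lone high-confidence solution, while weighted voting favours the answer that two moderately scored solutions agree on.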

Some people I know and love could be playing #Minecraft together if I ran a server for them. I'm happy to do that, provided I can approximately never again pay attention to it.

Let's see how this setup pans out: https://schmonz.com/2025/04/15/sensible-basic-minecraft-hosting/

#SelfHosting #s6 #ProcessSupervision

📚 Turns out, your teacher was right all along! Showing your working out is crucial, even for AI. Process supervision rewards each step of reasoning, reinforcing the importance of a well-defined chain-of-thought. Learning from the classroom to advance AI. #ProcessSupervision #CriticalThinking #AIEducation 🧮🎓🔬
🔍 Process supervision takes us closer to understanding the black box of AI. By rewarding each step, we unravel the model's decision-making process. It's a step forward in transparency and interpretability. Exciting times ahead! #ProcessSupervision #Transparency #Interpretability #AIInsights 🧠🔓🔍
🌟 Process supervision: a mind-blowing yet obvious concept! Rewarding each step in mathematical reasoning improves performance and alignment. Let's unlock the full potential of AI! #ProcessSupervision #MathReasoning #AIAdvancements 🧠🚀🔢