fly51fly (@fly51fly)
A study addressing the causal "rung collapse" problem in LLMs by proposing Epistemic Regret Minimization. It diagnoses the phenomenon of models picking the right answer for the wrong reasons, and presents an algorithmic approach and theoretical grounding for mitigating it.
Not my paper but top of my current reading list. This showed up on arXiv late Sep. It is about #MetaLearning, equilibrium finding, and #regretminimization. Relevant to anyone interested in general #MARL in games.
In the literature on game-theoretic equilibrium finding, the focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, leading to many similar games that need to be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games. We establish the first meta-learning guarantees for a variety of fundamental and well-studied classes of games, including two-player zero-sum games, general-sum games, and Stackelberg games. In particular, we obtain rates of convergence to different game-theoretic equilibria that depend on natural notions of similarity between the games in the sequence encountered, while at the same time recovering the known single-game guarantees when the sequence of games is arbitrary. Along the way, we prove a number of new results in the single-game regime through a simple and unified framework, which may be of independent interest. Finally, we evaluate our meta-learning algorithms on endgames faced by the poker agent Libratus against top human professionals. The experiments show that games with varying stack sizes can be solved significantly faster using our meta-learning techniques than by solving them separately, often by an order of magnitude.
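The core idea in the abstract, warm-starting a no-regret dynamic on each new game from the solution of a similar previous game, can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it uses multiplicative weights updates on small random zero-sum matrix games, and all function names, step sizes, and the perturbation model for "similar games" are my own illustrative choices.

```python
import numpy as np

def solve_zero_sum(A, x0=None, y0=None, iters=3000, eta=0.05):
    """Approximate an equilibrium of max_x min_y x^T A y via multiplicative
    weights updates (a no-regret dynamic); returns the average strategies.
    x0/y0 warm-start the dynamic, e.g. from a similar game's solution."""
    m, n = A.shape
    x = np.full(m, 1.0 / m) if x0 is None else x0.copy()
    y = np.full(n, 1.0 / n) if y0 is None else y0.copy()
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x_sum += x
        y_sum += y
        # row player maximizes x^T A y, column player minimizes it
        x = x * np.exp(eta * (A @ y)); x /= x.sum()
        y = y * np.exp(-eta * (A.T @ x)); y /= y.sum()
    return x_sum / iters, y_sum / iters

def duality_gap(A, x, y):
    """Exploitability: total gain available to each player by best-responding."""
    return float(np.max(A @ y) - np.min(A.T @ x))

rng = np.random.default_rng(0)
base = rng.uniform(-1.0, 1.0, (5, 5))
x_prev, y_prev = None, None
for t in range(3):
    # a sequence of similar games: small perturbations of a base game
    A_t = base + 0.05 * rng.uniform(-1.0, 1.0, (5, 5))
    x_prev, y_prev = solve_zero_sum(A_t, x_prev, y_prev)
    print(f"game {t}: duality gap = {duality_gap(A_t, x_prev, y_prev):.3f}")
```

When the games in the sequence are close, the warm-started dynamic begins near an equilibrium of the new game and needs fewer iterations for a given duality gap than a cold start from the uniform strategy, which is the flavor of speedup the abstract reports on the Libratus endgames.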