Mastodawn

fly51fly (@fly51fly)

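A study proposing a method for statistically estimating adversarial risk under Best-of-N sampling in large language models (LLMs) has been posted to arXiv. Submitted by researchers at Microsoft Research, the paper introduces techniques for assessing vulnerability and estimating risk in sampling-based generation, and demonstrates their validity experimentally.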

https://x.com/fly51fly/status/2018441692735746503

#adversarialrobustness #llm #bestofn #sampling

fly51fly (@fly51fly) on X

[LG] Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling M Feng, X Liu, W Yang, C Xu... [Microsoft Research] (2026) https://t.co/5mtrb5hW99

Hacker News Apr 28, 2025

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

https://arxiv.org/abs/2412.15287

#HackerNews #InferenceAwareFineTuning #BestOfN #Sampling #LargeLanguageModels #AIResearch

Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective Best-of-N (BoN) inference strategy, in which a verifier selects the best out of a set of LLM-generated responses. We devise the first imitation learning and reinforcement learning (RL) methods for BoN-aware fine-tuning, overcoming the challenging, non-differentiable argmax operator within BoN. We empirically demonstrate that our BoN-aware models implicitly learn a meta-strategy that interleaves best responses with more diverse responses that might be better suited to a test-time input -- a process reminiscent of the exploration-exploitation trade-off in RL. Our experiments demonstrate the effectiveness of BoN-aware fine-tuning in terms of improved performance and inference-time compute. In particular, we show that our methods improve the Bo32 performance of Gemma 2B on Hendrycks MATH from 26.8% to 30.8%, and pass@32 from 60.0% to 67.0%, as well as the pass@16 on HumanEval from 61.6% to 67.1%.
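
For readers unfamiliar with the Best-of-N strategy described in the abstract, here is a minimal Python sketch of the inference-time step only: sample N responses, then let a verifier pick the highest-scoring one. It is not the paper's fine-tuning method. The names `select_best`, `best_of_n`, `pass_at_n`, `toy_generate`, and `toy_score` are hypothetical, and the toy generator and verifier are stand-ins for a real LLM and a real verifier.

```python
import random
from typing import Callable, List


def select_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score (the argmax step)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def best_of_n(generate: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n responses from the generator and keep the one the verifier prefers."""
    candidates = [generate() for _ in range(n)]
    return select_best(candidates, score)


def pass_at_n(candidates: List[str], is_correct: Callable[[str], bool]) -> bool:
    """True if at least one of the candidates is correct."""
    return any(is_correct(c) for c in candidates)


# Toy stand-ins (assumptions for illustration, not from the paper):
# the "model" guesses an integer near 42, and the "verifier" prefers
# guesses closer to 42.
rng = random.Random(0)


def toy_generate() -> str:
    return str(rng.randint(38, 46))


def toy_score(response: str) -> float:
    return -abs(int(response) - 42)


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_score, n=32))
```

On the same set of samples, pass@N succeeds whenever any candidate is correct, while BoN succeeds only if the verifier also selects a correct one, so pass@N is an upper bound on BoN accuracy. That is consistent with the abstract reporting pass@32 well above Bo32 on the same benchmark.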
