Linguistic Calibration trains Llama 2 to emit confidence phrases that let a downstream reader make calibrated forecasts on related questions. The key move is defining calibration through reader utility rather than self-reported probability: hedged text that doesn't help the reader earns no forecasting reward, so generic hedging can't game the objective.
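
The reader-utility idea can be sketched as a proper scoring rule over reader forecasts. A minimal illustration, assuming a toy phrase-to-probability reader (`reader_forecast` and `PHRASE_TO_PROB` are invented here, not the paper's actual reader model):

```python
import math

def log_score(forecast_prob: float, outcome: bool) -> float:
    """Log scoring rule: the reader's reward for a probabilistic forecast."""
    p = forecast_prob if outcome else 1.0 - forecast_prob
    return math.log(max(p, 1e-12))

# Hypothetical reader: maps confidence phrases in generated text to forecasts.
PHRASE_TO_PROB = {
    "I am certain": 0.95,
    "I believe": 0.7,
    "I am unsure, but possibly": 0.5,
}

def reader_forecast(text: str) -> float:
    """Turn hedged text into a probability, defaulting to uninformative."""
    for phrase, prob in PHRASE_TO_PROB.items():
        if phrase in text:
            return prob
    return 0.5
```

Under a proper scoring rule like this, blanket hedging pins the reader at 0.5 and forfeits reward on questions the text could have answered confidently, which is why generic hedging can't game the objective.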

https://benjaminhan.net/posts/20260505-linguistic-calibration/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Hallucination #ICML #AI

Linguistic Calibration of Long-Form Generations – synesis

A two-stage recipe (summary-distillation SFT followed by decision-based RL) trains Llama 2 7B to emit long-form text whose confidence phrases let readers make calibrated probabilistic forecasts about downstream questions.


Conformal Factuality casts LM correctness as uncertainty quantification. Decompose the answer into sub-claims, score each, drop the low-confidence ones until the retained set is ~1-α factual. The sub-claim decomposition is doing most of the work, and the conformal machinery rides on top. Atomic-claim splitters have known failure modes, and the guarantee inherits them.
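
The back-off step amounts to split-conformal thresholding over sub-claim confidence scores. A minimal sketch, assuming per-example calibration scores are already computed (function and variable names are illustrative, not the paper's implementation):

```python
import numpy as np

def conformal_threshold(calib_scores, alpha):
    """calib_scores[i] is the smallest confidence cutoff at which example i's
    retained sub-claims are all factual. The conservative (1 - alpha) quantile
    of these gives a cutoff with ~1 - alpha coverage on fresh outputs."""
    n = len(calib_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(calib_scores, q, method="higher"))

def retain_claims(claims, scores, tau):
    """Drop sub-claims whose confidence falls below the conformal cutoff."""
    return [c for c, s in zip(claims, scores) if s >= tau]
```

The guarantee is only as good as the scores fed in, which is the point above: if the atomic-claim splitter mangles a claim, the coverage statement silently attaches to the mangled version.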

https://benjaminhan.net/posts/20260505-conformal-factuality/?utm_source=mastodon&utm_medium=social

#ConformalPrediction #Calibration #Hallucination #LLMs #ICML #AI

Language Models with Conformal Factuality Guarantees – synesis

A framework that turns a correctness guarantee for LM outputs into a conformal prediction problem, backing off to less specific claims until the error rate falls below a target threshold.


🚀Exciting news: 2 papers accepted to #ICML (1 Spotlight ~ top 2.2% of papers)

🌟Unsupervised Partner Design Enables Robust Ad-hoc Teamwork (Spotlight)
https://www.collaborative-ai.org/publications/ruhdorfer26_icml/

MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning
https://www.collaborative-ai.org/publications/tomilin26_icml/

Huge congrats to the authors!

Collaborative Artificial Intelligence

Our group conducts fundamental research towards collaborative artificial intelligence (CAI) at the intersection of multimodal machine learning, computational cognitive modelling, computer vision, and human-machine interaction.

Position paper: today's self-improving agents lean on extrinsic metacognition: fixed, human-designed loops that decide what to monitor and when to switch strategies. Genuine self-improvement needs the agent itself to make those decisions.

The intrinsic/extrinsic axis is the right lens for recent agent work: STaR, DSPy, MASS, and MetaSPO are all extrinsic by this definition. The optimistic bet is that current LLMs already carry partial ingredients for intrinsic metacognition.

https://benjaminhan.net/posts/20260430-intrinsic-metacognitive-learning/?utm_source=mastodon&utm_medium=social

#LLMs #AI #AgenticSystems #Cambridge #ICML

Truly Self-Improving Agents Require Intrinsic Metacognitive Learning – synesis

A position paper from Cambridge argues that today’s self-improving agents lean on hand-designed meta-loops, and genuine self-improvement needs the agent itself to decide what to evaluate and learn.

🌖 On violations of the LLM usage policy in ICML paper reviewing
➤ Defending trust in peer review: how ICML used technical means to catch unauthorized AI use
https://blog.icml.cc/2026/03/18/on-violations-of-llm-review-policies/
As AI becomes part of research workflows, ICML 2026 set strict rules for LLM use during reviewing. To keep reviewing fair, the conference ran two policies: a "conservative" one (no LLMs allowed) and a "permissive" one (limited use). For reviewers who chose the no-LLM policy, ICML embedded covert watermarks, detected reviews generated with AI, and penalized the violators, leading to 497 papers being desk-rejected. The point is to underline academic integrity and trust, not to judge review quality as such.
+ Technical countermeasures can be circumvented, but for reviewers who lean on AI for mindless copy-paste reviews, this is a real wake-up call. Academia clearly needs more explicit guidelines on AI use.
+ It is unfortunate for the authors of the rejected papers, but if a reviewer can't honor even the basic no-AI agreement they chose themselves, that reflects a deeper integrity problem.
#AcademicIntegrity #ArtificialIntelligence #PeerReview #ICML2026
On Violations of LLM Review Policies – ICML Blog

https://blog.icml.cc/2026/03/18/on-violations-of-llm-review-policies/

This is wild. #ICML let reviewers individually choose whether they want to work under a no-LLM policy or light-LLM-use policy. Those who chose the no-LLM policy received watermarked PDFs with hidden instructions to include specific phrases in LLM output. Using this technique, they caught almost 800 reviews that violated the policy *the reviewers had chosen themselves*! And this was just a conservative detection approach which fails if the reviewer slightly paraphrases parts of the LLM output.
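
The detection scheme described here amounts to a prompt-injection canary: the watermarked PDF carries hidden instructions telling any LLM to include a marker phrase, and submitted reviews are then scanned for it. A minimal sketch of that scan (the canary phrase is invented; ICML has not published its actual markers):

```python
def flags_llm_use(review: str, canary_phrases: list[str]) -> bool:
    """A review containing a canary phrase was almost certainly produced by
    feeding the watermarked PDF to an LLM. Substring matching is why the
    approach is conservative: paraphrasing the output defeats it."""
    text = review.lower()
    return any(phrase.lower() in text for phrase in canary_phrases)
```

This only ever yields false negatives (a paraphrased LLM review slips through), not false positives, which is consistent with the blog post calling the reported counts a lower bound.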

Ah, the prestigious #ICML, bravely tackling the earth-shattering crisis of AI-assisted reviews by rejecting a whopping 2% of papers! 🤖📄 Clearly, the #integrity of peer review hangs by a thread, as program chairs valiantly protect us from the existential threat of Large Language Models daring to assist. 😂 Bravo, ICML, for saving us from this apocalypse!
https://blog.icml.cc/2026/03/18/on-violations-of-llm-review-policies/ #AIreviews #PeerReview #LargeLanguageModels #HackerNews #ngated

khazzz1c (@Imkhazzz1c)

The author announces that two papers have been accepted to ICLR 2026 and names ICML 2026 as the next target: academic recognition of the work so far, with plans to take on a bigger venue next.

https://x.com/Imkhazzz1c/status/2016490922498990354

#iclr #icml #research #papers

Two papers have already been accepted by ICLR 2026 — time to aim for ICML 2026 next.

This is an interesting policy change regarding author attendance. Much more inclusive, but will authors struggle now to justify their travel expenses? Would be interesting to see how this affects author participation.

From the ICML 2026 CfP https://icml.cc/Conferences/2026/CallForPapers

#icml #conferences

New paper accepted! In which circumstances can we use abundant proxy preferences to quickly learn true preferences? I'm glad to announce that our paper explores this question and proposes a model for one such case. Check out Yuchen's thread on Bluesky https://bsky.app/profile/zhuyuchen.bsky.social/post/3lo4n2tspys2w . #ICML2025 #ICML
Yuchen Zhu (@zhuyuchen.bsky.social)

New work! 💪🏻💥🤯 When Can Proxies Improve the Sample Complexity of Preference Learning? Our paper is accepted at @icmlconf.bsky.social 2025. Fantastic joint work with @spectral.space, Zhengyan Shi, @meng-yue-yang.bsky.social, @neuralnoise.com, Matt Kusner, @alexdamour.bsky.social. 1/n
