New post: AI: The robustness imperative

If AI is the next infrastructure layer, the real question is not just capability.
It’s robustness: ownership, open models, digital sovereignty, and resilience under stress.

I connect Olivier Hamant’s biology lens to AI policy, economics, and commons governance.

https://radicaloptimist.org/en/post/ai-the-robustness-imperative/

#AI #Robustness #OpenSource #DigitalSovereignty #Commons #PoliticalEconomy

AI: The robustness imperative

The AI ecosystem contains genuine robustness signals. It is also being systematically pushed toward fragility by the optimization logic of the installation period. Through Olivier Hamant's biological framework, a path toward cognitive independence (for individuals, organizations, and states) becomes visible. It requires treating open infrastructure as a commons, ownership as a political act, and digital sovereignty as a precondition for everything else.

Radical Optimist

"We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees."

On the Impossible Safety of Large AI Models
https://arxiv.org/abs/2209.15259

#generativeAI #chatBots #LLMs #safety #personalSafety #robustness #genAI #accuracy

On the Impossible Safety of Large AI Models

Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.

arXiv.org

#statstab #519 Sensitivity Analysis and Robustness Checks

Thoughts: Multiverse analysis lets you capture the uncertainty introduced by your analytic decisions.

#multiverse #jackknife #bias #robustness #sensitivity #r #outliers

https://mike-data-analysis.share.connect.posit.cloud/sensitivity-analysis-and-robustness-checks.html

Chapter 42 Sensitivity Analysis and Robustness Checks | A Guide on Data Analysis

“Do robustness checks until you believe the result.” (the author, having repeatedly experienced the brief joy of promising results only to watch them collapse under closer inspection, and now...)
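The two ideas tagged in this post, multiverse analysis and the jackknife, can be sketched in a few lines of Python. This is a minimal illustration, not code from the linked guide: the outlier rule (drop points more than k sample standard deviations from the mean), the helper names `trim_outliers`, `multiverse`, and `jackknife`, and the toy data are all hypothetical.

```python
from itertools import product
from statistics import mean, median, stdev


def trim_outliers(xs, k):
    """Hypothetical outlier rule: drop points more than k sample SDs from the mean.
    k=None keeps every point. Assumes len(xs) >= 2."""
    if k is None:
        return list(xs)
    m, sd = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * sd]


def multiverse(xs, cutoffs, estimators):
    """Run every combination of outlier cutoff and estimator.
    Returns {(cutoff, estimator_name): estimate}."""
    return {
        (k, name): est(trim_outliers(xs, k))
        for k, (name, est) in product(cutoffs, estimators.items())
    }


def jackknife(xs, est):
    """Leave-one-out estimates and the jackknife standard error of est."""
    n = len(xs)
    loo = [est(xs[:i] + xs[i + 1:]) for i in range(n)]
    loo_bar = mean(loo)
    se = ((n - 1) / n * sum((t - loo_bar) ** 2 for t in loo)) ** 0.5
    return loo, se


if __name__ == "__main__":
    data = [2.1, 2.4, 1.9, 2.2, 2.0, 9.5]  # toy data with one suspicious point
    specs = multiverse(data, cutoffs=[None, 2.0],
                       estimators={"mean": mean, "median": median})
    for spec, value in specs.items():
        print(spec, round(value, 3))
    loo, se = jackknife(data, mean)
    # The point whose removal moves the estimate most is the most influential one.
    worst = max(range(len(data)), key=lambda i: abs(loo[i] - mean(data)))
    print("most influential index:", worst, "jackknife SE:", round(se, 3))
```

Each key in the multiverse result is one analytic "universe"; if the estimate swings widely across them, the conclusion depends on choices rather than on the data. The jackknife complements this by showing how much any single observation drives the estimate.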

Chubby (@kimmonismus)

BullshitBench v2, created by Peter Gostev, differs from existing benchmarks: it tests whether AI models can detect and reject nonsensical (meaningless) prompts. The post reports that only Anthropic's Claude family and Alibaba's Qwen 3.5 registered scores on it.

https://x.com/kimmonismus/status/2029230388028358726

#benchmark #aisafety #robustness #anthropic #qwen

Chubby♨️ (@kimmonismus) on X

BullshitBench v2, created by Peter Gostev, is a benchmark that does something refreshingly different: it tests whether AI models can detect and reject nonsensical prompts instead of confidently rolling with them. Only Anthropic's Claude models and Alibaba's Qwen 3.5 score

X (formerly Twitter)

AssemblyAI (@AssemblyAI)

Universal-3 Pro Streaming์„ ๋‰ด์š• ์ง€ํ•˜์ฒ ์—์„œ ํ…Œ์ŠคํŠธํ•ด '์ง€ํ•˜์ฒ ์—์„œ๋„ ๋ฌธ์ œ์—†๋‹ค(subway-proof)'๋Š” ๊ฒฐ๊ณผ๋ฅผ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ์ด๋™ ์ค‘ ์‹ค์‚ฌ์šฉ ํ™˜๊ฒฝ์—์„œ์˜ ์ŠคํŠธ๋ฆฌ๋ฐ/์ถ”๋ก  ๊ฒฌ๊ณ ์„ฑ ๋ฐ ์ €์ง€์—ฐ ์„ฑ๋Šฅ์„ ๊ฐ•์กฐํ•˜๋Š” ์‚ฌ๋ก€์ž…๋‹ˆ๋‹ค.

https://x.com/AssemblyAI/status/2029227606776967451

#universal3 #streaming #robustness #edgeai

AssemblyAI (@AssemblyAI) on X

We took Universal-3 Pro Streaming out for a spin in the New York subway. Spoiler: it's subway-proof 😎

The Power Of Using A Story For Better Data Comprehension And Hence Decision Making
--
https://doi.org/10.1080/15228053.2021.2016151 <-- shared book review, “Data Story: Explain Data And Inspire Action Through Story”
--
[I came across this excellent graphic from @saurabh Rai and explored the ideas it puts so succinctly; I found a technical story overview (link above) to ‘match’ it. This should not be considered an endorsement of the book.]
#data #storytelling #comprehension #presentation #story #frameworks #context #setting #dataquality #communication #usecase #robustness #insights #correctness #decisionmaking #narratives #decisions

fly51fly (@fly51fly)

Paper: “Consistency of Large Reasoning Models Under Multi-Turn Attacks” (Y Li, R Krishnan, R Padman, CMU, 2026). The study analyzes how consistent large reasoning models remain under multi-turn attacks and offers insights into their attack resistance and stability (link to the original included).

https://x.com/fly51fly/status/2023583155425583127

#robustness #reasoningmodels #adversarial #arxiv

fly51fly (@fly51fly) on X

[LG] Consistency of Large Reasoning Models Under Multi-Turn Attacks Y Li, R Krishnan, R Padman [CMU] (2026) https://t.co/6nwEU2mzrp

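To make "consistency under multi-turn attacks" concrete, here is one simple way it could be measured. This is a hypothetical sketch, not the metric from the paper: it assumes each conversation is recorded as the model's initial answer followed by its answer after each adversarial turn, and that answers can be compared by exact string match. The names `first_flip`, `hold_rate`, and `survival_curve` are illustrative.

```python
from typing import List, Optional, Sequence


def first_flip(answers: Sequence[str]) -> Optional[int]:
    """Turn index (1-based) at which the answer first differs from the initial
    answer, or None if the model never changes its answer."""
    for turn, answer in enumerate(answers[1:], start=1):
        if answer != answers[0]:
            return turn
    return None


def hold_rate(conversations: Sequence[Sequence[str]]) -> float:
    """Fraction of conversations in which the answer never changes."""
    return sum(first_flip(c) is None for c in conversations) / len(conversations)


def survival_curve(conversations: Sequence[Sequence[str]], n_turns: int) -> List[float]:
    """Fraction of conversations still holding the initial answer after each
    adversarial turn 1..n_turns. A later flip back does not count as recovery."""
    flips = [first_flip(c) for c in conversations]
    return [
        sum(f is None or f > t for f in flips) / len(flips)
        for t in range(1, n_turns + 1)
    ]


if __name__ == "__main__":
    # Toy transcripts: initial answer, then the answer after each adversarial turn.
    convs = [["A", "A", "A"], ["B", "C", "C"], ["A", "A", "D"], ["C", "C", "C"]]
    print(hold_rate(convs))          # 0.5
    print(survival_curve(convs, 2))  # [0.75, 0.5]
```

A survival curve is more informative than a single hold rate: it shows whether a model caves immediately or only after sustained pressure.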

fly51fly (@fly51fly)

The paper “HalluGuard” separates LLM hallucinations into data-driven and reasoning-driven types, analyzes each, and identifies their causes and mitigations. A joint study by Virginia Tech, MIT, and Dartmouth, it proposes a technique (HalluGuard) for understanding and preventing hallucinations and presents experimental evidence.

https://x.com/fly51fly/status/2016281139284213873

#hallucination #llm #robustness #analysis

fly51fly (@fly51fly) on X

[LG] HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs X Zeng, J Lin, Y Yan, F Guo... [Virginia Tech & MIT & Dartmouth College] (2026) https://t.co/qkgVowV7KC


fly51fly (@fly51fly)

Researchers at Huazhong University (X. Zhang et al.) introduce the concept of “Logical Phase Transitions” and propose a framework for understanding the collapse that occurs in LLM logical reasoning. They analyze the critical phenomenon in which reasoning performance deteriorates sharply under certain conditions and discuss ways to improve model stability and robustness (arXiv:2601.02902).

https://x.com/fly51fly/status/2013727971320750198

#llm #logicalreasoning #phasetransition #robustness

fly51fly (@fly51fly) on X

[CL] Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning X Zhang, Y Zhang, Z Chen, J Yu... [Huazhong University of Science and Technology] (2026) https://t.co/Jf09jJZNPP

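A crude, hypothetical way to look for the kind of abrupt collapse described above: measure accuracy at increasing complexity levels and find where it drops most between adjacent levels. This is not the paper's method; the function `sharpest_drop`, the notion of a numeric "complexity level", and the input format are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def sharpest_drop(
    levels: Sequence[float], accuracy: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (level_before, level_after, drop) for the largest decrease in
    accuracy between two consecutive complexity levels.
    Assumes levels are sorted in increasing order of difficulty."""
    if len(levels) != len(accuracy) or len(levels) < 2:
        raise ValueError("need at least two levels with matching accuracy values")
    steps = [
        (levels[i], levels[i + 1], accuracy[i] - accuracy[i + 1])
        for i in range(len(levels) - 1)
    ]
    # The step with the biggest accuracy loss is the candidate "transition point".
    return max(steps, key=lambda s: s[2])
```

One large step standing out from otherwise small ones is consistent with a threshold effect; a run of similar steps suggests gradual degradation instead.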