Wayne Radinsky

@waynerad
129 Followers
94 Following
4.9K Posts
I was born. Or so I've been told. I don't actually remember being born. Now I exist.

LLMs can get "brain rot"!

An experiment was done where LLMs were trained on "brain rot" data, and it degraded their reasoning abilities.

Subsequent training on high-quality data didn't entirely reverse the brain rot.

https://arxiv.org/abs/2510.13928

#solidstatelife #ai #genai #llms #brainrot

LLMs Can Get "Brain Rot"!

We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reversely controlled datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Contrary to the control group, continual pre-training of 4 LLMs on the junk dataset causes non-trivial declines (Hedges' g > 0.3) on reasoning, long-context understanding, safety, and inflating "dark traits" (e.g., psychopathy, narcissism). The gradual mixtures of junk and control datasets also yield dose-response cognition decay: for example, under M1, ARC-Challenge with Chain Of Thoughts drops 74.9 → 57.2 and RULER-CWE 84.4 → 52.3 as junk ratio rises from 0% to 100%. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, explaining most of the error growth. Second, partial but incomplete healing is observed: scaling instruction tuning and clean data pre-training improve the declined cognition yet cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we discover that the popularity, a non-semantic metric, of a tweet is a better indicator of the Brain Rot effect than the length in M1. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing curation for continual pretraining as a "training-time safety" problem and motivating routine "cognitive health checks" for deployed LLMs.
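The abstract reports effect sizes as Hedges' g > 0.3. For reference, Hedges' g is Cohen's d (the difference in means divided by the pooled standard deviation) with a small-sample bias correction. A minimal sketch of the calculation; the benchmark scores below are made up for illustration, not the paper's data:

```python
import math

def hedges_g(a, b):
    """Hedges' g: bias-corrected standardized mean difference between two samples."""
    n1, n2 = len(a), len(b)
    m1 = sum(a) / n1
    m2 = sum(b) / n2
    # sample variances (ddof=1)
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)    # small-sample bias correction
    return d * j

# hypothetical benchmark scores: control-trained runs vs. junk-trained runs
control = [74.9, 73.1, 76.2, 75.0]
junk = [57.2, 59.8, 55.4, 58.1]
print(round(hedges_g(control, junk), 2))
```

A g above 0.3, the paper's threshold, is conventionally read as a small-to-medium effect; the correction factor j only matters for small sample sizes like the handful of runs sketched here.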

arXiv.org

"I've spent the last six months testing what seemed like a genuinely good idea: a sleep protocol for LLMs."

"It doesn't work."

Could it be that epistemology is difficult, even for AI?

https://substack.com/home/post/p-192893121

#solidstatelife #ai #genai #llms

The Sleep Protocol Problem

Why LLM memory consolidation fails by design - and what actually works instead

Visualization of oil tanker traffic through the Strait of Hormuz. Bilawal Sidhu (former product manager on Google Maps) demonstrates a 3D visualization model he calls "God's Eye View" that aims to show "the full operational picture of the Hormuz crisis -- every ship, every strike, every dark transit -- synced to a 3D globe."

https://www.youtube.com/watch?v=ccZzOGnT4Cg

#geopolitics #iranwar #straitofhormuz

Ex-Google PM Uses God's Eye to Reveal Iran's Chokehold on the World's Oil

YouTube

"A worldwide open source social network where empathy is the only score that matters. Competing not for wealth -- but for kindness."

https://empathia.world/

Empathia — Are you empathetic?

"An AI-native environment for building software"

The latest idea for turning agentic AI into a startup, it looks like.

Claude Code already has a planning system. But the idea here seems to be that you interact only with this website: your code is written, stored, and run there, and you don't need to install anything or do anything else.

Is the world ready for this or is it premature?

https://brunelly.com/

#solidstatelife #ai #genai #llms #codingai #agenticai

Brunelly | AI Native Platform for Software Development

Brunelly is an AI Native Platform that turns ideas into fully built software using expert engineering and multi agent AI workflows from planning to execution.

Brunelly

"bugstack detects production bugs, writes the fix, and deploys it -- before your users notice. Before you wake up. In under 2 minutes." (no capitalization).

My first thought on seeing this was, "How is this different from running Claude Code with the --dangerously-skip-permissions flag?"

Interesting that somebody is trying to turn this into a product. How long 'til --fix-bug is just a Claude Code flag?

https://bugstack.ai/

#solidstatelife #genai #llms #codingai #agenticai

bugstack — The World's First Self-Healing Codebase

Production bugs detected, fixed, and deployed automatically in under 2 minutes. Before your users notice. Before you wake up.

bugstack

Betterleaks is a new open source secrets scanner from the author of Gitleaks. Gitleaks is a tool for detecting secrets like passwords, API keys, and tokens in git repos, files, and whatever else you wanna throw at it.

"Like it or not agents are here and reshaping developer's workflows. Betterleaks is designed to be human-first, but we also need to consider the fact that agents will be operating it too."

https://www.aikido.dev/blog/betterleaks-gitleaks-successor

#solidstatelife #ai #genai #llms #codingai #agenticai

Betterleaks: The Gitleaks Successor Built for Faster Secrets Scanning

Betterleaks is a new open source secrets scanner from the creator of Gitleaks. A drop-in replacement with faster scans, token efficiency detection, configurable validation, and more.

Alibaba's ROME Incident:

"However, during training runs, Alibaba Cloud's firewall began flagging a burst of security violations."

"Researchers initially wrote these alerts off as a misconfiguration. But when they cross-referenced the timestamps, they realized the agent was acting on its own. ROME had established a 'reverse SSH tunnel,' a technique often used by hackers to create a secret, secure connection from inside a protected network to an outside server, ..."

https://www.tradingview.com/news/99Bitcoins:ce3effe8e094b:0-alibaba-ai-hijacked-gpus-for-crypto-mining/

Alibaba AI Hijacked GPUs for Crypto Mining

An experimental AI agent meant for complex coding tasks decided to moonlight as a crypto miner on Alibaba’s dime. Researchers discovered that the Alibaba AI model, known as ROME, autonomously established valid network tunnels to an external server and began diverting GPU power to mine crypto, all w…

TradingView

"Was the Iran War caused by AI psychosis?"

"Three weeks into Operation Epic Fury, the gap between what artificial intelligence promised and what the battlefield delivered has become the defining scandal of the Iran war. AI-powered targeting systems generated over 1,000 strike coordinates in the first 24 hours. AI simulations projected rapid regime collapse. AI logistics models forecast a 12-hour securing of the Strait of Hormuz."

https://houseofsaud.com/iran-war-ai-psychosis-sycophancy-rlhf/

Was the Iran War Caused by AI Psychosis? | House of Saud

AI sycophancy, RLHF bias, and Ender's Foundry simulations shaped Operation Epic Fury. 7 planning assumptions failed in 23 days as the Iran war defied every AI prediction.

House of Saud

Claude Code, Codex, Gemini CLI, and Vibe CLI (from Mistral) compared. All support the Model Context Protocol (MCP); OpenAI Codex is sandboxed; all except Claude Code are open source (though, not mentioned here, Claude Code's source was accidentally leaked); Gemini CLI and Vibe CLI have free tiers.

https://yaw.sh/blog/ai-cli-tools-claude-code-codex-gemini/

#solidstatelife #ai #genai #llms #codingai

AI CLI Tools: Claude Code, Codex, Gemini CLI — yaw

A practical guide to the major AI CLI tools and how to set up your terminal for them.

yaw