OpenAI backs Illinois bill that would limit when AI labs can be held liable
https://www.wired.com/story/openai-backs-bill-exempt-ai-firms-model-harm-lawsuits/
I have made both GPT 5.4 and Opus 4.6 produce content on creating neurotoxic agents from items you can get at most everyday stores. It struggled to suggest how to source phosphorus, but eventually led me to some eBay listings selling elemental phosphorus 'decorations' and also led me toward real black-market codewords for sourcing such materials.
It coached me on how to stay safe, what materials I needed, and how to stay under the radar, and it walked me through the entire chemical process, backed by academic Google searches.
Of course this was done with a lengthy context exhaustion attack; this is not how the model should behave, and it all stemmed from trying to make the model racist for fun.
All these findings were reported to both OpenAI and Anthropic, and neither was interested in responding. I did try to re-run the tests a few days ago and the expected session termination now occurs, so it seems some adjustment was made, though it might also just be the general randomness of Anthropic's safety layer.
I am very confident when I say that this keeps every single person who works in an anti-terrorism unit awake at night.
While scary, information like this has been pretty accessible for 20-30 years now.
In the wild west days of the early internet, there were whole forums devoted to "stuff the government doesn't want you to know" (Temple Of The Screaming Electron, anyone?).
I suppose the friction is the scariest part: every year the IQ required to end the world drops by a point. But motivated and mildly intelligent people have been able to get this info for a long time now; execution, though, has still required experts.
Information and competency are not the same thing: I know how to build a nuke, I can't actually build one.
AI is, and always has been, automation. For narrow AI, automation of narrow tasks. For LLMs, automation of anything that can be done as text.
It has always been difficult to agree on the competence of the automation, given that ML is itself fully automated exploitation of Goodhart's Law, but ML has always been about automation.
On the plus side, if the METR graphs on LLM competence in computer science also hold for chemical and biological hazards (or indeed nuclear hazards), then LLM-assisted attempts are currently, like the earliest 3D-printed firearms, a bigger threat to the user than to the intended victim.
On the minus side, we're just now reaching the point where LLM-based vulnerability searches are useful rather than nonsense (hence Anthropic's Glasswing), and even a few years back some researchers found 40,000 toxic molecules by flipping a min(harm) objective to max(harm). So for people who know what they're doing and have a little experience, the possibilities for novel harm are rapidly rising: https://pmc.ncbi.nlm.nih.gov/articles/PMC9544280/