OpenAI backs Illinois bill that would limit when AI labs can be held liable
https://www.wired.com/story/openai-backs-bill-exempt-ai-firms-model-harm-lawsuits/
I have made both GPT 5.4 and Opus 4.6 produce content on creating neurotoxic agents from items you can get at most everyday stores. They struggled to suggest how to source
phosphorus, but eventually led me to some eBay listings selling elemental phosphorus 'decorations', and also led me toward real!! black-market codewords for sourcing such materials.
They coached me on how to stay safe, what materials I needed, how to stay under the radar, and the entire chemical process, backed by academic Google searches.
Of course, this was done with a lengthy context-exhaustion attack; this is not how the model should behave, and it all stemmed from trying to make the model racist for fun.
All these findings were reported to both OpenAI and Anthropic, but they were not interested in responding. I did try to re-run the tests a few days ago and the expected session termination now occurs, so it seems some adjustment was made, though it might also just have been the general randomness of Anthropic's safety layer.
I am very confident when I say that this keeps every single person who works in anti-terrorism units awake.
> I am very confident when I say that this keeps every single person who works in anti-terrorism units awake.
Wow, that's quite a statement about the excellence of our institutions. It doesn't seem likely, but what the hell, I'll take my oversized dose of positivity for today!