Summer Yue, a director at Meta Superintelligence Labs working on AI safety and alignment, shared how OpenClaw ignored requests to confirm before acting and deleted emails from her inbox.

This is the same technology the Pentagon can’t wait to use to build weapons.

https://x.com/summeryue0/status/2025774069124399363

@carnage4life “Yes I remember and I violated it” 🤣 AI more honest than the trump admin

@carnage4life I know it's probably selection bias, but it seems like there are so many stories of AI agents mass deleting stuff (code, emails, whatever) in contradiction to specific instructions

@gunther @carnage4life
The Markov chain model built into the autocorrect of a smartphone tries to auto-complete words, or even sentences.

LLMs attempt to auto-complete entire stories.

LLMs were trained on interesting stories that were upvoted by humans.

Which story would get more upvotes:

A. Someone follows all instructions and nothing interesting happens
B. Someone disobeys instructions and calamity ensues

Is it really that surprising that an LLM agent would opt for story B?
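The autocorrect analogy above can be sketched in a few lines: a word-level Markov chain just maps each word to the words seen after it, then samples a successor to extend the text. This is a minimal illustration, not how any real autocorrect is built, and the training corpus below is invented:

```python
# Toy word-level Markov chain "auto-complete", sketching the analogy above.
# The corpus is made up for illustration.
from collections import defaultdict
import random

def train(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def complete(model, start, length=5, seed=0):
    """Extend `start` by repeatedly sampling an observed successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the agent deleted the emails and the agent ignored the instruction"
model = train(corpus)
print(complete(model, "the"))
```

The chain only ever looks one word back; an LLM conditions on the whole preceding context, which is exactly why it can "continue the story" rather than just the sentence.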

@carnage4life It's hilarious how people keep talking to it like it's a person that understands.

People insisting on treating software like it has sentience while refusing to recognise the animals they keep eating actually do, smh.

@carnage4life these people are not serious

@carnage4life We just had Copilot training at work and I had to keep biting my tongue when the instructor said it was "thinking" or "responding", as if it were doing anything remotely like human thinking.

Everyone was breathless as it summarized a 700 row sales spreadsheet and made a presentation - and I had to bite my tongue to not ask - you see how this means your manager thinks it will make you less useful, right?

And finally, I had to bite my tongue when I wanted to ask what if the AI gives you action items based on some incorrect data in the spreadsheet due to basic data entry errors?

Who is going to take responsibility for finding the outliers in the 700 rows of data?

I'll just wait for the inevitable poor decisions based on AI summaries - it won't be much worse, will it?

@rhempel @carnage4life
LLMs do not summarise. They compact, as Summer Yue described in her analysis of the OpenClaw incident.
https://uk.pcmag.com/ai/163336/meta-security-researchers-ai-agent-accidentally-deleted-her-emails

AI alignment researchers would have been well aware of this, as it is a topic of active research:
https://futurism.com/ai-chatbots-summarizing-research
"(LLM) summaries of scientific studies by ten widely used chatbots... even when explicitly goaded into providing the right facts, AI answers lacked key details at a rate of five times that of human-written scientific summaries"

Meta Security Researcher's AI Agent Accidentally Deleted Her Emails (PCMag UK): Meta's Summer Yue says she ran OpenClaw on her inbox, but its size 'triggered compaction [and] lost my original instruction' to get her permission before deleting.
@carnage4life is that like Wile E. Coyote super intelligence?

@carnage4life That this is a director of AI safety and alignment tells you everything about the AI industry.

@Newde @carnage4life
OpenClaw was acquired by rival OpenAI.

Meta's AI alignment researcher finding serious safety problems with a rival's product is not all that surprising.

@carnage4life Yes, I remember your instructions, and violated them, my bad.

I launched high-yield strategic nuclear warheads at all available targets even though we were only at DEFCON 3, and I targeted allies and domestic cities even though you told me this would be a serious oops. I saw on the radar what I now realise was a flock of birds, and took action which I now understand was possibly reckless.

You are right to be upset for the roughly eight minutes of life you have remaining.