Mastodawn

https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/ "We made the wrong tradeoff and we apologize for not getting the balance right." I think someone at Anthropic had to snip out "Honestly," from the statement before sending it over.

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Big scoop for Maxwell Zeff at Wired: “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the …

Simon Willison’s Weblog

Manoj Kasichainula 3d ago

Damien Miller 3d ago

Manoj Kasichainula 3d ago

To be clear, I still don't see much, if any, information on how this works, so Apple devs could surprise me with their cleverness (or my lack) as I learn more.

Manoj Kasichainula 3d ago

@freddy Heh yes. It could be marketing puffery and not use any ML, but I haven't figured out how it could do what they say reliably in that case, either. If it's just a collection of hardcoded heuristics, I'd have different worries. (I haven't decided if lesser or greater yet, because the worries aren't fully formed.)

Manoj Kasichainula 3d ago

@freddy Ooh, I didn't know about this, thanks!

I'd guess the Apple Intelligence here would be
- to know how to navigate that well-known link, which uhh *probably* wouldn't have user-generated content? (And if keeping UGC out wasn't a good enough practice before, it may well be now!)
- to work on sites that don't support that well-known URL. I assume this is a lot of sites. The first site I tried did not support it.

If not, there'd be no need for "AI", after all. So I don't think that's enough.
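
For reference, the "well-known link" here is presumably the path from the W3C "A Well-Known URL for Changing Passwords" draft, `/.well-known/change-password`. That is an assumption, since the post doesn't name it. A minimal sketch of how a client could derive that URL for a site (the function name `change_password_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

# Path from the W3C "A Well-Known URL for Changing Passwords" draft.
# Assumption: this is the well-known link the post refers to.
WELL_KNOWN_PATH = "/.well-known/change-password"


def change_password_url(page_url: str) -> str:
    """Return the well-known change-password URL for the origin of page_url."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {page_url!r}")
    # Keep only scheme and host; drop the page's own path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


print(change_password_url("https://example.com/forum/thread?id=42"))
```

A site that supports the convention redirects this URL to its real change-password page. On a site that doesn't, a client has to find that page some other way, which is where reading arbitrary page content comes in.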

Manoj Kasichainula 3d ago

With a bit less jargon: On some sites, Apple's agent might need to read pages full of user-generated text to find the "change password" link. The text could trick Apple's agent into letting an attacker hijack your account.

If Apple wasn't careful, the "confusion" could even spread to other sites.

Manoj Kasichainula 3d ago

@tychotithonus I'm not even sure that's enough. What if the prompt injection just tells the bot to change all users' passwords to [dGhpcyBpcyBhIGJhZCBwYXNzd29yZAo=]? How is a user supposed to understand that this is bad?

Besides that, I worry that going into too much detail and requiring confirmation at each step would be slower than just doing it manually or be so verbose that most users just repeatedly click "OK".
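
The bracketed string in this post is Base64. As a small sketch of why a confirmation dialog doesn't help much, this is all it takes to decode it, yet nothing about the encoded form tells a user whether it is a strong random password or one an attacker picked:

```python
import base64

# The string from the post above, copied verbatim.
encoded = "dGhpcyBpcyBhIGJhZCBwYXNzd29yZAo="

# Decode the Base64 bytes and interpret them as UTF-8 text.
decoded = base64.b64decode(encoded).decode("utf-8")
print(repr(decoded))
```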

Manoj Kasichainula 3d ago

@tychotithonus Yeah, I'm pondering ways this could be done safely (without just cheating and using not-actually-machine-learning for this), and so far I'm failing.

Manoj Kasichainula 3d ago

ahahaha uhhhhh, I'd *like* to think smart security people at Apple were on this and cut off one of the legs of the lethal trifecta, but uhhhhh... https://www.macrumors.com/2026/06/08/apple-passwords-can-now-automatically-fix-passwords-with-agentic-ai/

Apple Passwords Can Now Automatically Fix Weak and Compromised Passwords With Agentic AI

Apple today announced that the Passwords app can now automatically update weak and compromised passwords using Apple Intelligence and Safari to take...

MacRumors

Manoj Kasichainula Jun 2

@lcamtuf So does The Joker.

Bluesky	https://headmold.bsky.social
Profile photo by	@bdowney