Mastodawn

Claude mixes up who said what and that's not OK

https://dwyer.co.za/static/claude-mixes-up-who-said-what-and-thats-not-ok.html


Claude sometimes sends messages to itself and then thinks those messages come from the user. This is categorically distinct from hallucinations or missing permissions.


lelandfe 1d ago

In chats that run long enough on ChatGPT, you'll see it begin to confuse prompts and responses, and eventually even confuse both for its system prompt. I suspect this sort of problem exists widely in AI.


jwrallie

I think it’s good to play with smaller models to get a grasp of these kinds of problems, since they happen more often there and are much less subtle.