Claude mixes up who said what and that's not OK
https://dwyer.co.za/static/claude-mixes-up-who-said-what-and-thats-not-ok.html
Claude mixes up who said what and that's not OK
https://dwyer.co.za/static/claude-mixes-up-who-said-what-and-thats-not-ok.html
> This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.”
Are we sure about this? Accidentally mis-routing a message is one thing, but those messages also distinctly "sound" like user messages, and not something you'd read in a reasoning trace.
I'd like to know if those messages were emitted inside "thought" blocks, or if the model might actually have emitted the formatting tokens that indicate a user message. (In which case the harness bug would be why the model is allowed to emit tokens in the first place that it should only receive as inputs - but I think the larger issue would be why it does that at all)