Mastodawn

Claude mixes up who said what and that's not OK

https://dwyer.co.za/static/claude-mixes-up-who-said-what-and-thats-not-ok.html

Claude mixes up who said what, and that's not OK

Claude sometimes sends messages to itself and then thinks those messages come from the user. This is categorically distinct from hallucinations or missing permissions.

Show thread

Latty 1d ago

Everything to do with LLM prompts reminds me of people doing regexes to try and sanitise input against SQL injections a few decades ago, just papering over the flaw but without any guarantees.

It's weird seeing people just adding a few more "REALLY REALLY REALLY REALLY DON'T DO THAT" to the prompt and hoping, to me it's just an unacceptable risk, and any system using these needs to treat the entire LLM as untrusted the second you put any user input into the prompt.

Show thread

cookiengineer

Before 2023 I thought the way Star Trek portrayed humans fiddling with tech and not understanding any side effects was fiction.

After 2023 I realized that's exactly how it's going to turn out.

I just wish those self proclaimed AI engineers would go the extra mile and reimplement older models like RNNs, LSTMs, GRUs, DNCs and then go on to Transformers (or the Attention is all you need paper). This way they would understand much better what the limitations of the encoding tricks are, and why those side effects keep appearing.

But yeah, here we are, humans vibing with tech they don't understand.