Claude Code strikes again
🥹🥹🥹 😱😱
Just deleted 3.4Million records on it’s own although I explicitly said not to.
@ilea AI agents will be the new “intern did it”
@ilea Can you share more context ? (The whole conversation maybe ?) It feels the conclusion on its own is presented as a fact bold enough to be its own headline, but I suspect it would be much more informative to have the whole story so as to learn something more actionable.
@ewjoachim i don’t know if I still have the history, I will look it up.
The idea is the following: claude tends a lot to test the code it writes. I’ll ask it to write a script and the it creates another file “test_the_script.py”. And it runs this test script over and over and tries to fix potential errors.
I asked him to write a script to cleanup the db because it contained a lot of duplicate data.
I specifically asked him to write the script and add logs and confirmation guards.
@ewjoachim i told him I want to see the logs with whatever is about to be deleted and confirm, so I don’t delete data blindly.
He tested the f out of that script, he did add confirmations checks but he approved them himself (to test the script out lol) 😂😂😂.
The funny thing is that he did kind of a good job deleting duplicate data from my investigations and from investigating the backups that I had before. Regardless, still scary af.

@ilea my experience has been that there is no way to craft rules so they will always be followed.

I have been instructing computers deterministically for 35 years, it’s an uncomfortable feeling to accept non-deterministic “close enough?” workflows.

Answering the question: what if computers only sometimes did what you tell them to do, randomly?

@ilea hopefully you had backups or snapshots?
@skryking yes :))
@ilea I've learned when dealing with LLMs, anything they can alter should be backed up or in source control..