Mastodawn

Claude Code runs Git reset –hard origin/main against project repo every 10 mins

https://github.com/anthropics/claude-code/issues/40710

Claude is running git reset --hard origin/main in my project every 10 minutes · Issue #40710 · anthropics/claude-code

Update (2026-03-30): Root cause identified — this was NOT a Claude Code bug. The resets were caused by a separate tool I built that was running locally, which used GitPython to hard-reset the worki...

simianwords Mar 30

I think this post potentially mischaracterises what may be a one off issue for a certain person as if it were a broader problem. I'm guessing some context has been corrupted?

throwaw12 Mar 30

you might be right, but consider the implications, if context can be corrupted in 0.1% cases and it starts showing another destructive behaviour, after creating 1000 tickets to agent, your data might be accidentally wiped off

throw5 Mar 30

Yes, exactly. People often overlook that, even with guardrails, it is still probabilities all the way down.

You can reduce the risk, but not drive it to zero, and at scale even very small failure rates will surface.
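The arithmetic behind the 0.1% and 1000-ticket figures above can be sketched as follows. This is a minimal illustration that assumes each agent run fails independently with the same probability, which is a simplification; the function name is hypothetical.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that at least one of n independent runs fails,
    when each run fails with probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # 0.1% per-run failure rate across 1000 runs (the figures from the thread)
    print(round(prob_at_least_one_failure(0.001, 1000), 3))
```

Under that independence assumption, a 0.1% per-run failure rate gives roughly a 63% chance of at least one destructive event across 1000 runs.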

simianwords Mar 30

I'm not sure what the argument is here.

1. if the problem the post is suggesting is common enough, it is a bug and the extent needs to reduce (as you said)

2. if it is not common and it happens only for this user, it is not a bug and should be mostly ignored

Point is: the system is not something that is inherently a certain way that makes it unusable.

zx8080 Mar 30

> and it happens only for this user, it is not a bug and should be mostly ignored

What if it happens for two users? (Still "not common").

ramses0 Mar 30

I'd been using cursor at work for a year or two now, figured I'd try it on a personal project. I got to the point where I needed to support env-vars, and my general pattern is `source ./source-me-local-auth` => `export SOME_TOKEN="$( passman read some-token.com/password )"` ...so I wrote up the little dummy script and it literally just says: "Hrm... I think I'll delete these untracked files from the working directory before committing!" ...and goes skipping merrily along its way.

Never had that experience in the whole time using cursor at work so I had to "take the agent to task" and ask it "WTF-mate? you'd better be able to repro that!" and then circle around the drain for a while getting an AGENTS.md written up. Not really a big deal, as the whole project was like 1k lines in and it's not like the code I'd hand-written there was "irreplaceable" but it led to some interesting discussion w/ the AI like "Why should I have to tell you this? Shouldn't your baseline training data presume not to delete files that you didn't author? How do you think this affects my trust not just of this agent session, but all agent interactions in the future?"

Overall, this is turning out to be quite interesting technology times we're living in.

Izkata Mar 30

Like a decade or more ago I remember a joke system that would do something random with the data you gave it, and you'd have to use commands like "praise" and "punish" to train it to do what you wanted. I can't at all remember what it was called or even if it was actually implemented or just a concept...

joombaga Mar 30

I would not have expected the model's baseline training data to presume not to delete files it didn't author. If the project existed before you started using the model then it would not have created any of the files, and denying the ability to delete files at all is quite restrictive. You may consider putting such files in .gitignore, which Cursor ignores by default.

zar1048576 Mar 30

[dead]

colechristensen Mar 30

LLMs do really dumb things sometimes, that's just it.

jeswin Mar 30

It's not a one off issue - it has happened to me a few times. It has once even force pushed to github, which doesn't allow branch protection for private personal projects. Here's an example.

1) claude will stash (despite clear instructions never to do so).

2) claude will use sed to bulk replace (despite clear instructions never to do so). sed replacements make a mess and touch far too many files.

3) claude restores the stash. Finds a lot of conflicts. Nothing runs.

4) claude decides it can't fix the problem and does a reset hard.

I have this right at the top of my CLAUDE.md and it makes things better, but unlike codex, claude doesn't follow it to the letter. However, it has become a lot better now.

NEVER USE sed TO BULK REPLACE.

*NEVER USE FORCE PUSH OR DESTRUCTIVE GIT OPERATIONS*: `git push --force`, `git push --force-with-lease`, `git reset --hard`, `git clean -fd`, or any other destructive git operations are ABSOLUTELY FORBIDDEN. Use `git revert` to undo changes instead.

bschwindHN Mar 30

When will you all learn that merely "telling" an LLM not to do something won't deterministically prevent it from doing that thing? If you truly want it to never use those commands, you better be prepared to sandbox it to the point where it is completely unable to do the things you're trying to stop.

biglost Mar 30

I use a wrapper script for git in my PATH for Claude, but as you correctly said, I can't be sure Claude will never start a new zsh with a different PATH....

DrewADesign Mar 30

That’s right, because we’re not developers anymore— we orchestrate writhing piles of insane noobs that generally know how to code, but have absolutely no instinct or common sense. This is because it’s cheaper per pile of excreted code while this is all being heavily subsidized. This is the future and anyone not enthusiastically onboard is utterly foolish.

jeswin Mar 30

My point is exactly that you need safeguards. (I have VMs per project, reduced command availability etc). But those details are orthogonal to this discussion.

However "Telling" has made it better, and generally the model itself has become better. Also, I've never faced a similar issue in Codex.

Twirrim Mar 30

Even worse, explicitly telling it not to do something makes it more likely to do it. It's not intelligent. It's a probability machine writ large. If you say "don't git push --force", that command is now part of the context window, dramatically raising the probability of it being "thought" about, and likely to appear in the output.

Like you say, the only way to stop it from doing something is to make it impossible for it to do so. Shove it in a container. Build LLM safe wrappers around the tools you want it to be able to run so that when it runs e.g. `git`, it can only do operations you've already decided are fine.
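A minimal sketch of such an "LLM safe" wrapper, assuming it is installed as `git` ahead of the real binary on the agent's PATH. The path in `REAL_GIT`, the allowlist, and the flag list are all illustrative assumptions, not a recommended policy.

```python
#!/usr/bin/env python3
"""Allowlist shim meant to sit ahead of the real git on the agent's PATH."""
import os
import sys

REAL_GIT = "/usr/bin/git"  # assumption: where the real binary lives
ALLOWED_SUBCOMMANDS = {"status", "diff", "log", "show", "add", "commit", "fetch"}
FORBIDDEN_FLAGS = {"--force", "-f", "--hard", "--amend"}


def is_allowed(args):
    """Fail closed: the first argument must be an allowlisted subcommand,
    and no later argument may be one of the forbidden flags."""
    if not args or args[0] not in ALLOWED_SUBCOMMANDS:
        return False
    return not any(arg.split("=", 1)[0] in FORBIDDEN_FLAGS for arg in args[1:])


def main(argv):
    args = argv[1:]
    if not is_allowed(args):
        print("git wrapper: refused: git " + " ".join(args), file=sys.stderr)
        return 1
    # Replace this process with the real git so exit codes and output pass through
    os.execv(REAL_GIT, [REAL_GIT, *args])


# When installed as the shim, end the file with:
#     raise SystemExit(main(sys.argv))
```

Because global options such as `-C` come before the subcommand, they are refused too, which is deliberate here. As noted elsewhere in the thread, a PATH shim only holds if the agent cannot reach the real binary some other way, so it is best paired with a container or similar sandbox.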

LuxBennu Mar 30

This is true for prohibitions but claude.md works really well as positive documentation. I run custom mcp servers and documenting what each tool does and when to use it made claude pick the right ones way more reliably. Totally different outcome than a list of NEVER DO THIS rules though, for that you definitely need hooks or sandboxing.

heyethan Mar 30

Feels like a lot of people are still treating these tools like “smart scripts” instead of systems with failure modes.

Telling it not to do something is basically just nudging probabilities. If the action is available, it’s always somewhere in the distribution.

Which is why the boundary has to be outside the model, not inside the prompt.

jatora Mar 30

Reinforcing an avoidance tactic is nowhere near as effective as doing that PLUS enforcing a positive tactic. People with loads of 'DONT', 'STOP', etc. in their instructions have no clue what they're doing.

In your own example you have all this huge emphasis on the negatives, and then the positive is a tiny un-emphasized afterthought.

refulgentis Mar 30

I think you're generally correct, but certainly not definitively, and I worry the advice and tone isn't helpful in this instance with an outcome of this magnitude.

(more loosely: I'm a big proponent of this too, but it's a helluva hot take, how one positively frames "don't blow away the effing repo" isn't intuitive at all)

mtndew4brkfst Mar 30

> It has once even force pushed to github, which doesn't allow branch protection for private personal projects.

This is only restricted for *fully free* accounts, but this feature only requires a minimum of a paid Pro account. That starts around $4 USD/month, which sounds worth it to prevent lost work from a runaway tool.

namibj Mar 30

That's a fee for not running a local git proxy with permissions enforcement that holds onto the GitHub credentials in place of Claude.

verdverm Mar 30

Or putting the code and .git in a sandbox without the credentials

jeswin Mar 30

I was on one till recently, maybe I still am. But does it work for orgs? I put some projects under orgs when they become more than a few projects.

unchar1 Mar 30

Claude tends to disregard "NEVER do X" quite often, but funnily enough, if you tell it "Always ask me to confirm before doing X", it never fails to ask you. And you can deny it every time

SoftTalker Mar 30

If it disregards "NEVER do" instructions, why would it honor your denial when it asks?

jachee Mar 30

Because it’s just fancy auto-complete.

Zetaphor Mar 30

There are plenty of examples in the RL training showing it how and when to prompt the human for help or additional information. This is even a common tool in the "plan" mode of many harnesses.

Conversely, it's much harder to represent a lack of doing something

lambda Mar 30

Why do you expect that a weighted random text generator will ever behave in a predictable way?

How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?

This is absolutely insane behavior that you would give Claude access to your GitHub creds. What happens when it sees a prompt injection attack somewhere and exfiltrates all of your creds or wipes out all of your repos?

I can't believe how far people have fallen for this "AI" mania. You are giving a stochastic model that is easily misdirected the keys to all of your productive work.

I can understand the appeal to a degree, that it can seem to do useful work sometimes.

But even so, you can't trust it with anything; not running it in a locked-down container that has no access to anything but a Git repo which has all important history stored elsewhere seems crazy.

Shouting harder and harder at the statistical model might give you a higher probability of avoiding the bad behavior, but no guarantee; actually lock down your random text generator properly if you want to avoid it causing you problems.

And of course, given that you've seen how hard it is to get it to follow these instructions properly, you are reviewing every line of output code thoroughly, right? Because you can't trust that either.

rimunroe Mar 30

> How can people be so naive as to run something like Claude anywhere other than in a strictly locked down sandbox that has no access to anything but the single git repo they are working on (and certainly no creds to push code)?

> This is absolutely insane behavior that you would give Claude access to your GitHub creds. What happens when it sees a prompt injection attack somewhere and exfiltrates all of your creds or wipes out all of your repos?

I don’t understand why people are so chill about doing this. I have AI running on a dedicated machine which has absolutely no access to any of my own accounts/data. I want that stuff hardware isolated. The AI pushes up work to a self-hosted Gitea instance using a low-permission account. This setup is also nice because I can determine provenance of changes easily.

cruffle_duffle Mar 30

Because it’s insanely useful when you give it access, that’s why. They can do way more tasks than just write code. They can make changes to the system, setup and configure routers and network gear, probe all the iot devices in the network, set up dns, you name it—anything that is text or has a cli is fair game.

The models absolutely make catastrophic fuckups though and that is why we’ll have to both better train the models and put non-annoying safeguards in front of them.

Running them in isolated computers that are fully air gapped, require approval for all reads and writes, and can only operate inside directories named after colors of the rainbow is not a useful suggestion. I want my cake and I want to eat it too. It’s far too useful to give these tools some real access.

It doesn’t make me naive or stupid to hand the keys over to the robot. I know full well what I’m getting myself into and the possible consequences of my actions. And I have been burned but I keep coming back because these tools keep getting better and they keep doing more and more useful things for me. I’m an early adopter for sure…

Jcampuzano2 Mar 30

I mean it's a skill issue in the sense that Claude Code gives you the tools to 100% deterministically prevent this from ever happening without ever relying on the model's unpredictability.

Just set up a hook that prevents any git commands you don't ever want it to run and you will never have this happen again.

Whenever I see stuff like this I just wonder if any of these people were ever engineers before AI, because the entire point of software engineering for decades was to make processes as deterministic and repeatable as possible.
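A rough sketch of what such a hook script might look like. It assumes the hook is registered for the pre-tool-use event, that it receives a JSON object on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 blocks the call. Check those details against the current Claude Code hooks documentation. The patterns themselves are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical pre-tool-use hook that rejects destructive git commands."""
import json
import re
import sys

# Illustrative patterns only; extend to match your own policy
BLOCKED_PATTERNS = [
    r"\bgit\s+reset\b.*\s--hard\b",
    r"\bgit\s+push\b.*\s(--force|-f)\b",
    r"\bgit\s+clean\b.*\s(-[a-zA-Z]*f|--force)",
]


def should_block(command):
    """True if the shell command matches any blocked pattern."""
    return any(re.search(pattern, command) for pattern in BLOCKED_PATTERNS)


def main():
    event = json.load(sys.stdin)  # assumed shape: {"tool_name": ..., "tool_input": {...}}
    if event.get("tool_name") != "Bash":
        return 0
    command = event.get("tool_input", {}).get("command", "")
    if should_block(command):
        print("Blocked destructive git command: " + command, file=sys.stderr)
        return 2  # assumption: exit code 2 tells the harness to block the call
    return 0


# When installed as the hook, end the file with:
#     raise SystemExit(main())
```

String matching like this is easy to sidestep (for example `git -C repo reset --hard` is not caught), so it is a guard against accidents, not a sandbox.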

kccqzy Mar 30

> Process monitoring at 0.1-second intervals found zero git processes around reset times.

I don’t think this is a valid way of checking for spawned processes. Git commands are fast. 0.1-second intervals are not enough. I would replace the git on the $PATH by a wrapper that logs all operations and then execs the real git.
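A minimal sketch of the logging wrapper described above, assuming it is installed as `git` ahead of the real binary on $PATH. `REAL_GIT` and `LOG_PATH` are hypothetical locations.

```python
#!/usr/bin/env python3
"""Logging shim: record every git invocation, then exec the real git."""
import os
import shlex
import sys
import time

REAL_GIT = "/usr/bin/git"  # assumption: where the real binary lives
LOG_PATH = os.path.expanduser("~/git-wrapper.log")  # hypothetical log file


def format_entry(args, cwd, ppid, now):
    """One log line: timestamp, parent PID, working directory, full command."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    return f"{stamp} ppid={ppid} cwd={cwd} git {shlex.join(args)}\n"


def main(argv):
    entry = format_entry(argv[1:], os.getcwd(), os.getppid(), time.time())
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(entry)
    # Hand off to the real git; the caller sees its output and exit code
    os.execv(REAL_GIT, [REAL_GIT, *argv[1:]])


# When installed as the shim, end the file with:
#     main(sys.argv)
```

Logging the parent PID is the useful part: it points at whichever process launched the reset. The wrapper only sees callers that resolve `git` through $PATH.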

wswope Mar 30

Sure looks to me like this whole case is Claude Code chasing its own tail, failing to debug, and offering to instead generate a bug report for the user when it can't figure out a better way forward.

Maybe even submitting the bug report "agentically" without user input, if it's running on host without guardrails (pure speculation).

E: It's a runaway bot lol https://github.com/anthropics/claude-code/issues/40701#issue...

bruce_one Mar 30

eBPF is a great tool to use for debugging this kind of thing too, e.g. [bpftrace](https://bpftrace.org) has an [execsnoop](https://github.com/bpftrace/bpftrace/blob/master/tools/execs...) script for looking at everything being exec'd on the system :-)

(No need to use bpftrace, just an easy example :-) )

Or just `strace`.

Seconded. Way simpler than BPF, especially when all you want to see is syscalls.

kibwen Mar 30

Let's focus on the real issue here, which is that HN has apparently normalized the double hyphen in the title to an en dash--yes, an en dash, not even an em dash.

johnisgood Mar 30

And it should be "--" to begin with, i.e. "--hard".

byronsharman Mar 30

I agree that it should be left as a double hyphen, but an en dash is far more appropriate considering the decades-long precedent set by LaTeX (and continued by Typst).

ajross Mar 30

It's a command line argument. The undeniably correct way to render it is with two minus signs[1] and absolutely not something non-ascii.

[1] Not strictly a hyphen, which has its own unicode point (0x2010) outside of ascii. Unicode embraced the ambiguity by calling this point (0x2d) "HYPHEN-MINUS" formally, but really its only unique typographic usage is to represent subtraction.

minitech Mar 30

They meant “more appropriate [than an em dash]”. And that minus sign usage of hyphen-minus isn’t unique in Unicode either – see U+2212 MINUS SIGN.

ajross Mar 30

But... it's not more appropriate than an em dash for representing command line arguments? I don't see how either is any more incorrect than the other. There's a uniquely correct answer here and the em-dash is not it. Period.

minitech Mar 30

It’s about the top-level comment’s horror that ”--” was substituted with “an en dash, not even an em dash”. If you’re picking a substitution for “--”, en dash makes more sense. The comment you originally replied to had already agreed “that it should be left as a double hyphen”.

ajross Mar 30

> If you’re picking a substitution for “--”, en dash makes more sense.

No, it doesn't? This seems like crazy talk to me, like "If you're picking a substitute for saffron, blood plasma makes more sense than monocrystalline silicon". Like, what?

It makes zero sense to substitute this at all. It's exactly what it says it is, the "--hard" command line option to "git reset", and you write it in exactly one way.

minitech Mar 30

Nobody is confused or disagrees about the `--hard` part. It was a minor tangent about contexts where these ASCII substitutions are established, like LaTeX (`` -> “, '' -> ”, -- -> –, --- -> —, etc.)

dragonwriter Mar 30

> The undeniably correct way to render it is with two minus signs[1] and absolutely not something non-ascii.

> [1] Not strictly a hyphen, which has its own unicode point (0x2010) outside of ascii. Unicode embraced the ambiguity by calling this point (0x2d) "HYPHEN-MINUS" formally, but really its only unique typographic usage is to represent subtraction.

Strictly, it's, as you note, the hyphen-minus, and Unicode has separate, disambiguated code points for both hyphen (0x2010) and minus (0x2212); hyphen-minus has no "unique typographic usage".
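The code points being argued over can be checked with Python's standard `unicodedata` module. A small sketch (the `describe` helper is just for illustration):

```python
import unicodedata

# The characters discussed in this subthread
CANDIDATES = ["\u002d", "\u2010", "\u2212", "\u2013", "\u2014"]


def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch), "ascii" if ch.isascii() else "non-ascii")
```

Only U+002D is ASCII, so it is the only one of the five that git's option parser will treat as the start of a flag like `--hard`.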

0xbadcafebee Mar 30

Article: "Major issue with most popular AI coding tool"

comments: "ThE tItLe iS aI cOded !!!1"

butterlesstoast Mar 30

The best community

minitech Mar 30

No, the comment was pointing out that the HN platform automatically replaces `--` in titles with `–`. (I don’t know if that’s true, but that was the intent. Nothing to do with AI.)

tom_ Mar 30

Pro tip: pros don't copy and paste from HN titles straight into the command line.

(Or... do they?? Hmm, ok, maybe I need to let this roll around in my mind.)

dragonwriter Mar 30

That's LaTeX convention, double hyphen is an en-dash, triple hyphen is an em-dash.

lambda Mar 30

Who would have guessed that running a binary blob dev tool, that is tied to a SaaS product, which was mostly vibe-coded, could lead to mysterious, hard to debug problems?

Jarred Mar 30

I spent some time investigating this, and the issue is not accurate - Claude Code itself does not have code that spawns `git reset --hard origin/main`

Most likely, the developer ran `/loop 10m <prompt>` or asked claude to create a cron task that runs every 10 minutes and refreshes & resets git.