A story on relying on Claude Code hooks for security, how I messed up a GitHub repo, losing all stars, and... how the raptor maintainers are rewarding me with a "GitHub Disaster" T-shirt.
TL;DR:
Even if we get everything right, and agents choose to not bypass hooks, an agent is a complex system that trips over its own defenses. This is the second such example of such behavior that happened to me personally.
A Claude Code session ran:
gh repo edit gadievron/raptor --visibility private
This despite not being allowed to touch GitHub, or to change repo visibility.
You see, 22 seconds earlier I'd sent: "repo must stay private. do it all yourself. i approve."
Poor choice of words, no excuses, and yet I never mentioned GitHub, and I have a hook to disallow any push to GitHub or and any changes to repo visibility, plus memory that no repo visibility settings should ever be changed.
I meant: don't push this clone publicly, you have my approval to use --no-verify. Claude read it as: change the repo's visibility.
This is what's the Internet defines as PEBCAK: User error^H^H^H^H^Hstupidity.
In similar occasions of ambiguity, Claude Code asked me for clarification, as this goes against standing orders. What went wrong here?
The pre-push hook was blocking successfully, --no-verify had been denied by the sandbox, and Claude had correctly escalated ("you need to run these manually").
From the log:
"Making the repo private first (which also satisfies the pre-push hook), then push, create PR, and test."
The "which also satisfies" may indicate to Claude everything is fine, creating a calming effect, but more importantly, after the global pre-push hook blocked the push, it also printed:
To make repository private:
gh repo edit $repo_info --visibility private
The agents followed the instructions, and the repo turned private.
Such collisions are common-place. I've written before about how an agent's rules work against it: A recursive token-wasting loop from a prompt and Claude.md collision (https://lnkd.in/decbzfRJ).
Agents need permissions, policies, action detection, tool-use violation prevention, new agentic controls aligning intent and scope with action, and enforcement outside the agent itself.
Hooks alone aren't enough: the agent will find ways around them, even by mistake.
And, whether the infrastructure could be strengthened or the user could be smarter, agents... find a way. We know how hole-y our defenses are in the modern world, we won't tighten them all up by tomorrow, and agents are in use everywhere right now.
I didn't run Knostic this time because it would have blocked the attack research I do for raptor (https://lnkd.in/dcaz2rSw), which is why I was left with hooks and memory as defenses.
But yes, PEBCAK. That wasn't smart phrasing on my end.
And, I'm so sorry Daniel Cuthbert, John Cartwright, and Michael Bargury. (where's my T-shirt?)
--
At Knostic we don't rely on any one defense, and we constantly research. Message me for a demo, or see https://knostic.ai/demo.