When they update the diceroller, diceroll-reliant processes go sideways.

https://github.com/anthropics/claude-code/issues/42796

[MODEL] Claude Code is unusable for complex engineering tasks with the Feb updates · Issue #42796 · anthropics/claude-code

Preflight Checklist I have searched existing issues for similar behavior reports This report does NOT contain sensitive information (API keys, passwords, etc.) Type of Behavior Issue Other unexpect...

GitHub
@mttaggart "Look, 12-sided dice were on sale this week! Don't blame us if we the code we vomited up for you wasn't designed to handle 7-12!"

@mttaggart

I don't know how you'd even begin to apply regression testing to probabilistic GenAI systems, or maintain codebases whose lifespans are intended to far outlast the next major GenAI model update. Seems like an underappreciated risk.

@DaveMWilburn I think the trick is to minimize the nondeterminism, so that the mission-critical parts are traditional automation and the LLM is handling connective tissue. Still not safe, but it's imo a necessary mitigation for pipelines like this.
@DaveMWilburn That said, whenever I ask "Hey what happens in the event of (model collapse | service outage | service shutting down)?" everyone starts slowly backing away from me

@mttaggart @DaveMWilburn TLDR, the current hotness is deterministic hooks to validate the nondeterministic output.

JAGS put out a control plane that I need to make time to play with: https://github.com/juanandresgs/claude-ctrl

GitHub - juanandresgs/claude-ctrl: The Systems Thinker's Deterministic Claude Code Control Plane

The Systems Thinker's Deterministic Claude Code Control Plane - juanandresgs/claude-ctrl

GitHub
@mttaggart At least when the service has an outage, everyone just has a meltdown on Hacker News for a few hours.

@mttaggart Oh man, the actual bug report is even better than I expected:

What You Asked Claude to Do
Claude has regressed to the point it cannot be trusted to perform complex engineering.

I never realized zero could regress to zero and have a meaningful change.

Expected Behavior
Claude should behave like it did in January.

Did you try asking it to behave like it did in January? No?