Hi, just because a piece of software tells you it’s “reasoning” or “thinking” doesn’t mean it actually is.
You won’t believe this, but programmers can make software lie about what it does.
Maybe programmers need more guardrails.
RE: https://chaos.social/@jacqueline/116685107588484911
https://github.com/anthropics/knowledge-work-plugins/pull/193
reviewed! merged! what are we doing here
I think the modal situation here is that the people are reading none or very little of what is being generated by the LLM, so the tests have a special role: Tests function as the pull arm on the slot machine, you just generate until tests pass, and that's a jackpot. Obviously that's meaningless when the tests are meaningless, so tests take on a very different meaning and role in slot machine coding.
Previously we would write careful test conditions that were based on some real problem or an understanding of what the code under test did, and each had a specific thing it was intended to protect against. Tests move slowly and are designed to protect us against the things we know can go wrong. When we learn of a new wrong thing, we add a test.
LLM tests have the form of tests but don't do the same thing. They often test nothing, and are just expressions of truisms that the probabilistic text space explored while generating. They have strongly worded names but end up actually asserting that basic language features work as expected. Because it is not us writing tests for ourselves, where we only harm ourselves by making them weak, they function instead as a passively obfuscated justification for the code that the LLM generates. The user wants the tests to pass. The LLM provides.
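To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: `apply_discount` and both test names are assumptions, not taken from any real codebase or from the pull request linked above. The first test has a strongly worded name but never calls the code under test; the second protects against one specific known failure.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


def test_discount_engine_is_robust_and_correct():
    # Vacuous: never calls apply_discount. It only asserts that
    # Python's own arithmetic and type system behave as expected.
    expected = 100 * 0.5
    assert expected == 50
    assert isinstance(expected, float)


def test_discount_rejects_percent_over_100():
    # Meaningful: guards against a specific wrong thing we know about
    # (a discount over 100% would produce a negative price).
    try:
        apply_discount(100.0, 150.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

The first test keeps passing no matter how `apply_discount` is broken or rewritten, which is exactly why it is worthless as protection and useful as a jackpot.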
The tests are theater: they are the play field for the slot machine. They are mild, surmountable, need to fail a few times to be plausible, but must eventually pass within the expected generation loop window to deliver the payout.
RE: https://hails.org/@hailey/116657391001259044
all the criticism has been said, all the takes have been had. the only metaphor i have been finding consistently useful for understanding what is happening with people and "AI" is addiction, and specifically gambling addiction.
Real talk: the real "supply chain risk" is that you treat your open source "supply chain" like shit and assume that we will all take any amount of abuse from you and just keep doing volunteer labor forever without ever complaining. And, equally real talk: most of us—myself included—actually do love the process and the community so much that you're right, and there will never be any real consequence.
But not all of us.