Mastodawn

Blåhaj 1d ago

Thomas Fuchs 2d ago

Hi, just because software tells you it’s “reasoning” or “thinking” doesn’t mean it actually is.

You won’t believe this but programmers can make software lie about what it does.

Maybe programmers need more guardrails.

Blåhaj 4d ago

jacquelines 🌟 4d ago

it's industrial scale gish gallop

Blåhaj 4d ago

jacquelines 🌟 4d ago

every ai booster's argument is "this slot machine that kills the world makes me feel good, so if u make me feel bad about it u must be wrong"

Blåhaj 4d ago

jacquelines 🌟 4d ago

every ai guy says "i carefully reviewed everything!" but then u actually read the code and it's dogshit

Blåhaj 4d ago

ailurux 4d ago

RE: https://chaos.social/@jacqueline/116685107588484911

https://github.com/anthropics/knowledge-work-plugins/pull/193

reviewed! merged! what are we doing here

Blåhaj May 31

jonny (nonvenomous) May 31

I think the modal situation here is that the people are reading none or very little of what is being generated by the LLM, so the tests have a special role: Tests function as the pull arm on the slot machine, you just generate until tests pass, and that's a jackpot. Obviously that's meaningless when the tests are meaningless, so tests take on a very different meaning and role in slot machine coding.

Previously we would write careful test conditions that were based off some real problem or an understanding of what the code under test did, and had a specific thing they were intended to protect against. Tests move slow and are designed to protect us against the things we know can go wrong. When we learn of a new wrong thing, we add a test.

LLM tests have the form of tests but don't do the same thing. They often test nothing, and are just expressions of truisms that the probabilistic text space explored while generating. They have strongly worded names but end up actually asserting that basic language features work as expected. Because it is not us writing tests for ourselves, where we only harm ourselves by making them weak, they function instead as a passively obfuscated justification for the code that the LLM generates. The user wants the tests to pass. The LLM provides.

The tests are theater: they are the play field for the slot machine. They are mild, surmountable, need to fail a few times to be plausible, but must eventually pass within the expected generation loop window to deliver the payout.
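To make the contrast above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (`apply_discount` and both test names); it is not taken from any real project or from the pull request linked earlier in the feed. The first test has a strongly worded name but only asserts that basic language features work. The second protects against one specific, known way the code can go wrong.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


def test_discount_engine_is_robust_and_correct() -> None:
    # Strongly worded name, but apply_discount is never called.
    # This only checks that list length and float addition work,
    # so it passes no matter what the code under test does.
    prices = [10.0, 20.0]
    assert len(prices) == 2
    assert 10.0 + 20.0 == 30.0


def test_discount_rejects_percent_over_100() -> None:
    # Guards a specific known failure: without the range check,
    # a percent above 100 would silently yield a negative price.
    try:
        apply_discount(50.0, 150.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

Both tests pass against the function as written. If the range check were deleted, though, only the second test would fail; the first would keep reporting success, which is the sense in which such a test has the form of a test without doing the job of one.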

Blåhaj May 31

jonny (nonvenomous) May 29

i love gambling. i have used "AI" extensively. it feels the same.

Blåhaj May 31

jonny (nonvenomous) May 29

RE: https://hails.org/@hailey/116657391001259044

all the criticism has been said, all the takes been had. the only metaphor i have been finding consistently useful for understanding what is happening with people and "AI" is addiction, and specifically gambling addiction.

Blåhaj May 30

Glyph May 30

Real talk: the real "supply chain risk" is that you treat your open source "supply chain" like shit and assume that we will all take any amount of abuse from you and just keep doing volunteer labor forever without ever complaining. And, equally real talk: most of us—myself included—actually do love the process and the community so much that you're right, and there will never be any real consequence.

But not all of us.

Blåhaj May 30

Glyph May 30

Remember that point in history around 2021 when suddenly there was a rush of supply-chain attacks that were all *VERY* focused on infostealer malware that could detect MetaMask wallets, specifically? Now imagine that instead of a few threat actors uploading a few scam typosquats, the people who are motivated enough to target you and want to ruin your life are your entire dependency supply chain, and the MetaMask that marks you as a target is your Claude Max subscription.