The hidden beauty of vibe coding

"It passed all the unit tests, the shape of the code looks right," he said. It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."

https://www.theregister.com/2026/03/17/ai_businesses_faking_it_reckoning_coming_codestrap/

AI still doesn't work very well, businesses are faking it, and a reckoning is coming

interview: Codestrap founders say we need to dial down the hype and sort through the mess

The Register

@gerrymcgovern This is an unusual article. It mixes truth and misconceptions in awkward ways.

For example:

Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI

This isn't what happened. It was a C compiler that was rewritten. A different tester then built SQLite using both the AI compiler and the official one. The AI one did worse.

But it did worse for very specific reasons. The AI version was only tested for correctness. It was only given unit tests as a parameter for success. It failed on real-world performance tests, because performance was never actually given as a requirement.
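
To make that concrete with a purely illustrative sketch (Python, made-up function names, nothing taken from either project): two implementations can pass the exact same correctness check while one is orders of magnitude slower, and a unit-tests-only success criterion never sees the difference.

import time

def contains_fast(haystack, needles):
    # Build a set once; each lookup is O(1).
    lookup = set(haystack)
    return [n in lookup for n in needles]

def contains_naive(haystack, needles):
    # Re-scan the whole list for every query; O(n) per lookup.
    return [n in haystack for n in needles]

data = list(range(20_000))
queries = list(range(0, 20_000, 4))

# The "unit test": both versions give identical, correct answers.
assert contains_fast(data, queries) == contains_naive(data, queries)

# The requirement nobody wrote down: how long each one takes.
t0 = time.perf_counter(); contains_fast(data, queries); fast = time.perf_counter() - t0
t0 = time.perf_counter(); contains_naive(data, queries); slow = time.perf_counter() - t0
print(f"naive version: roughly {slow / fast:.0f}x slower on this input")

Both pass the test; only one survives a real-world performance requirement, and that's exactly the gap the AI version fell into.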

"Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence." ... Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity.

So these are famously known as the DORA metrics. And they don't measure engineering excellence, ... /1

@gerrymcgovern ... they measure the capabilities of the engineering platform along with the expertise of the people using that platform.

There are lots of companies with excellent engineers and crummy DORA scores because they don't have the institutional support to improve those metrics. Nor does the score mean the business is successful. You can have great DORA metrics and still lack for paying customers.

"The other challenge here is that the incentives are misaligned,"

But then he proceeds to list a bunch of examples of competing incentives. His examples of "misaligned" are really examples of "I would like to deliver less and get paid more"... /2

@gerrymcgovern ...

If there's an incentives problem here, it's that companies have been paying for a lot of BS rituals, and they're discovering that the BS-generating machine is undermining part of the ritual. Companies have also been getting away with under-specifying success in order to pad results as "good". But Gen AIs will "fill in" the under-specification with made-up data. Or they will fail to deliver anything into the gap that some human was hoping would be filled.

But none of this is "misaligned". It's intentional ambiguity designed to protect business units. The AI is just exposing the BS for what it is.

OP is kind of talking about that BS problem. But he's taking weird micro angles to view subsets of the problem without calling out the greater problem. He's not wrong, but he's also not really right either. 🤷🏻‍♂️ //

@gatesvp @gerrymcgovern

"This isn't what happened. It was a C Compiler that was rewritten. A different tester then rebuilt SQLite using both the AI and the official one. The AI one did worse."

There seems to be some confusion on your part. I suspect you're thinking of Claude's C Compiler, which made a hash of building SQLite (although it's impressive it managed that at all).

If you follow the links from this article, they're referring to an analysis of FrankenSQLite (https://frankensqlite.com/), which is billed as a "clean-room reimplementation", from the same guy who vibe-coded an 80k LOC disk cleanup daemon to replace a one-line cron job. 🙄

And it runs. It's just awful.

https://blog.katanaquant.com/p/your-llm-doesnt-write-correct-code

FrankenSQLite — The Monster Database Engine for Rust

A clean-room Rust reimplementation of SQLite with MVCC concurrency, RaptorQ self-healing, and zero unsafe code. 26-crate workspace delivering the monster database engine.

FrankenSQLite

@richh @gerrymcgovern

The article quote is this:

It's 3.7x more lines of code that performs 2,000 times worse

Which is very close to this experiment here. They get over 3x the compiled size, and slowdown numbers as bad as 150k times worse.

I had no idea about FrankenSQLite.

Based on the tenor of the original article though, it sounds like he was talking about the CCC experiment because that was a comparable attempt. This Franken thing actually tries to improve upon SQLite.

OP's article links to a Medium article which doesn't link to either the CCC thing or the Franken thing. The Medium article seems to reference the Franken thing, but doesn't link it directly either. OP's article seems like it could be referring to either one. 🤷🏻‍♂️

Thanks for the extra data. I think it's notable that the arguments I make stay the same either way. 😃

Even for FrankenSQLite, it's clear that they had a limited scope of performance tests.

GitHub - harshavmb/compare-claude-compiler: Comparison of GCC vs CCC

GitHub

@gatesvp @gerrymcgovern
"OP's article Links to a medium article which doesn't link to either the CCC thing or the Franken thing."

Please follow the links. Like, really read the articles in detail.

OP's article from The Register (https://www.theregister.com/2026/03/17/ai_businesses_faking_it_reckoning_coming_codestrap/)

links to >

Medium (https://medium.com/write-a-catalyst/an-ai-wrote-576-000-lines-to-replace-sqlite-7ea538826d72)

links to >

Katanaquant (https://blog.katanaquant.com/p/your-llm-doesnt-write-correct-code)

links to >

FrankenSQLite.

It's all there.

I don't discount that Smiley might have mixed up their stats in the Reg interview with the 3.7x LOC quote, since that is quite similar to the CCC figures.

Both are relevant to the article's tenor though - AI can output code that will run, but it lacks domain knowledge and will do exactly what you ask it to, even if it's ridiculous ("Hey, write me a cleanup tool", instead of "what's the best way to do this cleanup - a one-line cron job").

@richh @gerrymcgovern

It's all there.

3 links deep.

Honestly, there's a whole separate discussion to be had about how lazy the OP article actually is.

  • They quoted descriptions of DORA metrics without actually linking to those.
  • They generated this whole line of confusion by referencing an article that referenced an article that referenced a specific technology... that they could have just linked to directly.
  • They have a whole section on consultants that's just quotes from one person with absolutely zero external links.

It's quite possible that you and I have now spent more time analyzing the article than the author spent actually writing it. 😃

Which probably means we need to start writing our own, better articles. 😃

@gatesvp @gerrymcgovern TBH we're lucky to get links at all. So many news articles will open with "a report published today has found that…" but will the article itself link to the report? Not a chance - they don't want you understanding the nuance or reading past their summarisation. So props to them here for literally providing some sources, even if they (or the interviewee) are muddling up different projects. We're all learning something new from it!