Mastodawn

this is an interesting article about LLM-generated code (an SQLite rewrite in Rust) and the difference between "it works" and "it's good". also interesting database stuff :)

https://blog.katanaquant.com/p/your-llm-doesnt-write-correct-code

Your LLM Doesn't Write Correct Code. It Writes Plausible Code.

One of the simplest tests you can run on a database:

Vagabond Research

Martin Seeger Mar 7

@sushee Database code is probably one of the areas where you need the most expertise. You may need to understand minuscule differences between operating systems in order to reach adequate performance. Don't get me started on timezones or string comparisons.

We've had several threads on my Discord about how seemingly trivial tasks become Herculean efforts in a database context.
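
The post doesn't say which string-comparison problems it has in mind. Here is one small, concrete illustration, chosen by way of example rather than taken from the thread. In SQLite the default BINARY collation compares bytes, so it is case-sensitive. The built-in NOCASE collation folds only the 26 ASCII letters, so non-ASCII case variants still compare as different. The sketch uses Python's standard sqlite3 module. The helper name `compare` is hypothetical.

```python
import sqlite3

def compare(conn: sqlite3.Connection, a: str, b: str, collation: str) -> bool:
    """Return True if SQLite considers a = b under one of its built-in collations."""
    # Collation names cannot be bound as parameters, so only whitelisted names are
    # interpolated into the SQL text.
    if collation not in ("BINARY", "NOCASE", "RTRIM"):
        raise ValueError(f"unsupported collation: {collation}")
    # An explicit COLLATE on an operand decides which collation the = operator uses.
    row = conn.execute(f"SELECT ? = ? COLLATE {collation}", (a, b)).fetchone()
    return bool(row[0])

conn = sqlite3.connect(":memory:")
print(compare(conn, "A", "a", "BINARY"))       # False: byte-wise comparison
print(compare(conn, "A", "a", "NOCASE"))       # True: ASCII letters are case-folded
print(compare(conn, "Ä", "ä", "NOCASE"))       # False: non-ASCII letters are not folded
print(compare(conn, "abc", "abc  ", "RTRIM"))  # True: trailing spaces are ignored
conn.close()
```

A different engine (or a rewrite of SQLite) has to reproduce exactly these rules, including the ASCII-only folding, or the same query will quietly return different rows.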

Stephen Bannasch (316 ppm) Mar 7

@sushee

This seems to be the rewrite-SQLite-in-Rust-with-an-LLM project the author is critiquing. The linked (closed) issue asks for comments on the performance analysis ;-)

https://github.com/Dicklesworthstone/frankensqlite/issues/18#issue-4037436475

critical post about speed of frankensqlite · Issue #18 · Dicklesworthstone/frankensqlite

I would appreciate a statement from your clankers: it is a fairly lengthy X article, and I added a URL in front so you get only the markdown for ease of use. https://defuddle.md/https://x.com/Katan...

GitHub

Stephen Bannasch (316 ppm) Mar 7

@sushee

Just for reference: the author’s post on Twitter: https://x.com/KatanaLarp/status/2029928471632224486

Hōrōshi バガボンド (@KatanaLarp) on X

Your LLM Doesn't Write Correct Code. It Writes Plausible Code.

X (formerly Twitter)

hajovonta Mar 7

@sushee This sentence is probably the most important in the article: "My conclusion is that LLMs work best when the user defines their acceptance criteria before the first line of code is generated."

I put together an IRC server in CL (Common Lisp) using TDD. For a project of this size, I found this is what works best.

https://git.sr.ht/~hajovonta/cl-irc-server

Joseph Mar 8

@hajovonta
@sushee

I haven't tried a strict TDD approach yet, but it's been on my mind since I started thinking about the best way to use it as a tool beyond just vibe coding.

Cool to see an actual attempt at it

hajovonta Mar 8

@eccles
I now have a handful of projects in the same space made with TDD: cl-jsonpath, fast-csv, and a private one that is even bigger. This is the only method that works. Tests define the API even before the implementation, and they provide continuous feedback and a roadmap. The author must be vigilant when designing the tests and during implementation, because the LLM sometimes takes shortcuts, but otherwise it guides development mostly smoothly.

@sushee
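
The idea that "tests define the API even before the implementation" can be made concrete with a minimal Python sketch. It is not code from hajovonta's Common Lisp projects. The acceptance tests for a hypothetical `parse_irc_message` helper come first and pin down its name, its return type (a hypothetical `IrcMessage` tuple) and its edge cases. The implementation only has to make them pass. The parser follows the basic RFC 1459 line shape `[:prefix] COMMAND params [:trailing]` and ignores IRCv3 message tags.

```python
from typing import List, NamedTuple, Optional

# --- Step 1: acceptance criteria, written before any implementation exists. ---
# They fix the (hypothetical) API: parse_irc_message(line) -> IrcMessage.

def test_privmsg_with_prefix_and_trailing():
    msg = parse_irc_message(":nick!user@host PRIVMSG #chan :hello world\r\n")
    assert msg == IrcMessage("nick!user@host", "PRIVMSG", ["#chan", "hello world"])

def test_command_without_prefix():
    assert parse_irc_message("PING :server1") == IrcMessage(None, "PING", ["server1"])

def test_command_is_normalized_to_upper_case():
    # A design decision made for this sketch, not a requirement quoted from the RFC.
    assert parse_irc_message("nick alice").command == "NICK"

def test_empty_line_is_rejected():
    try:
        parse_irc_message("\r\n")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty line")

# --- Step 2: the simplest implementation that makes the tests pass. ---

class IrcMessage(NamedTuple):
    prefix: Optional[str]
    command: str
    params: List[str]

def parse_irc_message(line: str) -> IrcMessage:
    """Parse one line of the form [:prefix] COMMAND params [:trailing]."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is one trailing parameter that may contain spaces.
    trailing = None
    if " :" in line:
        line, _, trailing = line.partition(" :")
    parts = line.split()
    if not parts:
        raise ValueError("missing command")
    params = parts[1:]
    if trailing is not None:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)
```

Running the `test_*` functions (for example with pytest) before Step 2 exists fails with `NameError`, which is the "red" step. The tests, not the generated code, decide when the feature is done.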

Benny Mar 8

@hajovonta @sushee Requirements and Acceptance Criteria should always be defined before writing a single line of code, so this conclusion is totally worthless.

hajovonta Mar 8

@SignorMacchina
I would say requirements and acceptance criteria that are not formalized are totally worthless.

@sushee

arclight Mar 8

@hajovonta @sushee What's been really infuriating about the uptake of agentic coding is the (re)discovery of software engineering principles like proper design documentation, specification, and acceptance testing. We've known the importance of all these things since the '60s and '70s, but we typically don't spend time on them because coding is (was?) more enjoyable, writing about the code was perceived as less valuable than implementing it, and a formal, structured process was seen as suffocating, 'legacy', and not 'agile' enough.

Nobody would write out this critical information for human use but devs are suddenly overjoyed to write it all down now that they have expensive obsequious incompetent plagiarizing coding robots.

It's like every episode of The Simpsons with Homer being an idiot and doing the right thing for all the wrong reasons.

There's so much anti-humanity bundled up in the commercial LLM space, it's infuriating and depressing.

hajovonta Mar 8

@arclight
Yeah, actually writing the code is the boring stuff, mostly mechanical and error-prone. I did it for decades. Having a tool doing this part is great, because we can focus on what is more enjoyable: coming up with ideas, planning, setting up scope, providing oversight, controlling the process from a higher level, verifying results.

It's not everyone's cup of tea, I get that. And it's not as if it takes away the option of writing code by hand.

@sushee

David Zaslavsky Mar 8

@hajovonta @sushee I agree that's probably the closest thing to a conclusion/main takeaway of the article, although it also became obvious to me about half an hour into my first time using an LLM to work on code, so I don't consider it particularly groundbreaking.

Mindiell Mar 8

@sushee The main problem with AI and LLMs is not "is it working?" but "is it destroying our planet?".

Prasun Mar 8

@sushee This is the response from the author https://x.com/doodlestein/status/2030135382411550856

Jeffrey Emanuel (@doodlestein) on X

Dedicate your time and money to open source. One of the nice benefits? Jealous, bad-faith losers can try to benchmark your unfinished code (that you never once claimed is done or ready for review) and then try to claim you’re a charlatan. This guy should be shunned and ignored.

X (formerly Twitter)

morgan Mar 8

@sushee still far too optimistic

> An experienced database engineer using an LLM to scaffold a B-tree would have caught the is_ipk bug in code review because they know what a query plan should emit

good luck if all you've been doing for a couple years is deskilling yourself.

And even if that were true, the horrible externalities of LLMs are still there.
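
The following is for readers who haven't looked at a query plan. In real SQLite, a column declared `INTEGER PRIMARY KEY` is an alias for the table's rowid. An equality lookup on that column should therefore appear in `EXPLAIN QUERY PLAN` as a rowid `SEARCH`, not a full-table `SCAN`. The sketch below uses Python's standard sqlite3 module, and its table, column and helper names are made up. It turns that expectation into an executable acceptance check. It does not reproduce the is_ipk bug from the article; it only shows what a reviewer would expect the planner to emit. The plan wording differs slightly between SQLite versions (older releases print `SCAN TABLE users`), so the checks match only stable substrings.

```python
import sqlite3

def plan_details(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for a query."""
    # Each plan row has four columns; the human-readable detail is the last one.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
# 'id' is declared INTEGER PRIMARY KEY, so it aliases the rowid.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1001)])

by_id = plan_details(conn, "SELECT name FROM users WHERE id = ?", (42,))
by_name = plan_details(conn, "SELECT id FROM users WHERE name = ?", ("user42",))

# Acceptance check: a primary-key lookup must be a rowid search, not a scan.
assert any("USING INTEGER PRIMARY KEY" in d for d in by_id), by_id
# 'name' has no index, so this lookup is expected to scan the whole table.
assert any(d.startswith("SCAN") for d in by_name), by_name
print(by_id)
print(by_name)
conn.close()
```

A check like this fails loudly when a planner falls back to scanning on primary-key lookups, instead of relying on someone noticing the slowdown in a benchmark.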

sabik Mar 8

@sushee
Odd conclusion, though; the claim in the conclusion that LLMs are sometimes useful doesn't seem to follow from anything in the body of the article 🤷‍♀️

CarePackage17 Mar 8

@sabik @sushee yeah, I don't get it either, particularly the SQLite example.

"You can let the slop machine generate a half-broken version that looks like the real thing but is actually slow and incorrect"

On the one hand, battle-tested, load-bearing software that has shipped on a shitload of platforms (it's basically in every smartphone and PC operating system these days) and has been around for decades.
On the other hand... you can cosplay as a database dev?

What are we doing here even?

David Zaslavsky Mar 8

@sabik @sushee I got the sense that wasn't meant to be the conclusion of the article though. The way I understood it, it's basically just a rant about agentic coding being used in a situation where it has almost no chance of doing well, and the author threw in a caveat at the end of "I'm not trying to say that LLMs are useless, just that this particular application is dumb."