fedi there is so much full force sloperating happening in open source accounting software...

no, NO look at me fedi, I know it's boring but you need to look at the accounting software, it's bad ok, it's really fucking bad out there, and I don't think your financial situation should depend on whatever logic an LLM spat into a bunch of accounting software without review

beancount developer fantasizing about running "5-8 agents" on the beancount codebase and babysitting them instead of writing code
https://groups.google.com/g/beancount/c/cz8Xwnb7BLE/m/LSA3rTfMAgAJ

ledger accepting LLM code and talking about vibe coded ports to Rust (as an experiment, at least, but lol)
https://github.com/ledger/ledger/discussions/2474

rustledger (seems almost entirely vibe coded)
https://github.com/rustledger/rustledger

paper by the american accounting association on "Applying Large Language Models in Accounting"
https://publications.aaahq.org/jeta/article-abstract/21/2/133/12800/Applying-Large-Language-Models-in-Accounting-A

PS no denying there is (and will always be) slop. But if you look again at rustledger, you might be impressed. Beyond the usual type checks and extensive test suites, it also includes TLA+ formal verification of some functionality.
@simonmic I did see the verification, it's interesting but my critique of AI is less about the actual code at any given snapshot in time, and more about the way it's produced, the lack of oversight developers give it, and the fact that this leads to many bugs and errors being added over time.

A lot of the reason I'd probably avoid these projects is that I'm not sure they won't create regressions tomorrow, or next week, due to some massive set of AI changes with poor review; who's to say those changes won't just break the TLA+ checks too? I have much less confidence in these developers' ability to push stable and reliable changes, and that's a pretty big red flag when their programs are supposed to handle my financial future...
I think if you look at #rustledger (just as an example) right now you'll see the oversight and quality is pretty high. Why shouldn't it stay that way? Isn't this a counterexample to what you're saying? My 2c: not all use of AI is the same; not all of it is vibe coding; and projects where AI is used in some way don't have to be low quality.
@simonmic I think if a project has a track record of high-quality changes and a good review process, I'm inclined to care less about the AI additions. This is why I'm much less concerned about potential LLM code in the Linux kernel, or in hledger itself, for example: the review process and regression testing in those projects has proven reliable.

As another counterexample here: why should I expect the quality of rustledger (or any brand-new project) to be high without a good track record of reviews and analysis? The quality, after a quick check by me, seems fine so far, but the project is new, and I'm not inclined to trust they'll still be doing good work a year from now unless I see it happening. That's a similar reason why I tend to stick to stable, established tools in general (regardless of LLMs).

It's similar for beancount: despite them doing very good work for many years, I'm far more concerned about new changes now that the project lead is publicly posting about running LLM agents over the code with very little oversight. That doesn't inspire confidence in a project that previously seemed quite reliable to me.
@froge I agree with or can understand what you said there. Trust and confidence are situation-specific judgement calls - by all means be cautious. (Regarding rustledger: (a) that wasn't a counterexample :) and (b) for me it inspired a certain level of confidence quite quickly, but of course YMMV, that's reasonable.)