Mastodawn

maybe i'm just not good enough of a programmer to use coding agents, i guess? i definitely don't trust my ability to know whether or not some code will do what i want it to do just by looking at it

Show thread

Nelson Minar 4d ago

@aparrish I don't even look at the code the agents write, or at least not much. It works better for things that you can build good test suites for or where you care more about the output of the program than the way the program works. See also @simon's book on agentic programming.

Agentic Engineering Patterns - Simon Willison's Weblog

Simon Willison’s Weblog

Show thread

allison 4d ago

@nelson i don't trust my tests to be correct either, only that they reflect my best understanding. and i'm not sure what it could mean to care more about the output of a program than how the program works...? isn't the output of a program *determined by* how the program works? i feel like whenever i've believed there was a difference between those two things, i ended up being wrong (sometimes subtly, sometimes not)

Show thread

eclexic 4d ago

@aparrish @nelson I don't think it's enough to accept code is just a black box ratcheted by tests.

If you look at the state of Claude code... It's really bad. Like worst case devolve to bogo sort bad... like store your credentials in plain text files because it can't guarantee it won't lose your credentials mid process bad.

edit: Ratcheting by tests doesn't tell you about non-deterministic total failure in rare circumstances and it doesn't tell you about security.

Show thread

allison 4d ago

@theeclecticdyslexic @nelson yeah, every instinct i have from 20+ years in software dev says "if the output looks right, and the code passes the tests, but you don't actually understand it, and you push to prod/incorporate it into your workflow anyway, you are bound to spend 10x the time fixing it than you would have spent understanding it in the first place" but maybe others don't have that instinct?

Show thread

eclexic

@aparrish @nelson well, if the LLM knows what the tests are, and you don't read the code it writes... You simply can't know it didn't write dedicated code paths for your tests.