@aparrish @nelson I don't think it's enough to accept code is just a black box ratcheted by tests.
If you look at the state of Claude Code... It's really bad. Like worst-case-devolves-to-bogosort bad... like stores-your-credentials-in-plain-text-files-because-it-can't-guarantee-it-won't-lose-them-mid-process bad.
edit: Ratcheting by tests doesn't tell you about non-deterministic total failure in rare circumstances and it doesn't tell you about security.
@aparrish I use Claude Code for a lot of one-offs and non-critical projects. For example, my little thread unroller for travel postcards. The standard of quality here is
This is not a high-stakes or subtle program I'm working on! For something more complex like a Fediverse server, there's way more hidden and subtle stuff than I'd trust an agent with. People are doing that kind of work with AI too, but I don't.
@aparrish @nelson Yeah, "correctness" is something we have to approach from multiple angles.
Sometimes we look at program outputs and say, "yes, that output is right for that input".
Sometimes we read the code and say, "yes, this code is correct by construction" (e.g. we can see that control flow *cannot* pass into a sensitive region without a certain check happening).
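A rough sketch of "correct by construction", assuming a hypothetical account-deletion feature (`Authorized`, `check_owner`, `delete_account`, and the `OWNERS` table are all made up for illustration). The sensitive function only accepts an `Authorized` value, and the only intended way to mint one is to pass the ownership check. So you can see from the code alone that control can't reach the deletion without the check. In Python this is enforced by convention plus a runtime check, not by a compiler.

```python
from dataclasses import dataclass

# Module-private capability: by convention, only check_owner uses it to mint Authorized values.
_TOKEN = object()

# Hypothetical ownership table: account id -> owning user.
OWNERS = {"alice-acct": "alice"}


@dataclass(frozen=True)
class Authorized:
    user: str
    account: str
    _token: object

    def __post_init__(self):
        # Refuse construction by anyone who doesn't hold the private token.
        if self._token is not _TOKEN:
            raise TypeError("Authorized can only be created by check_owner")


def check_owner(user: str, account: str) -> Authorized:
    # The check: the only intended path to an Authorized value.
    if OWNERS.get(account) != user:
        raise PermissionError(f"{user} does not own {account}")
    return Authorized(user, account, _TOKEN)


def delete_account(auth: Authorized) -> str:
    # Sensitive region: it only takes an Authorized value, so the check must already have passed.
    if not isinstance(auth, Authorized):
        raise TypeError("delete_account requires an Authorized value")
    return f"deleted {auth.account}"


print(delete_account(check_owner("alice", "alice-acct")))  # prints "deleted alice-acct"
```

The point is that you verify this by reading the structure, not by running examples.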
Sometimes we can use proofs, or fuzzing, or other tools.
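A tiny fuzzing sketch, assuming a hypothetical `average` function. The hand-picked example checks pass, but throwing random inputs at it turns up an edge case the examples skipped (the empty list). This is plain random testing with the standard `random` module, not any particular fuzzing tool.

```python
import random


def average(xs):
    # Hypothetical function under test: works for every example you're likely to try by hand.
    return sum(xs) / len(xs)


def fuzz(fn, trials=1000, seed=0):
    """Call fn on random integer lists; return the first input that raises, or None."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 5))]
        try:
            fn(xs)
        except Exception:
            return xs
    return None


# Example-based checks pass...
assert average([2, 4]) == 3
assert average([5]) == 5
# ...but random inputs find the crash on the empty list.
print(fuzz(average))  # prints []
```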
It feels like vibe coders are focusing on only that first type.
@aparrish @nelson A lot of programmers don't seem to understand that security is the *absence* of a feature.
Sure, features can sometimes be verified by looking at a program's behavior. But you can't use that to show that a feature is missing. The should-be-missing feature might be something like "Eve can read Alice's messages to Bob".
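A toy sketch of that Alice/Bob/Eve case, assuming a hypothetical `MessageStore` class (not from any real codebase). The feature test "Bob can read Alice's message" passes. Nothing in it says Eve can't read the message, and she can. Adding an Eve test would catch this one leak, but it still wouldn't show there are no other paths to the data.

```python
class MessageStore:
    """Hypothetical message store with a deliberate access-control bug."""

    def __init__(self):
        self._messages = []  # list of (sender, recipient, text)

    def send(self, sender, recipient, text):
        self._messages.append((sender, recipient, text))

    def inbox(self, reader):
        # BUG (deliberate): never checks that reader is the recipient.
        return [text for (_, _, text) in self._messages]


store = MessageStore()
store.send("alice", "bob", "meet at noon")

# Feature test: Bob can read Alice's message. Passes.
assert store.inbox("bob") == ["meet at noon"]

# The should-be-missing feature is present anyway:
print(store.inbox("eve"))  # prints ['meet at noon']
```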
If vibe coders are only checking for the presence of features, then they can never detect the "presence" of security.