Look this isn’t at all a defense of slop code, but it has me thinking — how much does code quality matter, and why?

It’s maintenance, right? We care about readability because we know we’ll have to make changes, fix bugs, etc.

But so … imagine a codebase that’s magically bug-free and feature-complete. (I’m aware this is a strawman - that’s the point, it’s a thought experiment.) Does it matter if this codebase is well-written? I’m not sure it does! (1/5)

@jacob there are a lot of software engineers and developers who have read assembly instructions in the last ten years. There are a lot more who haven’t.

@issackelly @jacob The assembly was replaced by higher level code that still needed to be maintained and understood.

I’d argue that sufficiently specific and structured prompts will still need to be maintained, too. But with the caveat that they’ll be run through a fundamentally non-deterministic interpreter.

@palendae @issackelly But also isn’t non-determinism sort of the point of LLMs? Like it’s not something you can just turn off, right? (This is a genuine question, not a bit - I really don’t know.)

@jacob @issackelly Yes, which I think is why they’re not suitable for what we’re trying to do with them.

The exact code it writes might not matter, but the consistency of its output _does_. I don’t think we can trust them to produce code that retains functionality without a ton more tooling, which you alluded to earlier.
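A toy sketch of that point (plain Python, no real model; the logits and `pick_token` are made up for illustration): greedy decoding is deterministic, while temperature sampling is not, so "turning it off" usually means collapsing to argmax rather than flipping a switch on the sampling itself.

```python
import math
import random

def softmax(logits, temperature):
    # Scale logits by temperature, then normalize into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng):
    # Temperature 0 is conventionally treated as greedy argmax:
    # deterministic, always the highest-logit token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise we sample, so repeated runs can differ.
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.5, 0.1]  # toy "next token" scores

greedy = {pick_token(logits, 0, random.Random(i)) for i in range(100)}
sampled = {pick_token(logits, 1.0, random.Random(i)) for i in range(100)}

print(greedy)            # only ever token 0
print(len(sampled) > 1)  # sampling surfaces multiple distinct tokens
```

(Real inference adds further wrinkles, e.g. floating-point and batching effects, so even "temperature 0" isn't always bit-for-bit reproducible in practice.)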

@palendae @jacob Jacob, I think your original point is approximately where I am with it.

* Humans own/maintain the specs, and
* have a lot of oversight/responsibility over a test suite.

I don't know that "prompt maintenance" is really where my mind is at with agents + LLMs.

If the spec represents what you want, and the tests represent the spec, then that's the API to the product, just as a function's signature is the API to its callers: what's on the inside can change or be ambiguous.

@issackelly @jacob Specs and the tests would be the prompts here; the instructions we give to a machine to do something. Whether it’s a REPL or a set of files given to agents isn’t really relevant, IMO.

I’m not especially optimistic that the industry will apply much rigor to the testing or the specs, to everyone’s detriment.

@palendae @jacob I dunno. I think specs (formal specifications) and tests (acceptance, regression, property-based, and unit, as appropriate) should _not_ be prompts, but rather barriers code must pass, human-written or otherwise. The more complete and accurate they are, the more likely you are to get value out of LLM code long term.

LLMs can help a skilled human write them, but humans must own the responsibility for them.

"Product specs" can be LLM prompts or Jira tickets for engineers.
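A tiny illustration of tests-as-barriers (a hypothetical `slugify` function, with plain asserts standing in for a real suite): the example cases and property checks pin down the behavior, and any implementation, human- or LLM-written, has to pass them unchanged.

```python
import re
import string

def slugify(text: str) -> str:
    # One possible implementation. The tests below, not this body,
    # are the contract: an LLM could rewrite this freely as long
    # as every barrier still passes.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

# Acceptance-style examples: the spec as concrete cases.
assert slugify("Hello, World!") == "hello-world"
assert slugify("  spaced   out  ") == "spaced-out"

# Property-style checks: must hold for any input, not just examples.
for sample in ["MiXeD CaSe", "tabs\tand\nnewlines", "", "123 abc"]:
    slug = slugify(sample)
    assert slug == slug.lower()                       # always lowercase
    assert not slug.startswith("-") and not slug.endswith("-")
    assert all(c in string.ascii_lowercase + string.digits + "-"
               for c in slug)
```

The property checks are the part that survives a rewrite: the examples catch regressions in known cases, while the properties constrain behavior on inputs nobody wrote down.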

@issackelly @jacob I’m curious - are the specs and enhanced tests what you’re focusing on?

Because at my current job, I’m hearing a lot of people saying this, but the emphasis (and the measured outcomes) is on throwing agents at code generation and automated review without bringing the testing up at the same time.

@palendae @jacob For me, over the last ~9 months of engineering: "It depends." If it _is_, or _is a dependency of_, an API or security-critical code, then yes, both for my code and the code I review. If it's feature work on top, especially experiments, then "wing it" has been useful.

At the moment I am not beholden to anybody but myself, so I feel like I'm sharing the bounds of what seems _right_, not what others are currently doing.

I do feel there is far too much exuberance relative to current capabilities.