LLMs turn your job into mostly code review, a task everyone famously loves to do and is good at
@jcoglan meanwhile we are exploring LLM-based code reviews… 🤦‍♂️
@akahn @jcoglan don’t mention the unit tests
@alanpaxton @akahn I genuinely think having an AI write tests is a category error so bad I'd almost call it malpractice

@jcoglan

Actually depends IMO.

Writing production code first, having AI write unit tests afterwards? Terrible idea.

Using AI to generate missing unit tests in a years old legacy system? Better idea.

Doing AI coding in TDD style? Great idea. Turns out that writing the test first really improves the AI's ability to come up with good production code.

TDD fans won't be surprised.

@alanpaxton @akahn

@fxnn @jcoglan @akahn Generating missing tests has strict requirements on what we want the test to do (prompting) and on checking that the resulting test clearly exercises what it is meant to exercise. What value is AI giving that DIY coding isn't? What does AI code conforming to a set of TDD tests do when we add a feature? Can it refactor, Fowler-style, or will it generate a new version of code that satisfies the n+1 tests? Could we support/maintain a system like that?

@alanpaxton

Very good and important questions.

We're talking about a technology that's growing exponentially: whatever the answer is today, it will be different next year.

All we can do today is experiment with the new tools as they're released, in order to understand how they work and learn what they can do for us.

@jcoglan @akahn

@fxnn @jcoglan @akahn I take a sceptical view about whether AI is truly capable of these tasks, so I will wait to be shown. My scepticism is driven by doubt that what little I understand of how LLMs work is truly capable of capturing higher-order reasoning and reflective/introspective thought. But perhaps these turn out to be unnecessary for building good software.

@alanpaxton

From my observation, as of today, AI can write relatively simple source units error-free in one shot, and extend existing source units with smaller variations. It can often fix issues like failing tests or compiler errors on its own.

Hence, in a typical dialogue, the human can focus on code review, evolving the architecture and keeping the direction towards the requirements. Tests are crucial, as they allow AI to automatically detect and fix errors, which saves lots of time and nerves on the human side, and reduces the effects of review oversight.

Refactoring is possible, but I'm awaiting the integration of well-known automated refactorings as AI assistant tools, which would increase both efficiency and effectiveness.

Currently, I would say that the gain of AI-augmented coding lies in speed (at the expense of quality) and, to some limited extent, automation. But as I said, given the current exponential growth, this will improve, and there may well be unforeseen use cases.

@jcoglan @akahn

@fxnn @alanpaxton @akahn this has not been my experience. I'm not saying it's not possible, but I recently tried out using Cline to get some code written -- where I was already extremely familiar with the problem -- using TDD. it was not able to diagnose why it could not get tests to pass without me basically telling it what to do, it made up its own tests, it produced a lot of really bad code that I would honestly rather write over from scratch than even try to review
@fxnn @alanpaxton @akahn this was on a problem I was using for evaluation -- I had already thought about it at length and implemented my own solution, so the AI had a huge advantage in that the knowledge I could provide was much better than if I was approaching the problem fresh. which is to say: I already did all the hard work and the AI was incapable of helping, it just wasted a lot of my time and money
@fxnn @alanpaxton @akahn from that experience it's not remotely clear to me that any amount of the technology getting better will produce something that can meaningfully help me with the actual programming problems I have

@jcoglan

Well, other people have had different experiences.

There's my admittedly tiny toy project, https://github.com/fxnn/news, which I'm currently working on using Aider and Gemini 2.5 Pro. I'm curious to explore (soon) how AI does with larger legacy code.

Then there's Aider itself, the tool I'm using for AI-augmented coding; I believe 80-90% of each release's code is usually written by AI: https://github.com/Aider-AI/aider

Then recently one of Cloudflare's repos was discussed, which they developed using Claude: https://github.com/cloudflare/workers-oauth-provider

@alanpaxton @akahn

@fxnn @alanpaxton @akahn sorry if this is a dumb question, just asking b/c I see very different types of examples in the docs for these things -- do you typically give the AI very specific instructions about what to change, or give it something more open ended like "make this test pass" or "figure out why this bug is happening" or "optimise this code path"
@fxnn @alanpaxton @akahn cf. the first example in aider's docs being "add this specific param name to this specific function" which is something automated refactoring has been able to do for decades

@jcoglan

Not a dumb question at all. Choosing the right step size is utterly important. With older/smaller models, you need to take pretty small steps to get useful results, like the "add this param" example you quoted. Far too small to be useful, if you ask me.

Then there are people who have come up with really involved workflows, composed of multiple tools and prompts. I believe this can be really powerful, but it just cries out for automation, and I don't want to feel like a trained monkey. Definitely have a look, though; this one is really interesting in terms of the prompts used: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

I'm mostly going for simpler prompts, but that's only possible with advanced models like Sonnet 3.7, o4-mini or (the one I'm using) Gemini 2.5 Pro. Real-world example from my history:

```
Add a separate mode to the application, in which it does not print the downloaded e-mails to stdout, but instead starts an HTTP server with a RESTful API.
We will add the actual API implementation later, some dummy plain-text response is enough for now.
```

This added the necessary commandline switches and the HTTP server boilerplate.

The next one was this:

```
Now we want `startHttpServer` to receive the same input as `processEmails`.
It shall also fetch e-mails, and then provide the aggregated stories from all e-mails under a `/stories` REST endpoint. As usual, start with proper unit tests.
```

And there I had my REST API.

Typically, I switch between such requirements-oriented prompts and more lower-level improvements/refactorings, making sure that code quality and architecture stay right, like this one:

```
In `main` func, handle `fetchAndSummarizeEmails` only once, for both `printEmails` and `startHttpServer`, because both need it in the same way. Also, handle errors directly, no need to pass them on to e.g. `startHttpServer`.
```
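The structure that refactoring prompt asks for might look like this — function names are taken from the prompt, but the `Summary` type and the dependency injection are my own sketch:

```go
package main

import "fmt"

// Summary is a hypothetical stand-in for the aggregated result of
// fetchAndSummarizeEmails.
type Summary struct{ Stories []string }

// run fetches exactly once and hands the result to whichever output
// mode is active; errors are handled here instead of being passed on
// to startHttpServer, as the prompt demands.
func run(serve bool,
	fetchAndSummarizeEmails func() (Summary, error),
	printEmails func(Summary) error,
	startHttpServer func(Summary) error,
) error {
	summary, err := fetchAndSummarizeEmails()
	if err != nil {
		return fmt.Errorf("fetching e-mails: %w", err)
	}
	if serve {
		return startHttpServer(summary)
	}
	return printEmails(summary)
}
```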

@alanpaxton


@fxnn @alanpaxton @akahn there's no reason to assume that reasoning lies on a continuum from whatever it is LLMs do today, such that quantitative improvement will mean they will eventually be able to reason

my recent attempts to drive an LLM using TDD have gone extremely badly, including it fabricating its own tests and showing no ability to comprehend the problem or my instructions

@jcoglan

Yes, the steps we take shouldn't be too coarse, and AI-augmented coding is still somewhat adventurous. One of the important things it needs is a good "system prompt": some rules which tell it, e.g., to never fix a test by deleting it.
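As a concrete (invented) example, such rules often live in a conventions file that tools like Aider can load via `--read`; mine would contain lines like:

```
- Never "fix" a failing test by deleting or weakening it.
- Write the failing test first; only then change production code.
- Prefer small, reviewable diffs; stop and ask when unsure.
- Run the full test suite after every change.
```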

I can recommend @kentbeck's recent blog posts on that matter, paywalled, e.g. https://open.substack.com/pub/tidyfirst/p/persistent-prompting

@alanpaxton @akahn

@fxnn @kentbeck @alanpaxton @akahn I'm afraid I will not be giving substack my money, but if there are other sources that are worth reading on this subject I would value recommendations

@jcoglan

I can relate. While I get that people want to earn money with their blogs, I believe there must be a better way than to line Substack's or Medium's pockets.

Anyway, the one good free blog on the general LLM topic that I read regularly is @simon's, https://simonwillison.net.

I also got a recommendation for https://harper.blog/posts/, which seems to feature some interesting AI-coding posts lately.

Nothing more specific at the moment, unfortunately.

@alanpaxton

@fxnn @alanpaxton @jcoglan @akahn "all we can do today" — or we can ignore it and not waste time on a failed technology that is all hype and zero substance.

@paulshryock

What makes you think that it's zero substance?

I see experienced developers every day applying tools like Claude Code to do the same tasks the developer would have done, in a fraction of the time, and often even with better quality.

I wonder what's "all hype" in such a technology.

@fxnn not engaging in the wider discussion here, but I'd seriously question "exponential growth" in this field. It had a single large step change, and has been very much incremental since. The improvements are linear at best, while the resources required to produce them are the only thing growing exponentially.

Like, I don't think this is even a particularly sceptical view - just facts on the ground. You can maybe discuss whether linear improvements have step change impacts, but the tech itself requires exponentially increasing resources to manifest linear improvements. That's why they've started switching to working on specialised models, reasoning, meta control systems, etc.

If you're relying on "we'll see continued improvements akin to ChatGPT's version 3 → 5 trajectory", I think you'll be sorely disappointed.
@alanpaxton