Actually depends IMO.
Writing production code first, having AI write unit tests afterwards? Terrible idea.
Using AI to generate missing unit tests in a years old legacy system? Better idea.
Doing AI coding in TDD style? Great idea. Turns out that writing the test first really improves AI's ability to come up with good production code.
TDD fans won't be surprised.
Very good and important questions.
We're talking about a technology that's growing exponentially: whatever the answer is today, it will be different next year.
All we can do today is experiment with all the new tools they release, in order to understand how they work and learn what they can do for us.
From my observation, as of today, AI can write relatively simple source units without errors in one shot and extend existing source units with smaller variations. It can often fix issues like failing tests or compiler errors on its own.
Hence, in a typical dialogue, the human can focus on code review, evolving the architecture and keeping the direction towards the requirements. Tests are crucial, as they allow AI to automatically detect and fix errors, which saves lots of time and nerves on the human side, and reduces the effects of review oversight.
Refactoring is possible, but I'm awaiting the integration of well-known automated refactorings as AI assistant tools, which would increase the efficiency and effectiveness.
Currently, I would say that the gain of AI augmented coding lies in speed (at the expense of quality) and, to some limited extent, automation. But as I said, given the current exponential growth, this will improve and it's possible that there will be unforeseen use cases.
Well, other people have had different experiences.
For one, there's me with my admittedly tiny toy project https://github.com/fxnn/news, which I'm currently building with Aider and Gemini 2.5 Pro. I'm curious to explore (soon) how AI does with larger legacy code.
Then there's Aider, the tool I'm using for AI augmented coding. I believe it usually has 80-90% of a release's code written by AI: https://github.com/Aider-AI/aider
Then recently one of Cloudflare's repos was discussed, which they developed using Claude: https://github.com/cloudflare/workers-oauth-provider
Not a dumb question at all. Choosing the right step size is utterly important. With older / smaller models, you'll need to take pretty small steps to get useful results, like the "add this param" example you quoted. Far too small to be useful, if you ask me.
Then there's the people who came up with a really involved workflow, composed of multiple tools and prompts. I believe this one can be really powerful, but it just cries for automation and I don't want to feel like a trained monkey. But definitely have a look, this one is really interesting in terms of the prompts used: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
I'm mostly going for simpler prompts, but that's only possible with advanced models like Sonnet 3.7, o4-mini or (the one I'm using) Gemini 2.5 Pro. Real-world example from my history:
```
Add a separate mode to the application, in which it does not print the downloaded e-mails to stdout, but instead starts an HTTP server with a RESTful API.
We will add the actual API implementation later, some dummy plain-text response is enough for now.
```
This added the necessary commandline switches and the HTTP server boilerplate.
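To give an idea of the scale of such a step, here is a minimal sketch of what "commandline switch plus HTTP server boilerplate" can look like in Go. It is not the code from the repo: the flag names `-serve` and `-addr`, the `config` struct and the function names are all assumptions made up for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"os"
)

// config holds the command-line options. The field and flag names are
// hypothetical and not taken from the actual project.
type config struct {
	serve bool   // true: start the HTTP server instead of printing
	addr  string // listen address used in server mode
}

// parseArgs reads the mode switch from the given arguments.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("news", flag.ContinueOnError)
	fs.BoolVar(&cfg.serve, "serve", false, "start an HTTP server instead of printing to stdout")
	fs.StringVar(&cfg.addr, "addr", "localhost:8080", "listen address in server mode")
	err := fs.Parse(args)
	return cfg, err
}

// dummyHandler answers every request with a fixed plain-text body,
// standing in for the API that gets added in a later step.
func dummyHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintln(w, "API not implemented yet")
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the problem
	}
	if !cfg.serve {
		fmt.Println("(print mode: e-mails would be written to stdout here)")
		return
	}
	http.HandleFunc("/", dummyHandler)
	log.Fatal(http.ListenAndServe(cfg.addr, nil))
}
```

Keeping the flag parsing in its own function is a design choice of this sketch; it makes the new mode easy to cover with a unit test before any real API exists.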
The next one was this:
```
Now we want `startHttpServer` to receive the same input as `processEmails`.
It shall also fetch e-mails, and then provide the aggregated stories from all e-mails under a `/stories` REST endpoint. As usual, start with proper unit tests.
```
And there I had my REST API.
Typically, I switch between such requirements-oriented prompts and lower-level improvements/refactorings that make sure the code quality and architecture are right, like this one:
```
In `main` func, handle `fetchAndSummarizeEmails` only once, for both `printEmails` and `startHttpServer`, because both need it in the same way. Also, handle errors directly, no need to pass them on to e.g. `startHttpServer`.
```
@fxnn @alanpaxton @akahn there's no reason to assume that reasoning lies on a continuum from whatever it is LLMs do today, such that quantitative improvement will mean they will eventually be able to reason
my recent attempts to drive an LLM using TDD have gone extremely badly, including it fabricating its own tests and showing no ability to comprehend the problem or my instructions
Yes, the steps we make shouldn't be too coarse, and also AI augmented coding is still somewhat adventurous. One of the important things it needs is a good "system prompt", some rules which tell it e.g. to never fix a test by deleting it.
I can recommend @kentbeck's recent blog posts on that matter, paywalled, e.g. https://open.substack.com/pub/tidyfirst/p/persistent-prompting
I can relate. While I get that people want to earn money with their blogs, I believe that there must be a better way than to feed Substack or Medium's pockets.
Anyways, the one good free blog on the general LLM topic that I read regularly is @simon's, https://simonwillison.net.
Also, I was recommended https://harper.blog/posts/, which seems to feature some interesting AI coding posts lately.
Nothing more specific at the moment, unfortunately.
What makes you think that it's zero substance?
I see experienced developers every day applying tools like Claude Code to do the same task the developer would have done, in a fraction of the time, and often even in better quality.
I wonder what's "all hype" in such a technology.
@fxnn not engaging in the wider discussion here, but I'd seriously question "exponential growth" in this field. It had a single large step change, and has been very much incremental since. The improvements are linear at best, while the resources required to produce them are the only thing growing exponentially.
Like, I don't think this is even a particularly sceptical view - just facts on the ground. You can maybe discuss whether linear improvements have step change impacts, but the tech itself requires exponentially increasing resources to manifest linear improvements. That's why they've started switching to working on specialised models, reasoning, meta control systems, etc.
If you're relying on "we'll see continued improvements akin to ChatGPT's version 3 -> 5 trajectory", I think you'll be sorely disappointed.
@alanpaxton