LLMs turn your job into mostly code review, a task everyone famously loves to do and is good at
check out my soundcloud etc https://shop.jcoglan.com/building-git/

@jcoglan real devs scream at the bot to fix its own damn mistakes!!
@jcoglan We asked the LLM to code in a language we don't know... so we've just been approving the merge requests 🤡

@jcoglan I've been telling my friends this:

I do not use LLM code generation because I do not want to review code hallucinations. However, my coworkers use LLMs, and I review their code, so I can't escape it. 😞

@jcoglan I can't be trusted to write it, so surely I understand it enough to review it.
@jcoglan whenever I use an LLM, I try to make it spew out just a couple of lines of example code I can then adapt to my project.
@jcoglan as someone who _does_ actually love to do code review and has received a tremendous amount of feedback that I'm really good at it, there is a certain… presumption of good faith that my code-review skills rely upon, which LLMs violate completely

@jcoglan

There will be AIs checking the code reviews soon enough.

Then all of this nonsense will start crashing, falling out of the sky, or exploding when not actually required to do so.

@jcoglan meanwhile we are exploring LLM-based code reviews… 🤦‍♂️
@akahn @jcoglan don't mention the unit tests
@alanpaxton @akahn I genuinely think having an AI write tests is a category error so bad I'd almost call it malpractice
@jcoglan @akahn no argument from me there. Just Hamlet level depths of sadness and despair.
@alanpaxton @jcoglan @akahn I fixed the test errors by deleting the tests 🤖 *beepboopbeep*

@jcoglan

Actually depends IMO.

Writing production code first, having AI write unit tests afterwards? Terrible idea.

Using AI to generate missing unit tests in a years old legacy system? Better idea.

Doing AI coding in TDD style? Great idea. Turns out that writing the test first really improves the AI's ability to come up with good production code.

TDD fans won't be surprised.
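To make the test-first idea concrete, here is a minimal sketch in Go: the test is pinned down first, and the function below it is the kind of minimal implementation one would then ask the AI to produce. All names (`SummarizeEmail`, `Summary`) are illustrative, not taken from any real project in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Summary is the result type the test pins down before any
// implementation exists.
type Summary struct {
	Subject string
}

// SummarizeEmail extracts the Subject header from a raw message.
// In a TDD workflow, the assertion in main would be written first,
// and the AI would be asked to make it pass.
func SummarizeEmail(raw string) Summary {
	for _, line := range strings.Split(raw, "\n") {
		if strings.HasPrefix(line, "Subject: ") {
			return Summary{Subject: strings.TrimPrefix(line, "Subject: ")}
		}
	}
	return Summary{}
}

func main() {
	// The test-first assertion: written before the implementation.
	got := SummarizeEmail("Subject: Release 1.2\n\nLong body text")
	if got.Subject != "Release 1.2" {
		panic("test failed: got " + got.Subject)
	}
	fmt.Println("ok")
}
```

The point is that the test fixes the contract up front, so a failing run gives the AI an unambiguous target to iterate against.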

@alanpaxton @akahn

@fxnn @jcoglan @akahn Generating missing tests means being strict about what we want the test to do (prompting), and checking that the resulting test clearly exercises what it is meant to exercise. What value is AI giving that DIY coding isn't? What does AI code conforming to a set of TDD tests do when we add a feature? Can it refactor, Fowler-style, or will it generate a new version of the code that satisfies the n+1 tests? Could we support/maintain a system like that?

@alanpaxton

Very good and important questions.

We're talking about a technology that's growing exponentially: whatever the answer is today, it will be different next year.

All we can do today is experiment with all the new tools they release, in order to understand how they work and learn what they can do for us.

@jcoglan @akahn

@fxnn @jcoglan @akahn I take a sceptical view about whether AI is truly capable of these tasks, so I will wait to be shown. I'm driven by doubt that the little I understand of how LLMs work can truly capture higher-order reasoning and reflective/introspective thought. But perhaps these will turn out to be unnecessary for building good software.

@alanpaxton

From my observation, as of today, AI can write relatively simple source units error-free in one shot, and extend existing source units with smaller variations. It can often fix issues like failing tests or compiler errors on its own.

Hence, in a typical dialogue, the human can focus on code review, evolving the architecture, and keeping the work aligned with the requirements. Tests are crucial, as they allow the AI to automatically detect and fix errors, which saves lots of time and nerves on the human side, and reduces the impact of review oversights.

Refactoring is possible, but I'm awaiting the integration of well-known automated refactorings as AI assistant tools, which would increase both efficiency and effectiveness.

Currently, I would say that the gain of AI-augmented coding lies in speed (at the expense of quality) and, to some limited extent, automation. But as I said, given the current exponential growth, this will improve, and it's possible that there will be unforeseen use cases.

@jcoglan @akahn

@fxnn @alanpaxton @akahn this has not been my experience. I'm not saying it's not possible, but I recently tried using Cline to get some code written -- where I was already extremely familiar with the problem -- using TDD. It was not able to diagnose why it could not get tests to pass without me basically telling it what to do, it made up its own tests, and it produced a lot of really bad code that I would honestly rather rewrite from scratch than even try to review
@fxnn @alanpaxton @akahn this was on a problem I was using for evaluation -- I had already thought about it at length and implemented my own solution, so the AI had a huge advantage in that the knowledge I could provide was much better than if I was approaching the problem fresh. which is to say: I already did all the hard work and the AI was incapable of helping, it just wasted a lot of my time and money
@fxnn @alanpaxton @akahn from that experience it's not remotely clear to me that any amount of the technology getting better will produce something that can meaningfully help me with the actual programming problems I have

@jcoglan

Well, other people have had different experiences.

There's my admittedly tiny toy project, https://github.com/fxnn/news, which I'm currently working on using Aider and Gemini 2.5 Pro. I'm curious to explore (soon) how AI does with larger legacy code.

Then there's Aider, the tool I'm using for AI-augmented coding; I believe each of its releases usually has 80-90% of its code written by AI: https://github.com/Aider-AI/aider

Then recently one of Cloudflare's repos was discussed, which they developed using Claude: https://github.com/cloudflare/workers-oauth-provider

@alanpaxton @akahn

@fxnn @alanpaxton @akahn sorry if this is a dumb question, just asking b/c I see very different types of examples in the docs for these things -- do you typically give the AI very specific instructions about what to change, or give it something more open ended like "make this test pass" or "figure out why this bug is happening" or "optimise this code path"
@fxnn @alanpaxton @akahn c.f. the first example in aider's docs being "add this specific param name to this specific function" which is something automated refactoring has been able to do for decades

@jcoglan

Not a dumb question at all. Choosing the right step size is utterly important. With older/smaller models, you need to take pretty small steps to get useful results, like the "add this param" example you quoted. Far too small to be useful, if you ask me.

Then there are the people who came up with really involved workflows, composed of multiple tools and prompts. I believe these can be really powerful, but they just cry out for automation, and I don't want to feel like a trained monkey. But definitely have a look; this one is really interesting in terms of the prompts used: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

I mostly go for simpler prompts, but that's only possible with advanced models like Sonnet 3.7, o4-mini or (the one I'm using) Gemini 2.5 Pro. A real-world example from my history:

```
Add a separate mode to the application, in which it does not print the downloaded e-mails to stdout, but instead starts an HTTP server with a RESTful API.
We will add the actual API implementation later, some dummy plain-text response is enough for now.
```

This added the necessary command-line switches and the HTTP server boilerplate.

The next one was this:

```
Now we want `startHttpServer` to receive the same input as `processEmails`.
It shall also fetch e-mails, and then provide the aggregated stories from all e-mails under a `/stories` REST endpoint. As usual, start with proper unit tests.
```

And there I had my REST API.

Typically, I switch between such requirements-oriented prompts and more lower-level improvements/refactorings, making sure the code quality and architecture are right, like this one:

```
In `main` func, handle `fetchAndSummarizeEmails` only once, for both `printEmails` and `startHttpServer`, because both need it in the same way. Also, handle errors directly, no need to pass them on to e.g. `startHttpServer`.
```
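The thread shows only the prompts, not the resulting code, so here is a hedged sketch in Go of what the refactored `main` from that last prompt might look like. Only the function names (`fetchAndSummarizeEmails`, `printEmails`, `startHttpServer`) come from the prompts; the types, signatures, and stub bodies are assumptions, not the real fxnn/news code.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// Story is a stand-in for whatever aggregate the real pipeline produces.
type Story struct{ Title string }

// fetchAndSummarizeEmails stubs the real IMAP/summarisation pipeline.
func fetchAndSummarizeEmails() ([]Story, error) {
	return []Story{{Title: "Example story"}}, nil
}

func printEmails(stories []Story) {
	for _, s := range stories {
		fmt.Println(s.Title)
	}
}

// startHttpServer serves the already-fetched stories, instead of
// fetching them itself, so both modes share one data source.
func startHttpServer(stories []Story) error {
	http.HandleFunc("/stories", func(w http.ResponseWriter, r *http.Request) {
		for _, s := range stories {
			fmt.Fprintln(w, s.Title)
		}
	})
	return http.ListenAndServe(":8080", nil)
}

func main() {
	// Fetch once, in main, and handle the error here, as the
	// refactoring prompt asks; both consumers get the same data.
	stories, err := fetchAndSummarizeEmails()
	if err != nil {
		log.Fatal(err)
	}
	printEmails(stories)

	serveHTTP := false // flip to true for the HTTP-server mode
	if serveHTTP {
		log.Fatal(startHttpServer(stories))
	}
}
```

The design point of the prompt is visible here: the fetch-and-error-handling lives once in `main`, and `startHttpServer` becomes a pure consumer of the data.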

@alanpaxton


@fxnn @alanpaxton @akahn there's no reason to assume that reasoning lies on a continuum from whatever it is LLMs do today, such that quantitative improvement will mean they will eventually be able to reason

my recent attempts to drive an LLM using TDD have gone extremely badly, including it fabricating its own tests and showing no ability to comprehend the problem or my instructions

@jcoglan

Yes, the steps we take shouldn't be too coarse, and AI-augmented coding is still somewhat adventurous. One of the important things it needs is a good "system prompt": some rules which tell it, for example, never to fix a test by deleting it.
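As an illustration, such rules can live in a small conventions file added to the model's context (the file name and wording here are made up for illustration, not taken from any post in this thread):

```
# CONVENTIONS.md (illustrative)
- Never fix a failing test by deleting or weakening it.
- Write the unit test before the production code.
- Prefer small, focused diffs; do not refactor unrelated code.
- Ask before adding new dependencies.
```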

I can recommend @kentbeck's recent blog posts on that matter (paywalled), e.g. https://open.substack.com/pub/tidyfirst/p/persistent-prompting

@alanpaxton @akahn

@fxnn @kentbeck @alanpaxton @akahn I'm afraid I will not be giving substack my money, but if there are other sources that are worth reading on this subject I would value recommendations

@jcoglan

I can relate. While I get that people want to earn money with their blogs, I believe there must be a better way than feeding Substack's or Medium's pockets.

Anyways, the one good free blog on the general LLM topic that I read regularly is @simon's, https://simonwillison.net.

Also, I was recommended a blog, https://harper.blog/posts/, which seems to feature some interesting AI-coding posts lately.

Nothing more specific at the moment, unfortunately.

@alanpaxton

@fxnn @alanpaxton @jcoglan @akahn "all we can do today" or we can ignore and not waste time on the failed technology that is all hype and zero substance.

@paulshryock

What makes you think that it's zero substance?

I see experienced developers every day applying tools like Claude Code to do the same tasks they would have done themselves, in a fraction of the time, and often even with better quality.

I wonder what's "all hype" in such a technology.

@fxnn not engaging in the wider discussion here, but I'd seriously question "exponential growth" in this field. It had a single large step change, and has been very much incremental since. The improvements are linear at best, while the resources required to produce them are the only thing growing exponentially.

Like, I don't think this is even a particularly sceptical view - just facts on the ground. You can maybe discuss whether linear improvements have step change impacts, but the tech itself requires exponentially increasing resources to manifest linear improvements. That's why they've started switching to working on specialised models, reasoning, meta control systems, etc.

If you're relying on "we'll see continued improvements akin to ChatGPT's version 3 -> 5 trajectory", I think you'll be sorely disappointed.
@alanpaxton

@jcoglan they're using AI to do the code review too https://refactoring.fm/p/ai-code-reviews

@jcoglan

One of my theories, when the first studies came out showing that folks loved LLMs for coding but a very high number distrusted the output, was that they find doing code review more breezy, and it makes them feel like they are supervising someone and are superior, and therefore they have the illusion of being more productive.

Kind of similar to how folks insisted multi-tasking was superior until all the studies came out showing objectively that their productivity plummeted regardless of how good it subjectively felt to them.

For a field in which we talk about data so much, we do a really bad job of measuring productivity and the factors that affect it, and it shows.

@shafik @jcoglan yep, it's all about that feeling of controlling "someone" else. Rooted in racism and white supremacy.

@jcoglan sorry for the literal screenshots, but check this out.

Wrote a library, then got asked to "put the problem into ChatGPT and see how close it gets to yours".

Well, it took me quite a long time to get the over-confident, defensive PoS to admit it f'ed up OCR* on a table.

(*kinda. I still don't know what exactly went wrong along the way, but since it barfs out and executes python code to get there, it might as well have screwed up some indices under the hood to mix up the table cells.)

@jcoglan ah well I think I see it now.. column index off by one

Having LLMs do tax reports and medical decisions will be SWEEET

@jcoglan @MisterHW isn't there a legal AI already, one from Claude? I love where this is going. Non-deterministic output feeding deterministic systems, that's so sweet. I love how we'll start debugging it. We should have psychiatrists for AIs
@jcoglan Especially reviewing code from half-assed programmers.

@jcoglan This is what is freaking me out - I need a job and did software development a while back but can't handle the thought of reviewing AI code.

At least when you do it for a colleague there's an aspect of upskilling for all involved. For AI, only the billionaires benefit.

@jcoglan
There is no such thing as code review.
@janneke not sure I understand what you mean by this, can you elaborate?
@jcoglan
I'm also interested in learning more. In some metaphoric kind of Un-Code-Review way, as with unconferences, or as a matter of fact?
@janneke

@yala @jcoglan
When I learnt programming in the 80s, I would always do it together with a friend. Access to a computer was hard to come by, and you would share that time. After a brief stint as a lonely "professional" programmer in the 90s, that was "allowed" again, but you had to call it pair programming.

I've been trying to avoid the ritual commonly known as code review (trckacr) for many reasons. I don't like it as a reviewer. I'm bad at it. I'm not a computer, and something that looks plausible to me may not even compile. Cosmetic or stylistic errors/anomalies are easy, but reviews usually happen when the code is finished. Are you really going to suggest a full rewrite? A good friend of mine was a team lead at Google, and they lamented how impossible code review sessions were, and how producing quality code this way seemed impossible.

It's not all black and white. I do find review comments helpful sometimes, especially when entering a new project. Usually that's about style or other cultural memes.

If you want to have (give or receive) truly impactful input from a second person on a piece of code, pair program it.

Anyway, I believe that the person who codes alone as opposed to pairing is cultivating the software crisis, and so is trckacr.

@yala @jcoglan
So yeah, I do acknowledge the ritual exists, just not that it has much, if anything, to do with what it suggests it is, or with what managers (non-programmers) believe it is.
@jcoglan I helped to invent and commercialize static analysis engines for precisely that reason. So fun.
@jcoglan not even that. I also use LLMs for code reviews...
@jcoglan and in open source LLMs can now automate the process of giving someone else horrible code review to do
@jcoglan Yeah! Especially when the reviewed code is just snippets copied from random sites on the internet! Fun!
@jcoglan and definitely never ever ever scrolls through half distractedly and says "LGTM"

@jcoglan

Someone I know who is really into using LLMs to code uses one LLM to code, and a completely different LLM to do code review.

He acts more like a technical lead or technical manager.