LLMs turn your job into mostly code review, a task everyone famously loves to do and is good at
check out my soundcloud etc https://shop.jcoglan.com/building-git/

@jcoglan real devs scream at the bot to fix its own damn mistakes!!
@jcoglan We asked the LLM to code in a language we don't know... so we've just been approving the merge requests 🤡

@jcoglan I've been telling my friends this:

I do not use LLM code generation because I do not want to review code hallucinations. However, my coworkers use LLMs, and I review their code, so I can't escape it. 😞

@jcoglan I can't be trusted to write it, so surely I understand it enough to review it.
@jcoglan whenever I use an LLM, I try to make it spew out just a couple of lines of example code I can then adapt to my project.
@jcoglan as someone who _does_ actually love to do code review and has received a tremendous amount of feedback that I'm really good at it, there is a certain… presumption of good faith that my code-review skills rely upon, which LLMs violate completely

@jcoglan

There will be AIs checking the code reviews soon enough.

Then all of this nonsense will start crashing, falling out of the sky, or exploding when not actually required to do so.

@jcoglan meanwhile we are exploring LLM-based code reviews… 🤦‍♂️
@akahn @jcoglan don't mention the unit tests
@alanpaxton @akahn I genuinely think having an AI write tests is a category error so bad I'd almost call it malpractice
@jcoglan @akahn no argument from me there. Just Hamlet level depths of sadness and despair.
@alanpaxton @jcoglan @akahn I fixed the test errors by deleting the tests 🤖 *beepboopbeep*

@jcoglan

Actually depends IMO.

Writing production code first, having AI write unit tests afterwards? Terrible idea.

Using AI to generate missing unit tests in a years old legacy system? Better idea.

Doing AI coding in TDD style? Great idea. Turns out that writing the test first really improves the AI's ability to come up with good production code.

TDD fans won't be surprised.
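To make the test-first idea concrete, here is a minimal sketch in Go: the test is pinned down first, and the function below it is the kind of minimal implementation one would then ask the AI to produce. All names (`SummarizeEmail`, `Summary`) are illustrative, not taken from any real project in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Summary is the result type the test pins down before any
// implementation exists.
type Summary struct {
	Subject string
}

// SummarizeEmail extracts the Subject header from a raw message.
// In a TDD workflow, the assertion in main would be written first,
// and the AI would be asked to make it pass.
func SummarizeEmail(raw string) Summary {
	for _, line := range strings.Split(raw, "\n") {
		if strings.HasPrefix(line, "Subject: ") {
			return Summary{Subject: strings.TrimPrefix(line, "Subject: ")}
		}
	}
	return Summary{}
}

func main() {
	// The test-first assertion: written before the implementation.
	got := SummarizeEmail("Subject: Release 1.2\n\nLong body text")
	if got.Subject != "Release 1.2" {
		panic("test failed: got " + got.Subject)
	}
	fmt.Println("ok")
}
```

The point is that the test fixes the contract up front, so a failing run gives the AI an unambiguous target to iterate against.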

@alanpaxton @akahn

@fxnn @jcoglan @akahn Generating missing tests means being strict about what we want the test to do (prompting), and checking that the resulting test clearly exercises what it is meant to exercise. What value is AI giving that DIY coding isn't? What does AI code conforming to a set of TDD tests do when we add a feature? Can it refactor, Fowler-style, or will it generate a new version of the code that satisfies the n+1 tests? Could we support/maintain a system like that?

@alanpaxton

Very good and important questions.

We're talking about a technology that's growing exponentially: whatever the answer is today, it will be different next year.

All we can do today is experiment with all the new tools they release, in order to understand how they work and learn what they can do for us.

@jcoglan @akahn

@fxnn @jcoglan @akahn I take a sceptical view about whether AI is truly capable of these tasks, so I will wait to be shown. I'm driven by doubt that the little I understand of how LLMs work can truly capture higher-order reasoning and reflective/introspective thought. But perhaps these will turn out to be unnecessary for building good software.

@alanpaxton

From my observation, as of today, AI can write relatively simple source units error-free in one shot, and extend existing source units with smaller variations. It can often fix issues like failing tests or compiler errors on its own.

Hence, in a typical dialogue, the human can focus on code review, evolving the architecture, and keeping the work aligned with the requirements. Tests are crucial, as they allow the AI to automatically detect and fix errors, which saves lots of time and nerves on the human side, and reduces the impact of review oversights.

Refactoring is possible, but I'm awaiting the integration of well-known automated refactorings as AI assistant tools, which would increase both efficiency and effectiveness.

Currently, I would say that the gain of AI-augmented coding lies in speed (at the expense of quality) and, to some limited extent, automation. But as I said, given the current exponential growth, this will improve, and it's possible that there will be unforeseen use cases.

@jcoglan @akahn

@fxnn @alanpaxton @akahn this has not been my experience. I'm not saying it's not possible, but I recently tried using Cline to get some code written -- where I was already extremely familiar with the problem -- using TDD. It was not able to diagnose why it could not get tests to pass without me basically telling it what to do, it made up its own tests, and it produced a lot of really bad code that I would honestly rather rewrite from scratch than even try to review
@fxnn @alanpaxton @akahn this was on a problem I was using for evaluation -- I had already thought about it at length and implemented my own solution, so the AI had a huge advantage in that the knowledge I could provide was much better than if I was approaching the problem fresh. which is to say: I already did all the hard work and the AI was incapable of helping, it just wasted a lot of my time and money
@fxnn @alanpaxton @akahn from that experience it's not remotely clear to me that any amount of the technology getting better will produce something that can meaningfully help me with the actual programming problems I have

@jcoglan

Well, other people have had different experiences.

There's my admittedly tiny toy project, https://github.com/fxnn/news, which I'm currently working on using Aider and Gemini 2.5 Pro. I'm curious to explore (soon) how AI does with larger legacy code.

Then there's Aider, the tool I'm using for AI-augmented coding; I believe each of its releases usually has 80-90% of its code written by AI: https://github.com/Aider-AI/aider

Then recently one of Cloudflare's repos was discussed, which they developed using Claude: https://github.com/cloudflare/workers-oauth-provider

@alanpaxton @akahn

@fxnn @alanpaxton @akahn sorry if this is a dumb question, just asking b/c I see very different types of examples in the docs for these things -- do you typically give the AI very specific instructions about what to change, or give it something more open ended like "make this test pass" or "figure out why this bug is happening" or "optimise this code path"
@fxnn @alanpaxton @akahn c.f. the first example in aider's docs being "add this specific param name to this specific function" which is something automated refactoring has been able to do for decades

@jcoglan

Not a dumb question at all. Choosing the right step size is utterly important. With older/smaller models, you need to take pretty small steps to get useful results, like the "add this param" example you quoted. Far too small to be useful, if you ask me.

Then there are the people who came up with really involved workflows, composed of multiple tools and prompts. I believe these can be really powerful, but they just cry out for automation, and I don't want to feel like a trained monkey. But definitely have a look; this one is really interesting in terms of the prompts used: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

I mostly go for simpler prompts, but that's only possible with advanced models like Sonnet 3.7, o4-mini or (the one I'm using) Gemini 2.5 Pro. A real-world example from my history:

```
Add a separate mode to the application, in which it does not print the downloaded e-mails to stdout, but instead starts an HTTP server with a RESTful API.
We will add the actual API implementation later, some dummy plain-text response is enough for now.
```

This added the necessary command-line switches and the HTTP server boilerplate.

The next one was this:

```
Now we want `startHttpServer` to receive the same input as `processEmails`.
It shall also fetch e-mails, and then provide the aggregated stories from all e-mails under a `/stories` REST endpoint. As usual, start with proper unit tests.
```

And there I had my REST API.

Typically, I switch between such requirements-oriented prompts and more lower-level improvements/refactorings, making sure the code quality and architecture are right, like this one:

```
In `main` func, handle `fetchAndSummarizeEmails` only once, for both `printEmails` and `startHttpServer`, because both need it in the same way. Also, handle errors directly, no need to pass them on to e.g. `startHttpServer`.
```
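The thread shows only the prompts, not the resulting code, so here is a hedged sketch in Go of what the refactored `main` from that last prompt might look like. Only the function names (`fetchAndSummarizeEmails`, `printEmails`, `startHttpServer`) come from the prompts; the types, signatures, and stub bodies are assumptions, not the real fxnn/news code.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// Story is a stand-in for whatever aggregate the real pipeline produces.
type Story struct{ Title string }

// fetchAndSummarizeEmails stubs the real IMAP/summarisation pipeline.
func fetchAndSummarizeEmails() ([]Story, error) {
	return []Story{{Title: "Example story"}}, nil
}

func printEmails(stories []Story) {
	for _, s := range stories {
		fmt.Println(s.Title)
	}
}

// startHttpServer serves the already-fetched stories, instead of
// fetching them itself, so both modes share one data source.
func startHttpServer(stories []Story) error {
	http.HandleFunc("/stories", func(w http.ResponseWriter, r *http.Request) {
		for _, s := range stories {
			fmt.Fprintln(w, s.Title)
		}
	})
	return http.ListenAndServe(":8080", nil)
}

func main() {
	// Fetch once, in main, and handle the error here, as the
	// refactoring prompt asks; both consumers get the same data.
	stories, err := fetchAndSummarizeEmails()
	if err != nil {
		log.Fatal(err)
	}
	printEmails(stories)

	serveHTTP := false // flip to true for the HTTP-server mode
	if serveHTTP {
		log.Fatal(startHttpServer(stories))
	}
}
```

The design point of the prompt is visible here: the fetch-and-error-handling lives once in `main`, and `startHttpServer` becomes a pure consumer of the data.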

@alanpaxton


@fxnn @alanpaxton @akahn there's no reason to assume that reasoning lies on a continuum from whatever it is LLMs do today, such that quantitative improvement will mean they will eventually be able to reason

my recent attempts to drive an LLM using TDD have gone extremely badly, including it fabricating its own tests and showing no ability to comprehend the problem or my instructions

@jcoglan

Yes, the steps we take shouldn't be too coarse, and AI-augmented coding is still somewhat adventurous. One of the important things it needs is a good "system prompt": some rules which tell it, for example, never to fix a test by deleting it.
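As an illustration, such rules can live in a small conventions file added to the model's context (the file name and wording here are made up for illustration, not taken from any post in this thread):

```
# CONVENTIONS.md (illustrative)
- Never fix a failing test by deleting or weakening it.
- Write the unit test before the production code.
- Prefer small, focused diffs; do not refactor unrelated code.
- Ask before adding new dependencies.
```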

I can recommend @kentbeck's recent blog posts on that matter (paywalled), e.g. https://open.substack.com/pub/tidyfirst/p/persistent-prompting

@alanpaxton @akahn

@fxnn @kentbeck @alanpaxton @akahn I'm afraid I will not be giving substack my money, but if there are other sources that are worth reading on this subject I would value recommendations

@jcoglan

I can relate. While I get that people want to earn money with their blogs, I believe there must be a better way than feeding Substack's or Medium's pockets.

Anyways, the one good free blog on the general LLM topic that I read regularly is @simon's, https://simonwillison.net.

Also, I was recommended a blog, https://harper.blog/posts/, which seems to feature some interesting AI-coding posts lately.

Nothing more specific at the moment, unfortunately.

@alanpaxton

@fxnn @alanpaxton @jcoglan @akahn "all we can do today" or we can ignore and not waste time on the failed technology that is all hype and zero substance.

@paulshryock

What makes you think that it's zero substance?

I see experienced developers every day applying tools like Claude Code to do the same tasks they would have done themselves, in a fraction of the time, and often even with better quality.

I wonder what's "all hype" in such a technology.

@fxnn not engaging in the wider discussion here, but I'd seriously question "exponential growth" in this field. It had a single large step change, and has been very much incremental since. The improvements are linear at best, while the resources required to produce them are the only thing growing exponentially.

Like, I don't think this is even a particularly sceptical view - just facts on the ground. You can maybe discuss whether linear improvements have step change impacts, but the tech itself requires exponentially increasing resources to manifest linear improvements. That's why they've started switching to working on specialised models, reasoning, meta control systems, etc.

If you're relying on "we'll see continued improvements akin to ChatGPT's version 3 -> 5 trajectory", I think you'll be sorely disappointed.
@alanpaxton

@jcoglan they're using AI to do the code review too https://refactoring.fm/p/ai-code-reviews

@jcoglan

One of my theories, when the first studies came out showing that folks loved LLMs for coding but a very high number distrusted the output, was that they find doing code review more breezy, and it makes them feel like they are supervising someone and are superior, and therefore they have the illusion of being more productive.

Kind of similar to how folks insisted multi-tasking was superior until all the studies came out showing objectively that their productivity plummeted regardless of how good it subjectively felt to them.

For a field in which we talk about data so much, we do a really bad job of measuring productivity and the factors that affect it, and it shows.

@shafik @jcoglan yep, it's all about that feeling of controlling "someone" else. Rooted in racism and white supremacy.

@jcoglan sorry for the literal screenshots, but check this out.

Wrote a library, then got asked to "put the problem into ChatGPT and see how close it gets to yours".

Well, it took me quite a long time to get the over-confident, defensive PoS to admit it f'ed up OCR* on a table.

(*kinda. I still don't know what exactly went wrong along the way, but since it barfs out and executes python code to get there, it might as well have screwed up some indices under the hood to mix up the table cells.)

@jcoglan ah well I think I see it now.. column index off by one

Having LLMs do tax reports and medical decisions will be SWEEET

@jcoglan @MisterHW isn't there a legal AI already, one from Claude? I love where this is going. Non-deterministic output feeding deterministic systems, that's so sweet. I love how we'll start debugging it. We should have psychiatrists for AIs
@jcoglan Especially reviewing code from half-assed programmers.

@jcoglan This is what is freaking me out - I need a job and did software development a while back but can't handle the thought of reviewing AI code.

At least when you do it for a colleague there's an aspect of upskilling for all involved. For AI, only the billionaires benefit.

@jcoglan
There is no such thing as code review.
@janneke not sure I understand what you mean by this, can you elaborate?
@jcoglan
I'm also interested in learning more. In some metaphoric kind of Un-Code-Review way, as with unconferences, or as a matter of fact?
@janneke

@yala @jcoglan
When I learnt programming in the 80s, I would always do it together with a friend. Access to a computer was hard to come by, and you would share that time. After a brief stint as a lonely "professional" programmer in the 90s, that was "allowed" again, but you had to call it pair programming.

I've been trying to avoid the ritual commonly known as code review (trckacr) for many reasons. I don't like it as a reviewer. I'm bad at it. I'm not a computer, and something that looks plausible to me may not even compile. Cosmetic or stylistic errors/anomalies are easy, but reviews usually happen when the code is finished. Are you really going to suggest a full rewrite? A good friend of mine was a team lead at Google, and they lamented how impossible code review sessions were, and how producing quality code this way seemed impossible.

It's not all black and white. I do find review comments helpful sometimes, especially when entering a new project. Usually that's about style or other cultural memes.

If you want to have (give or receive) truly impactful input from a second person on a piece of code, pair program it.

Anyway, I believe that the person who codes alone as opposed to pairing is cultivating the software crisis, and so is trckacr.

@yala @jcoglan
So yeah, I do acknowledge the ritual exists, just not that it has much, if anything, to do with what it suggests it is, or with what managers (non-programmers) believe it is.
@jcoglan I helped to invent and commercialize static analysis engines for precisely that reason. So fun.
@jcoglan not even that. I also use LLMs for code reviews...
@jcoglan and in open source LLMs can now automate the process of giving someone else horrible code review to do
@jcoglan Yeah! Especially when the reviewed code is just snippets copied from random sites on the internet! Fun!
@jcoglan and definitely never ever ever scrolls through half distractedly and says "LGTM"

@jcoglan

Someone I know who is really into using LLMs to code uses one LLM to code, and a completely different LLM to do code review.

He acts more like a technical lead or technical manager.