I'm pretty satisfied at this point that the true extent of "AI" use in software development has been massively exaggerated.

Sure, lots of devs are using LLMs. But there seems to be very little advanced use. It's mostly chat window stuff and occasional inline completion, to keep the boss happy.

It is, however, a massive distraction.

Developers have a significant incentive to exaggerate their use of the technology.

The story I keep hearing from devs is that they tried doing the agentic stuff, looked at the results, and recoiled in horror.

And while there's no shortage of people trying to get it to work at scale, I've seen no credible evidence that *anybody* has cracked it.

So big claims, for sure. But backed up by nothing.

The available evidence, and the best research, suggests that unattended agentic execution is *uncrackable*, too. LLMs will never be reliable enough, and quality gates will never be complete enough to take the human out of the loop to any significant extent.

It's a Fool's Errand, IMO.

@jasongorman I’ve been waiting for studies that follow up on the METR one from last summer, the one that found the productivity claims were illusory, but conducted since the supposed sea change in November. Have you seen any?

@sakhavi I've been using closed-loop tests to measure improvements in new models and AI coding tools, and I saw no significant improvement in the last 6 months in task completion rates.

If there was a sea change, it wasn't in the technology.
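
(For concreteness: the thread doesn't say how those closed-loop tests are set up. A minimal sketch of one plausible harness, with a hypothetical `run_coding_agent` standing in for whatever model or tool is under test, might look like the following. The point of the closed loop is that the same fixed task set and the same pass/fail gate get re-run against each new release, so the completion rate is directly comparable over time.)

```python
# Minimal sketch of a closed-loop coding evaluation (assumed design, not the
# poster's actual harness). The loop is "closed" by an automated quality gate:
# a task only counts as complete if its test suite passes, with no human
# judging the output.
import subprocess
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    workdir: str            # checkout containing the task's starting code
    prompt: str             # task description handed to the agent
    test_cmd: list[str]     # e.g. ["pytest", "-q"] -- the quality gate


def run_coding_agent(task: Task) -> None:
    """Hypothetical stand-in: invoke the model/agent under test and let it
    edit files in task.workdir. Swap in the real tool's CLI or API call."""
    raise NotImplementedError


def tests_pass(task: Task, timeout: int = 600) -> bool:
    # Success is defined solely by the automated gate, not by eyeballing diffs.
    result = subprocess.run(task.test_cmd, cwd=task.workdir,
                            capture_output=True, timeout=timeout)
    return result.returncode == 0


def completion_rate(tasks: list[Task]) -> float:
    # Run every task once and report the fraction whose tests pass.
    passed = 0
    for task in tasks:
        try:
            run_coding_agent(task)
            if tests_pass(task):
                passed += 1
        except Exception:
            passed += 0  # crashes and timeouts count as failed tasks
    return passed / len(tasks) if tasks else 0.0
```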

@sakhavi @jasongorman There's a follow-up from METR from February:

https://metr.org/blog/2026-02-24-uplift-update/

But lots of caveats too.
(I'd be especially interested in how it impacts development on the same project over time, given the studies showing impact on knowledge retention/skill atrophy.)


@jasongorman The problem is that it's good enough for greenfield code that's "90% complete".

I haven't seen any LLM-driven code maintenance on long-term projects.