AI is making us write more code. That's the problem.

I analyzed research papers on AI-generated code quality. The findings:

→ 1.7x more issues than human-written code
→ 30-41% increase in technical debt
→ 39% increase in cognitive complexity
→ Initial speed gains disappear within a few months

We're building the wrong thing faster and calling it productivity.

The bottleneck was never writing code. It's understanding what to build.

If you're using AI coding tools, focus on:
• Smaller features (if it's 1000 lines, it's too big to review)
• Clear acceptance criteria before you prompt
• Tests first, AI-generated code second
• Security audits (AI can't do this)
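A minimal sketch of the "tests first" point, assuming a hypothetical `parse_price` function as the feature being built (the names here are made up): the human authors the tests, and the AI's only job is to make them pass in a diff small enough to review.

```python
# Human-authored tests, written BEFORE prompting an AI tool.
# parse_price is a made-up example; in the workflow above, the AI
# would write only the implementation body shown here.

def parse_price(text: str) -> int:
    """Convert a price string like "$1,234.56" to cents (the AI-written part)."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    return int(dollars) * 100 + int(cents or 0)

# The human-written acceptance tests:
assert parse_price("$12") == 1200
assert parse_price("$1,234.56") == 123456
assert parse_price("  $3.05 ") == 305
print("all acceptance tests pass")
```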

More code isn't the goal. Solving real problems is.

AI-Generated Code Quality and the Challenges we all face

Research shows AI-generated code has 1.7x more issues than human code. Analysis of several studies reveals growing technical debt and complexity with GenAI tools.

Agile Pain Relief
@mlevison This talk on AI productivity from a Stanford researcher goes into detail on this. Really interesting and gels with what you're saying: the actual productivity gains tended to be with simple (low-complexity, greenfield) tasks rather than anything bigger or more complex. https://youtu.be/tbDDYKRFjhk?si=AM5DPcJGeg_3ignp
Does AI Actually Boost Developer Productivity? (100k Devs Study) - Yegor Denisov-Blanch, Stanford


@mlevison I had a discussion with some developers about this last week. If we expect developers to take responsibility for the code they generate, the amount of work is capped by their ability to review code. And if we expect more (which is absolutely necessary if we want the promised productivity gains), we can't blame them if it goes wrong.

If we celebrate huge productivity gains, we give up the right to complain if AI fails. And it will fail.

@weddige agreed, and the evidence is that you can review 200-400 lines of code per hour, with 300 being a sane target. Realistically, you can do that once or twice a day.
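Back-of-envelope with those numbers (assuming 300 reviewable lines per hour and at most two focused review sessions a day):

```python
# Rough daily review budget, using the figures above.
LINES_PER_HOUR = 300   # the "sane" review pace
SESSIONS_PER_DAY = 2   # realistic number of focused review sessions

daily_cap = LINES_PER_HOUR * SESSIONS_PER_DAY
print(daily_cap)  # 600 lines per day

# A single 1000-line AI-generated feature already exceeds that cap,
# which is why the original post calls it "too big to review".
print(1000 > daily_cap)  # True
```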
@mlevison this makes me wonder: is it then a viable way to do TDD? Humans authoring the tests, and AI implementing them.
@jose @mlevison but TDD in this context will only work if you can write _all_ the tests, covering _all_ the edge cases, including security and performance.
And if that were possible, which I honestly doubt, you'd still be building a black-box system with potential problems in terms of architecture and understanding

@imcdowall @jose Early days. We're still learning.

I mostly use: https://github.com/PaulDuvall/claude-code --- I use /xspec to create the core acceptance criteria. I read and edit. Then use /xtdd to attempt to write the code. I keep all changes small and only commit what I can read and understand.

@Mark Levison Right. It gets hilarious when you get AI to write code for audio coding applications. The machinery never understands where you want to go, regardless of the detail you provide in your prompting. It's an endless back and forth, the code is often incorrect, and even more often not human-readable or usable in applications that have a procedural GUI. A complete time-waster, actually.
@jrp Interesting. I've never done audio-related development. I think I would fail harder than the GenAI tools.
@jrp @mlevison I had a student submit a vibe coded plugin that sounded correct, but wow, did it use a lot of system resources.

@celesteh

But those resources wanted to be consumed.

Debugging vibe code is entertaining.

@mlevison

Amen to that, brother…

@mlevison so true, it astounds me people didn't see this coming from the start. Also there's the cognitive deterioration double whammy.

@mlevison

That's my experience too.

Dave Farley's MSE channel, which I usually respect, recently claimed the opposite though, based on a study they took part in.

@oschonrock

As I've said to a few other people, the best teams with exceptional discipline may get better results.

In which case, I want data, and I want it measured by the CodeRabbit people so we have a basis for comparison.

@mlevison

Yeah I get the impression that the Dave Farley shop is quite professional and disciplined. Although they use Java and write mainly business / finance apps. Which are perhaps structurally relatively simple? They had a sample of 150 devs.

This is the video:

https://www.youtube.com/watch?v=b9EbCb5A408

Focus is on claude code and the maintainability by humans of code written by AI.

And this is the study:
https://arxiv.org/abs/2507.00788

We Studied 150 Developers Using AI (Here’s What's Actually Changed...)

@oschonrock Street cred: the first author of the paper is associated with CodeScene. I cite one of their papers for students wanting to understand where to tackle technical debt/mess.

@mlevison not sure what you are saying...

That they have a good reputation for serious research?

@oschonrock well I wouldn't cite them if they didn't do solid research. I'm very selective about who I reference. So, good++.

@mlevison

Right. Their methodology seemed sane to me.

The results were surprising. But as you say, perhaps helped by some very disciplined devs.

@oschonrock I highly recommend the book: "Your Code as a Crime Scene" - Adam Tornhill.

https://codescene.com/hubfs/web_docs/Business-impact-of-code-quality.pdf this is the paper that I recommend to Scrum Masters and Developers.

@oschonrock I just skimmed the paper; the obvious difference seems to be code base size. This paper seems to be focused on small tasks and code bases. I think the papers I'm citing are more focused on larger code bases.
@mlevison Goodhart's law in action. Actually, I'm not sure it even is an example of Goodhart's law. Raw quantity of code output would never have correlated strongly with quality. What do they think they're doing‽
@mlevison I use LLMs to help me with basic code writing tasks, generating the structural frameworks, saving me a lot of typing time. However, I never rely on that code out of the box, I always review it thoroughly and often just snip and prune. I would never attempt to give an LLM a complicated set of instructions, it's going to fail every time.
@mlevison IntelliSense, Prettier, etc. are all just tools for a smart developer.
@crackhappy @mlevison Jetbrains vanilla Intellisense was pretty good even before the latest epidemic of AI psychosis.
@thirstybear @mlevison I refuse to call what we currently have "Artificial Intelligence" because it is not. It's a fundamentally clever implementation of Markov chains with way way too much power applied.
@crackhappy Most of the time I call it GenAI. LLM is a better choice, but I need to use the language of the audience. If I say LLM then I have to explain it, @thirstybear
@mlevison @thirstybear That's entirely valid, but thank you for putting the GenAI on the front. That makes it palatable for those Not In The Know.

@crackhappy @mlevison

Couldn't human-made deterministic tools (or changes to programming languages) help with boilerplate work, instead of nondeterministic, opaque generative AI?

@crackhappy @mlevison

IIRC for some languages there have also been deterministic refactoring tools that take over the tedious parts of refactoring (like "rename method", which exactly identifies the callers to adapt them).
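The deterministic part is easy to demonstrate with a toy sketch using Python's `ast` module (this is not any particular IDE's implementation): a refactoring tool finds call sites from the parse tree, so unlike a text search or a generative model, it never confuses a string literal with a call.

```python
# Minimal sketch: deterministically locate call sites of a method
# using Python's ast module, the way a "rename method" refactoring
# would, instead of asking a generative model to guess.
import ast

source = """
class Account:
    def close(self):
        pass

a = Account()
a.close()
print("close")   # a string, NOT a call site; a text search would flag it
"""

tree = ast.parse(source)
call_sites = [
    node.lineno
    for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Attribute)
    and node.func.attr == "close"
]
print(call_sites)  # exact line numbers of real .close() calls
```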

@project1enigma @mlevison I think you're a bit in the weeds on this.

@crackhappy

What does that mean?

@project1enigma Overthinking it.

@crackhappy

Oh ok. Thanks but no thanks for being judgmental about my thinking process.

@project1enigma I apologize for offending you. That is not my intention.

@project1enigma these tools might help, but the bigger issue is that GenAI has no judgment. No understanding of correctness, readability, etc.

Better refactoring is great, but not enough.

Curious: what languages are you referring to?

@mlevison

I personally work with C++ and am old fashioned and code with a text editor.

But the first time I read about refactoring tools, it was about the so-called "refactoring browser" for Smalltalk.

https://wiki.c2.com/?RefactoringBrowser

@mlevison

IIRC nowadays there are tools for at least some of the usual refactoring steps in many IDEs for common programming languages. I'd be surprised if there were none for Java for example.

@mlevison

I'm personally also somewhat in favor of code generation, for example for data marshalling/unmarshalling, parsing, etc.

But those are deterministic, purpose-built generators: either existing ones (lex, yacc and successors, for example) or in-house/ad hoc.
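A toy sketch of that kind of in-house, ad-hoc generator (the field spec and names here are made up): the same spec deterministically produces the same marshalling code every time, which is the contrast being drawn with generative AI.

```python
# Tiny ad-hoc code generator: a field spec deterministically
# expands into unmarshalling code. Same input -> same output, always.
FIELDS = [("name", "str"), ("age", "int")]   # made-up example spec

def generate_unmarshal(class_name, fields):
    lines = [f"def unmarshal_{class_name.lower()}(raw: dict):"]
    for fname, ftype in fields:
        lines.append(f"    {fname} = {ftype}(raw[{fname!r}])")
    lines.append(f"    return {{{', '.join(f'{f!r}: {f}' for f, _ in fields)}}}")
    return "\n".join(lines)

code = generate_unmarshal("Person", FIELDS)
print(code)

# The generated code is ordinary, reviewable Python:
namespace = {}
exec(code, namespace)
person = namespace["unmarshal_person"]({"name": "Ada", "age": "36"})
print(person)  # {'name': 'Ada', 'age': 36}
```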

@mlevison

Or using higher level languages where this can be done as libraries supporting domain specific languages instead.

(There is stuff like Boost.Spirit for C++, though that still feels less natural than say parser combinators for Haskell)
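The parser-combinator idea fits in a few lines in any language with first-class functions; here is a toy Python sketch (not Boost.Spirit or any real library), just to show the grammar-as-a-library style:

```python
# Toy parser combinators: each parser takes input text and returns
# (value, rest) on success or None on failure. Combinators build
# bigger parsers from smaller ones as an embedded DSL, with no
# external generator step.

def char(c):
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def many1(p):
    def parse(s):
        values = []
        result = p(s)
        while result is not None:
            value, s = result
            values.append(value)
            result = p(s)
        return (values, s) if values else None
    return parse

def alt(*parsers):
    def parse(s):
        for p in parsers:
            result = p(s)
            if result is not None:
                return result
        return None
    return parse

digit = alt(*(char(d) for d in "0123456789"))
number = many1(digit)

value, rest = number("42abc")
print("".join(value), rest)  # 42 abc
```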

@project1enigma I do refactoring all the time. I still have a copy of the 1st edition of Refactoring by Martin Fowler.

The tools exist, but they don't necessarily help the GenAI tools. Maybe a Claude plugin for refactoring?

@mlevison I'd specifically want to avoid those tools
@project1enigma OOP is exactly what you're describing. That was the tool, and still is the tool we're using to turn natural language coding into machine language. LLMs are just another layer on top of that, and a bad one.

@crackhappy

"LLMs are just another layer on top of that, and a bad one"

That's my point, after you mentioned using those for tedious tasks.

@mlevison I hear you. I am seeing it first hand. And I am being hammered for pointing this out and encouraging caution, while others are being rewarded for shipping 💩 “because AI”.

Just call me ‘Cassandra’ 🤷‍♂️

@mlevison
I suspect the line about the volume of code increase is conservative.
@ivanmorgillo What's your experience about this?
@mlevison Question: would the moon landing also have been a success if AI had been in control? 😄
@mlevison We may remember the iconic photo of dear Margaret Hamilton standing beside a pile of printouts of code for the AGC computer. What if AI had programmed it? A late competition would be enlightening about mankind's biggest challenge of the last century 😇
@mlevison @Em0nM4stodon
→ 30-41% increase in technical debt
😱
@mlevison A problem of capitalism: it's always about *how much* you produce and consume, and rarely about *what* you produce. It seems that problem has now reached coding too.

@mlevison

Defining “productivity” as building the wrong thing faster has been the usual definition for at least my lifetime. Not just for code.

@mlevison can you post some links to those research papers?

In case you did not know: #acm's digital library is now open and free to use for everybody, so most peer-reviewed articles published in the US should be accessible. https://acm.org/dl


@mlevison The silver lining here (for developers who aren't using AI, despite demands from employers) is that we are easily able to keep up with developers who are.

My boss saw speed gains from some developers, and asked us all to use AI. I kept going as normal, and after a bit of time for things to settle, I'm still faster.

I like to think that more and more people are realizing that the emperor has no clothes.

@mlevison My anecdotal guess is that people are operating above their own abilities.

If I understand a code base and its architecture, dependencies, business logic needs, etc. Then I can ask an AI to make a targeted change and verify it. It’s basically avoiding some typing.

But if I don’t understand those things I don’t actually know how to ask or how to verify it.

Even worse if I wouldn’t be able to write it in the first place, how can I ask for it and verify it?

Basically, how can I successfully operate above my own knowledge and ability? It's like asking for a piece of text written in a language I don't speak. How can I have a clue what it says?

I think there are some uses for assisted development. I'm less sure an LLM is the way to achieve that.

Worst part: people aren't learning anything from all this. There's no skill and knowledge being built. :(

Imho, ymmv, etc.

Also those are some numbers, yikes!