I saw this on Mastodon and almost had a stroke.
@davidgerard wrote:
"Most of the AI coding claims are conveniently nondisprovable. What studies there are show it not helping coding at all, or making it worse
But SO MANY LOUD ANECDOTES! Trust me my friend, I am the most efficient coder in the land now. No, you can't see it. No, I didn't measure. But if you don't believe me, you are clearly a fool.
These guys had one good experience with the bot, they got one-shotted, and now if you say 'perhaps the bot is not all that' they act like you're trying to take their cocaine away."
First, the correct term is falsifiable, and proving propositions about algorithms (i.e., code) is part of what I do for a living. Human-written code and AI-written code are both mathematical objects that can be tested, which means propositions about them can be falsified, and you would test them the same way.
There is no intrinsic mathematical distinction between code written by a person and code produced by an AI system. In both cases, the result is a formal program made of logic and structure. In principle, the same testing techniques can be applied to each. If claims about AI-generated code were really nondisprovable, you could not test for differences between human-generated and AI-generated code. But you can test for them. Studies have found that AI-generated code tends to exhibit a higher frequency of certain types of defects, so reviewers and testers know which logic flaws and security weaknesses to look for. This would not be the case if it were nondisprovable.
You can study this from datasets where the source of the code is known. You can use open-source pull requests identified as AI-assisted versus those written without such tools. You then evaluate both groups using the same industry-standard analysis tools: static analyzers, complexity metrics, security scanners, and defect classification systems. These tools flag bugs, vulnerabilities, performance issues, and maintainability concerns. They do so in a consistent way across samples.
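To make that concrete, here is a toy sketch of the idea using Python's `ast` module as a stand-in for a real static analyzer. The real studies use industrial tools, but the principle is identical: the check does not care who, or what, wrote the code. The two defect types flagged below (unused imports and leftover print debugging) are ones the paper cited later in this post associates with AI output.

```python
# A toy "static analyzer": counts unused imports and hardcoded
# print-debugging, applied identically regardless of who wrote the code.
import ast

def count_issues(source: str) -> int:
    """Count unused imports and print-style debugging in Python source."""
    tree = ast.parse(source)
    imported, used, issues = set(), set(), 0
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Name):
            used.add(node.id)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == "print"):
            issues += 1  # hardcoded debugging output
    issues += len(imported - used)  # imports never referenced
    return issues

sample = "import os\nimport sys\nprint('debug')\nsys.exit(0)\n"
print(count_issues(sample))  # flags the unused 'os' import and the print call
```

Run the same function over a corpus of AI-assisted files and a corpus of human-written files, and you have a falsifiable comparison.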
A widely cited analysis of 470 real pull requests reported that AI-generated contributions contained roughly 1.7 times as many issues on average as human-written ones. The difference included a higher number of critical and major defects, as well as more logic and security-related problems. Because these findings rely on standard measurement tools (counting defects, grading severity, and comparing issue rates), the results are grounded in observable data. Again, that is my point: it is testable and therefore disprovable.
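For illustration only, with invented numbers that are not the study's data: once the same tool has counted issues in both groups, the comparison itself is plain arithmetic.

```python
# Hypothetical per-PR issue counts; the values below are made up
# for illustration, not taken from the study mentioned above.
def mean(xs):
    return sum(xs) / len(xs)

ai_issues    = [4, 2, 3, 5, 1]   # issues flagged per AI-assisted PR
human_issues = [1, 2, 2, 1, 2]   # issues flagged per human-written PR

ratio = mean(ai_issues) / mean(human_issues)
print(f"AI-assisted PRs averaged {ratio:.2f}x as many flagged issues")
```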
This is a good paper that goes into it, "Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity":
In this paper, we present a large-scale comparison of code authored by human developers and three state-of-the-art LLMs, i.e., ChatGPT, DeepSeek-Coder, and Qwen-Coder, on multiple dimensions of software quality: code defects, security vulnerabilities, and structural complexity. Our evaluation spans over 500k code samples in two widely used languages, Python and Java, classifying defects via Orthogonal Defect Classification and security vulnerabilities using the Common Weakness Enumeration. We find that AI-generated code is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues. Notably, AI-generated code also contains more high-risk security vulnerabilities. These findings highlight the distinct defect profiles of AI- and human-authored code and underscore the need for specialized quality assurance practices in AI-assisted programming.
https://arxiv.org/abs/2508.21634
The big problem in discussions about AI in programming is the either-or thinking, when it's not about using it everywhere or banning it entirely. Tools like AI have specific strengths and weaknesses. Saying "never" or "always" oversimplifies the issue and turns the narrative into propaganda that creates moral panic or shills AI. It's a bit like saying you shouldn't use a hammer just because it's not good for brushing your teeth.
AI tends to produce code that's simple, often a bit repetitive, and very verbose. It's usually pretty easy to read and tweak, which helps with long-term maintenance. But AI doesn't reason about code the way an experienced developer does. It makes mistakes that a human wouldn't, potentially introducing security flaws. That doesn't mean we shouldn't use it where it works well, which is not everywhere.
AI works well for certain tasks, especially when the scope is narrow and the risk is low. Examples include generating boilerplate code, internal utilities, or prototypes. In these cases, the tradeoff is manageable. However, it's not suitable for critical code like kernels, operating systems, compilers, or cryptographic libraries. A small mistake in memory safety or privilege separation can lead to major failures, and so can problems with synchronization, pointer management, or access control.
Other areas where AI should not be used include memory allocation handling, scheduling, process isolation, or device drivers. A lot of that depends on implicit assumptions in the system's architecture. Generative models don't grasp these nuances. Instead of carefully considering the design, AI tends to replicate code patterns that seem statistically likely, doing so without understanding the purpose behind them.
Yes, I'm aware that Microsoft is using AI to write code everywhere I said it should not be used. That is the problem. However, political pundits, lobbyists, and anti-tech talking heads are discussing something they have no understanding of and aren't specifying what the problem actually is. This means they can't possibly lead grassroots initiatives into actual laws that specify where AI should not be used, which is why we have this weird astroturfing bullshit.
They're taking advantage of the reaction to Microsoft using AI-generated code where it shouldn't be used to argue that AI shouldn't be used anywhere at all in any generative context. AI is useful for tasks like writing documentation, generating tests, suggesting code improvements, or brainstorming alternative approaches. These ideas should then be thoroughly vetted by human developers.
Something I've started to notice about a lot of the content on social media platforms is that most of the posts people are liking, sharing, memetically mutating, and spreading virally usually don't include any citations, sources, or receipts. It's often just an out-of-context screenshot with no reference link or actual sources.
A lot of the anti-AI content is not genuine critique. It's often misinformation, but people who hate AI don't question it or ask for sources because it aligns with their biases. The propaganda on social media has gotten so bad that anything other than heavily curated and vetted feeds is pretty much useless, filled with all sorts of memetic contagions with nasty hooks that are algorithmically optimized for you. I am at the point where I will disregard anything that is not followed up with a source. Period. It is all optimized to persuade, coerce, or piss you off. I am only writing about this because I'm actually able to contribute genuine information about the topic.
That they said symbolic propositions written by AI agents (i.e., code) are non-disprovable because they were written by AI boggles my mind. It's like saying that an article written in English by AI is not English because AI generated it. It might be a bad piece of text, but it's syntactically, semantically, and grammatically English.
Basically, any string of data can be represented in a base-2 system, where it can be interpreted as bits (0s and 1s). Those bits can be used as the basis for symbolic reasoning. In formal propositional logic, a proposition is a sequence of symbols constructed according to strict syntax rules (atomic variables plus logical connectives). Under a given semantics, it is assigned exactly one truth value (true or false) in a two-valued logic system.
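As a minimal sketch of that last point: a formula built from atomic variables and logical connectives evaluates to exactly one truth value under any given assignment. The `implies` helper below is defined for illustration, not a standard-library function.

```python
# Two-valued propositional semantics: every assignment of the atomic
# variables yields exactly one truth value for the formula.
from itertools import product

def implies(p, q):
    # Material implication: p -> q is false only when p is true and q is false.
    return (not p) or q

# A formula over two atomic variables: (p AND q) -> p
formula = lambda p, q: implies(p and q, p)

for p, q in product([False, True], repeat=2):
    value = formula(p, q)
    assert value in (True, False)  # exactly one truth value per assignment
    print(p, q, value)
```

This particular formula happens to be a tautology: it is true under every assignment, and that fact is checkable, not a matter of opinion.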
By implying AI-written code is nondisprovable, they are essentially saying it is not binary, is not symbolically logical at all, and cannot be evaluated as true or false. At the lowest level, compiled code consists of binary machine instructions that a processor executes. At higher levels, source code is written in symbolic syntax that humans and tools use to express logic and structure. You can also translate parts of code into formal logic expressions. For example, conditions and assertions in a program can be modeled as Boolean formulas. Tools like SAT/SMT solvers or symbolic execution engines check those formulas for satisfiability or correctness. It blows my mind how confidently people talk about things they do not understand.
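Here is a brute-force sketch of that idea, checking satisfiability by enumerating assignments. Real SAT/SMT solvers are vastly more efficient, but the question they answer is the same, and the example conditions are hypothetical.

```python
# Brute-force satisfiability check over Boolean variables.
from itertools import product

def satisfiable(formula, n_vars):
    """Return a satisfying assignment, or None if the formula is unsatisfiable."""
    for assignment in product([False, True], repeat=n_vars):
        if formula(*assignment):
            return assignment
    return None

# Hypothetical branch condition from a program: is (a and not b) reachable?
branch = lambda a, b: a and not b
print(satisfiable(branch, 2))  # a satisfying assignment exists

# A contradictory condition is provably unreachable:
dead = lambda a, b: a and not a
print(satisfiable(dead, 2))    # None: disproved, the opposite of "nondisprovable"
```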
Furthermore, it is wild to me that they don't realize the projection.
@davidgerard wrote:
"But SO MANY LOUD ANECDOTES! Trust me my friend, I am the most efficient coder in the land now. No, you can't see it. No, I didn't measure. But if you don't believe me, you are clearly a fool."
They are presenting a story (that these claims are not disprovable) and accusing computer scientists of using anecdotal evidence without providing any evidence to support this, while expecting people to take it prima facie. They are doing exactly what they accuse others of doing.
It comes down to this: they feel that people ought not to use AI, so they are tacitly committed to a future in which people do not use AI. For example, a major argument against AI is the damage it is doing to resources, which is driving up the prices of computer components, as well as the ecological harm it causes. They feel justified in lying and misinforming others if it achieves the outcome they want: people not using AI because it is bad for the environment. That is a very strong point, but most people don't care about it, which is why they lie about things people would care about.
It's corrupt. And what's really scary is that people don't recognize when they are part of corruption or a corrupt conspiracy to misinform. Well, they recognize it when they see the other side doing it, that is. No one is more dangerous than people who feel righteous in what they are doing.
It's wild to me how normalized this idea has become on the Internet: if you cannot persuade someone, it is okay to bully, coerce, or harass them, or to spread misinformation to get what you want, because your side is right. People can't even see why it is problematic.
That people think it is okay to hurt others to get them to agree is the most disturbing part of all of this. People have become so hateful. That is a large reason why I don't interact with people on social media, rarely consume anything from it, and am writing a blog post about this instead of engaging with the person who prompted it.

