As a research project, I built a tool I needed with Claude Code. I thought it would be a disaster, but it wasn't. I have some complicated feelings about it.

https://taggart-tech.com/reckoning/

I used AI. It worked. I hated it.

I used Claude Code to build a tool I needed. It worked great, but I was miserable. I need to reckon with what it means.

I really appreciate all the replies and support on this one. It was hard to write. I do want to call out two points that aren't being discussed, and that I felt pretty strongly about:

  • Open source is in trouble, and maintainers need help. Generative code is the help that showed up. What is the expectation here?
  • "The tool requires expertise to validate, but its use diminishes expertise and stunts its growth." What does "responsible use" look like that prevents this obvious and pervasive harm?
    @mttaggart I really appreciate your insight. I'm going to be asking my boss a lot of these questions.

    @mttaggart
    Actually, from my personal experience (CC too, which is probably still one of the better AI coding agents, even if it has many warts): yes, it "works". But before you start announcing your great successes with it, don't forget some ugly details that people like to overlook.

    The human aspect. On one side, you need an experienced overseer who makes sure CC stays on track. I've seen CC go on many fascinating off-topic excursions.

    @mttaggart
    And the other human aspect is the human competition.

    It's been known for over half a century that the difference between efficient and inefficient developers is over an order of magnitude (the 1968 article actually gives a 10-28 range, depending on how you measure the data), efficiency being defined as the time to deliver a working program, starting from the spec.

    Later research lowered that a bit for the means of the top and bottom groups, but the extreme outliers are still in similar ranges.

    @mttaggart
    Funny thing: I had a long talk with my former CTO, who had a funny way of hiring developers in the early years (before corporate HR took over). We literally had only seniors, and even those seniors were hand-picked, often hired with skills missing for the role, completely accepting that they would need to learn a language, or even relearn from being Windows-focused to being Linux-focused. Funny how these gals and guys usually managed that in surprisingly short time.
    @mttaggart
    Or as we summarised last week in our talk: hand-picked, highly effective developers earn perhaps at worst 50% more than average commercial developers, if that, but can deliver, when needed, up to 10 times the productivity, at better quality. That's the secret of why we always had open positions, and why, for the first time in my life, I felt mentally "intimidated" by my colleagues.

    @mttaggart
    But you probably wonder where the punchline is in relation to AI. My dear colleague and CEO was very excited that, with AI help, he managed to rewrite our MVP prototype with a much better architecture in 7-8 weeks, after estimating that a classical human team would have needed 18 person-months or so.

    Now, acting as his official spoilsport (it's somewhere in my CTO contract that this is one of my duties), I had to point out that he's one of those highly efficient coding junkies, that the topic of the MVP sits squarely in his core competencies (he can literally write scientific papers about it), and that if you divide 18 months by ten (and remember, working solo means no team overhead either), the huge speedup he attributed to Mr. Sonnet can be explained by trivial software engineering research known for half a century.

    @mttaggart
    But yes "AI" does change the profession.

    IMHO, coding agents, especially once you add all the guardrails needed to make them safe, almost certainly work slower than a human expert operating in the core of their expertise.

    But when, e.g., I move into areas where I have to start looking up libraries (JavaScript, TS), LLMs suddenly start to show their capabilities in speed reading.

    @mttaggart
    And the other aspect is that LLMs are simply a milestone in NLP, especially multilingual NLP (despite the risk of errors; nearly all real-world algorithms have failure modes, live with that).

    So yes you need to design your processes with the possibility of errors in mind.

    Good developers learn error handling in kindergarten.

    At least our algorithm and data classes literally test for error handling in the scoring unit tests, too. Dealing with the correct results is the easy part.
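    A minimal sketch of that practice in Rust (my illustration, not the poster's actual code; `parse_ratio` is a hypothetical example): the tests assert on the error paths, not just on the correct result.

```rust
// Hypothetical example: a parser whose error handling is part of its contract.
fn parse_ratio(text: &str) -> Result<f64, String> {
    let (a, b) = text
        .split_once('/')
        .ok_or_else(|| format!("malformed ratio: {text:?}"))?;
    let num: f64 = a.trim().parse().map_err(|_| "bad numerator".to_string())?;
    let den: f64 = b.trim().parse().map_err(|_| "bad denominator".to_string())?;
    if den == 0.0 {
        return Err("zero denominator".to_string());
    }
    Ok(num / den)
}

fn main() {
    // Happy path: the easy part.
    assert_eq!(parse_ratio("3/4"), Ok(0.75));
    // Error paths: the part the scoring tests insist on.
    assert!(parse_ratio("nonsense").is_err());
    assert!(parse_ratio("1/0").is_err());
    println!("ok");
}
```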

    @yacc143 Did you bother to read my thing before replybombing?

    @mttaggart Responsible use is maintaining your role as the expert at all stages of the project. AI is a tremendous tool, but it has been made easy to misuse by making it entertaining, our little code "buddy". You really hit on many of what I consider responsible AI practices. Two of my main rules...

    Review everything. If you don't understand the AI's code, the subject matter, the references, etc., don't commit until you do. Never accept auto-commit.

    Security is key. I can't stress that enough to new devs. If you didn't tell your agent to make it secure - it's not. Start your security audit.

    @johnofrobotz Even if you did tell your agent to make it secure.
    @mttaggart hehe so true. That's why the first rule is so important 😁
    @mttaggart A great read, thanks. I’m someone who instinctively knows I’ll hate shepherding an LLM, and who is aghast at the thievery and environmental vandalism required to create them. I also viscerally fear becoming dependent on big tech and eroding my own expertise to do the one thing I’m any good at. But I’m learning not to judge as harshly those who feel they need to use them, even if I think it’s a road to ruin intellectually and financially (as they become more expensive)

    @mttaggart Nice post. Yeah, the tipping point from these coding assistants creating slop to being usable is a fairly recent thing. I'm not a coder; I'm a security engineer, so I'm used to handing over trust to a tool or SaaS service. Guardrails and layered controls are the key.

    I think the skill we're learning right now, how to make coding assistants write good code, is a marketable one. I feel like we're back in the early days of the cloud, learning a cutting-edge new skill.

    Anywho, just wanted to say I enjoyed reading your blog post. I too am struggling with all the complexities and externalities of AI.

    @Xavier Thank you for reading, and for struggling!
    @mttaggart ah. That’s what the vaguetoot was about.
    @winterknight1337 Yep. Bracing for the fallout.
    @mttaggart good write up man.
    @winterknight1337 Thanks, friend. Most appreciated.
    @mttaggart This fits what I’ve seen at $dayjob recently where talented and experienced people manage to sometimes get good use out of these tools (although with fewer ethical doubts than you describe). I’m mostly worried about problems caused by folks who don’t care or don’t know any better.
    Successful use cases will also make it more difficult to argue against LLM use for those of us who don’t want to use them due to ethical reasons. I’m not looking forward to that.

    @zaicurity Exactly. For the carelessness, I don't think a tool absolves one of carelessness, but I do think this tool in particular—at least in the way it is implemented now—makes carelessness not only easy, but highly incentivized. Without a dizzying array of external guardrails, harmful mistakes will occur. A bit more friction in the creation might go a long way. But alas, that would not be a popular product.

    And yeah, people should have a right to opt out of using these things for ethical reasons, but I do think examining those objections closely is worthwhile, if only to strengthen them.

    @mttaggart something I have been thinking recently, and which chimes a bit with your ultimate conclusion, is that I think of AI users a lot like smokers. 1/2
    @mttaggart E.g. a) I think it is generally bad for their health (smoking literally, AI in terms of cognitive skills) in the long term, although some will get away with it. b) lots of people using them will be collectively bad for society as these costs compound. c) an individual using it doesn't make them a bad person (although I would encourage them not to). d) pushing (be that tobacco or AI) it on the other hand does demonstrate some sort of moral failing. 2/2
    @smilingdemon That feels mostly correct, and the addictive properties align as well. I am wary of too-simple parallels, but this is close to a line of thinking I'm pursuing.
    @mttaggart its not perfect (analogies never are), but it's the easiest way of giving a concise explanation of my feelings I can think of.
    @mttaggart my criteria for using llms for code generation at work:
    1. Internal only tool
    2. Doesn't involve new ideas, just involves implementing well known design patterns
    3. Doesn't directly affect anything critical
    4. I could do it, and have a detailed idea of how I would implement it
    5. I have a good understanding of the necessary tests and edge cases that would verify the generated code
    6. I don't have the time available to set aside for implementing it in the next 6 months

    @mttaggart I have read only the self-flagellation so far and can I just say: oof.

    my own co-skeptic feeling here is that I am deeply sympathetic to what you’re trying to do here and also I am furious with your employer (or maybe just the ecosystem more generally) effectively forcing you to take a bunch of risks with this

    @mttaggart okay, read the whole thing now. I wouldn't have phrased the "purity" section at the end in quite the same way you did, but it didn't raise my hackles in quite the same way Doctorow did with the same point. "I am tired of running from one corner of technology to the next" resonated hard enough to rattle my teeth
    @glyph I struggled with that section a lot, but I think it's demonstrably true that we spend more time tearing each other down than building each other up, and in so doing we give the victory to our adversaries.
    @glyph I'm curious about why you have reservations about the purity section, or, to put it another way, why it apparently did raise your hackles to some extent. @mttaggart
    @matt @mttaggart "ideological purity" is a bit of a loaded phrase. While I'm sympathetic to the *sentiment*, I don't think it's true that "purity is a weapon used to divide labor against each other"; the thing that was used to divide labor against each other was racism. Now… purity does come into that, because once a bunch of racists are wandering around your movement, you've got difficult choices to make about how you maintain your coalition.
    @matt @mttaggart so, like, you could argue that it's "purity testing" to say that racists are unwelcome in your movement, and that we can't fight amongst "ourselves", except that the opposite of that is to welcome racists into the coalition and now it's just a coalition of racists because the racists are going to chase all the minorities out
    @matt @mttaggart there's a very delicate line to walk where you don't "purity test" casual racists by being super aggressive to them, but instead you make it clear that while *they* are welcome, their *racism* isn't welcome, so you can try to rehabilitate the casual rubes while aggressively excluding the heartfelt bigots. it's kind of impossible, which is why I am more sympathetic to this sentiment than to other recent formulations of this problem.

    @glyph @matt So, this is probably the most misunderstood part of the piece, and that's on me. I am concerned about ideological purity in this context. Purity as a concept, whether ideological or otherwise (i.e. racial), is what I was calling dangerous. And racism, among many other things, is a derangement that weaponizes purity. This is an instrument capitalists used heavily throughout the latter 19th and early 20th centuries to disrupt labor movements and prevent workers of different races from finding common cause. That's not to say racism wasn't elsewhere or sourced from within all socioeconomic echelons. Even so, the weaponization and exacerbation is relevant. Purity is a way to pit people against each other.

    Ideological purity, less dangerous than racism, still prevents finding common cause. Building movements requires working with those who do not agree with you on everything. There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.

    Am I a fascist for having used Claude Code and paying $20 to test it as others have? Some will say I am, or adjacent, because I have used a fascist tool. I find this deeply unhelpful to anyone. And that's my point. If you demonize anyone who touches this technology, your opposition movement is doomed to failure.

    What do we want to accomplish? Stopping or stemming the spread of the disease, or building a commune of the untouched?

    I also had questions about this section. As I read, I wondered how you would maintain a social unit of any kind without censure and expulsion in some cases. And that's not a trick question, btw; maybe you have some ideas.

    Or to pose the question to this toot, what makes you think purity itself is the problem, as opposed to a fixation on it? Compare with money, which isn't itself evil, but an over-veneration of it can ruin a person.

    @mttaggart @glyph @matt

    @dogfox @glyph @matt To the first:

    There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.

    Once again, I am making no case for a lack of boundaries. I am making the case that the boundaries currently in play are counterproductive.

    I cannot and will not give you a maxim for establishing them. Looking for empiricism there is where you get into weird inversions of moral obligation.

    As for obsession versus the thing itself, I see a distinction without a difference. To maintain "purity" as a virtue is to seek it, and without clarity that it is unattainable, you end up with some version of obsession. I would prefer a heuristic of growth and estimation of intent. Not perfect metrics, and deeply subjective. It's something best done in human relations, and not conducive to a few hundred characters of pith.

    I get what you're saying a lot better now. Thank you.

    Strength of agreement isn't the same as purity. Purity also insists on complete agreement with a predefined doctrine, if I am reading you right.

    In that case, I think I agree with you that that is always pathological.

    @mttaggart @glyph @matt

    @mttaggart @matt This all wasn't in the text, but, it was sort of *implied* by the way you were bringing up purity and the references you were gesturing at, which is precisely why I said it *didn't* "raise my hackles" (I probably would have chosen a different phrase if I knew I'd have to repeat it 30 times).

    It's extremely difficult to talk about not least because there are so many using "purity testing" *as* a purity test, and as cover for just telling people to shut up and accept odious views

    @mttaggart @matt I think the main thing i would have changed is to talk specifically about where your line is, and what you feel like constitutes "purity". I think it's unambiguously OK to pay $20 (or even a few hundred dollars) to get accurate data on capabilities so that we can talk about this stuff in a factual, honest, and, thus, hopefully convincing way. I think it's OK to be a hype monster for a bit and change your mind. Honestly I don't even know where my line for "not OK" really starts
    @mttaggart @matt like, I know and respect people who just fundamentally disagree with me about the ethical implications LLMs completely, and I feel like this is an area where I can tolerate a lot more wiggle room than, say, racism. I trust their judgement a lot less than I used to, and I don't really have much capacity to sanction them, but I'm not sure how much I would even if I could

    @glyph I guess I see the professional side of it this way. I could:

  • Quit, which harms everyone involved and solves nothing.
  • Say nothing, which harms anyone impacted by dangerous AI.
  • Do what I'm doing, and hope to mitigate harm.

    The choice is clear, and I'd much rather that I be the one talking about AI security than a myopic booster of the tech.

    @mttaggart oh yeah, for sure. and even given risks+externalities accounted for, this type of work (i.e. the investigation in the post itself) needs to get done. and it's not worth much if it doesn't get done by someone with your priors and methodological constraints, which is to say, someone who it will personally hurt. so, (unironically) thank you for your service here
    @mttaggart I am still left wondering, per https://blog.glyph.im/2025/08/futzing-fraction.html , if overall you felt like your experience here mitigated my ongoing concern that despite "appearing to work" on small-scale tools like this, the larger risks still mean that it may be a net negative, even just straightforwardly to productivity, when deployed at scale

    @glyph I hope I was clear that I still find the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.

    What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the real time investment to build a working and secure product diminishes. That said, a lot of things need to go right, and every single process to keep the model on track is prone to failure. Also, context (in the model's sense) really matters. This project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more computing resources to do likewise, and again your probability for error increases.

    So yeah, still not great. I found a way to make it work, but doing so sucked ass.

    I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
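    To illustrate the compile-time-checks point (a hypothetical `find_port` helper, not code from the actual project): in Rust, the absent-value case must be handled explicitly or the program won't compile, whereas the equivalent oversight in Python would only surface at runtime.

```rust
// Hypothetical example: looking up a config value that may be absent.
fn find_port(config: &str) -> Option<u16> {
    config
        .lines()
        .find_map(|line| line.strip_prefix("port="))
        .and_then(|v| v.trim().parse().ok())
}

fn main() {
    // The compiler forces an explicit decision for the missing-key case:
    // `let port: u16 = find_port("host=localhost");` is a type error,
    // not a latent bug that generated code could quietly ship.
    assert_eq!(find_port("host=localhost\nport=8080").unwrap_or(80), 8080);
    assert_eq!(find_port("host=localhost"), None);
    println!("ok");
}
```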

    @glyph Oh, one other point. I think the FF model might need a corollary for coding agents. Per-inference calculations don't really make sense in this workflow. Instead it would be more beneficial to think about time/usage per feature or commit or something. And yeah, by those metrics, this was phenomenally faster than what I would have done myself, and thanks to careful scaffolding, solid on the other concerns as well. By the numbers, this application was an unequivocal win. Just, y'know, an icky one.
    @mttaggart yeah "inference" is a highly abstract factor in FF, the idea was not to literally describe an individual path through the model and so I may have abused the term. if you're checking per-diff-hunk then the "inference" is the diff hunk and the price should be calculated that way

    @mttaggart

    Thanks for this post. I've been an extreme skeptic of LLMs, but I'm seeing increasingly promising results for agentic coding. I'm not sure I'm on board with using it as regular practice, but I increasingly see the need to experiment with it to better understand how it'll impact my job, and to offer better-informed opinions on it.

    @mttaggart That was a thin line to walk. Nicely done. Very even keeled.

    I’ve experienced echoes of what you wrote in my day job too. My team has decided it’s a tool, it will be used, and it will never be trusted. But writing SOC PowerShell scripts and KQL alerts isn’t the same as an ERP. I do not envy that team.

    As for the place of agentics in a future world, I look to the automobile and the airplane as relevant comparisons. The modern world could not exist without either. I don’t think the next future can exist without neural nets and other agentics. The jury is still out for me on LLMs.

    But the automobile and the airplane have wrecked this planet environmentally, though their inventors did not know that would happen. We know better now. Has our species matured enough not to make the same mistake? Perhaps. But not everyone has. And they have turned on the hard sell big time, trying to get filthy rich from so-called “AI” and hoping no one would question it.

    Fortunately (?) it’s starting to look like the economics are not sustainable. As evidence I posit that Nadella may not have a job for much longer.

    Perhaps we’ll collectively get a moment to pause, reassess and reset. Less hype and a more considered approach to this tech is gravely needed. That worked after the Internet bubble of the 1990s burst. It’ll work here if we can get the more even keeled among us to take charge. May that happen soon.

    @mttaggart Great. Sounds so familiar. I just tried repeating a git process which I knew worked, because I did it yesterday with Claude's help. Today Claude was completely off target; I had to correct it on major things many times. I was a bit shocked, but it shows that one can learn from using it. How else could I have corrected it the second time?
    @mttaggart Great write up. I agree that there's a lot of nuance. Those solely using AI as a means for dividing the population into "ethical" and "unethical" groups, while stating "AI is a bubble", are not going to create the change they want to see. Balanced takes that insist on using tools as safely (and, ideally, as ethically) as possible are how you cross the divide.

    @mttaggart

    Wow, thanks for taking the time to write out your experience so completely. I think I’d have a similar complex reaction.

    Recently developed a large multi threaded python program implementing a complex PID controlled system with lots of realtime IO from sensors and encoders and to actuators along with weak test coverage. Kids interacted with it at a science museum.

    1/n

    @mttaggart

    Late in the project I was encouraged to try using Claude for some refactoring and investigating/fixing some bugs related to new features.

    The code changes made some sense … but didn’t fit well with the existing complexity.

    I had a relatively extreme cognitive allergic response which surprised me. The whole system was in my head and it felt like I was losing control. Backed out most of the code.

    In hindsight this was NOT a good way to start using AI.

    2/2

    @mttaggart
    Thanks for the write-up. An interesting if tricky read. I find myself in a similar place. If we don’t become knowledgeable about these tools, able to map out failure modes and boundaries, and identify how they should be delimited and defanged, we effectively cede the floor to evangelists who won’t know or care.

    @mttaggart a really thoughtful and interesting post I find matching my own thoughts on a lot of this */me gestures broadly*

    So thanks for posting

    @mttaggart That's all pretty consistent with my experience. I've spent some time learning the tools and the underlying theory, and from earlier work I also know a bit about chips and the stack between chat and chips. On the one hand, there are major limits to how well the average person can describe the problems they want to solve. On the other, I think the people building their thinking on "it's a bubble" miss that the tech is not fundamentally expensive to maintain once built.
    @mttaggart Consequently, however violently the tide recedes, a lot of this stuff is still going to be around and in use for the foreseeable future. How we deal with the labor issues, the brain-cooking effect chatbots have on many people, etc., I'm not sure.