As a research project, I built a needed tool with Claude Code. I thought it would be a disaster, but it wasn't. I have some complicated feelings about it.
I really appreciate all the replies and support on this one. It was hard to write. I do want to call out two points that aren't being discussed, and that I felt pretty strongly about:
@mttaggart
Actually, from my personal experience (CC too, which is probably still one of the better AI coding agents, even with its many warts): yes, it "works", but before you start to announce your great successes with it, don't forget some ugly details that people like to overlook.
The human aspect: on one side, you need an experienced overseer who makes sure that CC stays on track. I've seen CC go on many fascinating off-topic excursions.
@mttaggart
And the other human aspect is the human competition.
It's been known for over half a century that the difference between efficient and inefficient developers is more than an order of magnitude (the 1968 article actually gives a 10-28x range, depending on how you measure the data), efficiency being defined as the time to deliver a working program, from first receiving the spec.
Later research lowered that a bit for the means of the top and bottom groups, but the extreme outliers are still in similar ranges.
@mttaggart
But you probably wonder where the punchline is in relation to AI. My dear colleague and CEO was very excited that he managed to rewrite our MVP prototype with a much better architecture in 7-8 weeks with AI help, after estimating that a classical human team would have needed 18 person-months or so.
Now, acting as his official spoilsport (it's somewhere in my CTO contract that this is one of my duties), I had to point out that he's one of those highly efficient coding junkies, that the topic of the MVP is in the area of his core competencies, one he can literally write scientific papers about, and that if you divide 18 months by ten (and remember that working alone, there is also no team overhead), the huge speedup he attributed to Mr. Sonnet can be explained by trivial software engineering research known for half a century.
@mttaggart
But yes "AI" does change the profession.
IMHO, coding agents, especially once you add all the guardrails to make them safe, almost certainly work slower than a human expert working in the core of their expertise.
But when I move into areas where I have to start looking up libraries (JavaScript, TS, for example), LLMs suddenly start to show off their speed-reading capabilities.
@mttaggart
And the other aspect is that LLMs are simply a milestone in NLP, especially multilingual NLP (yes, there is a risk of errors, but nearly all real-world algorithms have failure modes; live with that).
So yes, you need to design your processes with the possibility of errors in mind.
Good developers learn error handling in kindergarten.
Our algorithm and data classes literally test for error handling in the scoring unit tests, too. Dealing with the correct results is the easy part.
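A minimal sketch of what such a scoring test might look like. The function name, contract, and error types are hypothetical, not from the post; the point is that the tests assert the failure modes, not just the happy path:

```python
def score(values):
    """Hypothetical scoring function: reject malformed input explicitly."""
    if not values:
        raise ValueError("empty input")
    if any(not isinstance(v, (int, float)) for v in values):
        raise TypeError("non-numeric value")
    return sum(values) / len(values)

# Dealing with the correct results is the easy part:
assert score([1.0, 2.0, 3.0]) == 2.0

# The scoring tests also have to cover the error handling:
try:
    score([])
except ValueError:
    pass
else:
    raise AssertionError("empty input must raise ValueError")

try:
    score([1.0, "oops"])
except TypeError:
    pass
else:
    raise AssertionError("non-numeric input must raise TypeError")
```

With pytest you would write the same checks more compactly via `pytest.raises`, but the plain form makes the intent visible.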
@mttaggart Responsible use is maintaining your role as the expert at all stages of the project. AI is a tremendous tool, but it has been made so easy to misuse by making it entertaining and our little code "buddy". You really hit on many of what I consider responsible AI practices. Two of my main rules...
Review everything. If you don't understand the AI's code, the subject matter, references, etc., don't commit until you do. Don't ever accept auto-commit.
Security is key. I can't stress that enough to new devs. If you didn't tell your agent to make it secure, it's not. Start your security audit.
@mttaggart Nice post. Yeah, the tipping point from these coding assistants creating slop to being usable is fairly recent. I'm not a coder, I'm a security engineer, so I'm used to handing over trust to a tool or SaaS service. Guardrails and layered controls are the key.
I think the skills we're learning right now, how to make a coding assistant write good code, are marketable. I feel like we're back in the early days of the cloud, learning a new skill that's cutting edge.
Anywho, just wanted to say I enjoyed reading your blog post. I too am struggling with all the complexities and externalities of AI.
@zaicurity Exactly. For the carelessness, I don't think a tool absolves one of carelessness, but I do think this tool in particular—at least in the way it is implemented now—makes carelessness not only easy, but highly incentivized. Without a dizzying array of external guardrails, harmful mistakes will occur. A bit more friction in the creation might go a long way. But alas, that would not be a popular product.
And yeah, people should have a right to opt out of using these things for ethical reasons, but I do think examining those objections closely is worthwhile, if only to strengthen them.
@mttaggart I have read only the self-flagellation so far and can I just say: oof.
my own co-skeptic feeling here is that I am deeply sympathetic to what you’re trying to do here, and also I am furious with your employer (or maybe just the ecosystem more generally) for effectively forcing you to take a bunch of risks with this
@glyph @matt So, this is probably the most misunderstood part of the piece, and that's on me. I am concerned about ideological purity in this context. Purity as a concept, whether ideological or otherwise (i.e. racial), is what I was calling dangerous. And racism, among many other things, is a derangement that weaponizes purity. This is an instrument capitalists used heavily throughout the late 19th and early 20th centuries to disrupt labor movements and prevent workers of different races from finding common cause. That's not to say racism didn't exist elsewhere, or wasn't sourced from within all socioeconomic echelons. Even so, the weaponization and exacerbation is relevant. Purity is a way to pit people against each other.
Ideological purity, less dangerous than racism, still prevents finding common cause. Building movements requires working with those who do not agree with you on everything. There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.
Am I a fascist for having used Claude Code and paying $20 to test it as others have? Some will say I am, or adjacent, because I have used a fascist tool. I find this deeply unhelpful to anyone. And that's my point. If you demonize anyone who touches this technology, your opposition movement is doomed to failure.
What do we want to accomplish? Stopping or stemming the spread of the disease, or building a commune of the untouched?
I also had questions about this section. While I read, I wondered how you thought of maintaining a social unit of any kind without censure and expulsion in some cases. And that's not a trick question, btw; maybe you have some ideas.
Or to pose the question to this toot, what makes you think purity itself is the problem, as opposed to a fixation on it? Compare with money, which isn't itself evil, but an over-veneration of it can ruin a person.
@dogfox @glyph @matt To the first:
There are lines we cannot cross to be sure, but we must be vigilant to prevent those lines from excluding all but exact matches to our own beliefs. This is the challenge, and one we are not meeting.
Once again, I am making no case for a lack of boundaries. I am making the case that the boundaries currently in play are counterproductive.
I cannot and will not give you a maxim for establishing them. Looking for empiricism there is where you get into weird inversions of moral obligation.
As for obsession versus the thing itself, I see a distinction without a difference. To maintain "purity" as a virtue is to seek it, and without clarity that it is unattainable, you end up with some version of obsession. I would prefer a heuristic of growth and estimation of intent. Not perfect metrics, and deeply subjective. It's something best done in human relations, and not conducive to a few hundred characters of pith.
I get what you're saying a lot better now. Thank you.
Strength of agreement isn't the same as purity. Purity also insists on completeness of agreement with predefined doctrine, if I am reading right.
In that case, I think I agree with you that that is always pathological.
@mttaggart @matt This all wasn't in the text, but, it was sort of *implied* by the way you were bringing up purity and the references you were gesturing at, which is precisely why I said it *didn't* "raise my hackles" (I probably would have chosen a different phrase if I knew I'd have to repeat it 30 times).
It's extremely difficult to talk about, not least because there are so many using "purity testing" *as* a purity test, and as cover for just telling people to shut up and accept odious views.
@glyph I guess I see the professional side of it this way. I could:
The choice is clear, and I'd much rather that I be the one talking about AI security than a myopic booster of the tech.
@glyph I hope I was clear that I still find the technology's harms outweigh its benefits. That would be true even if it produced perfect code every time, and that simply isn't the case.
What I discovered here is that, in limited use cases, the probability of error can decrease significantly, and the real time investment to build a working and secure product diminishes. That said, a lot of things need to go right, and every single process to keep the model on track is prone to failure. Also, context (in the model's sense) really matters. This project was small enough that the requisite context was almost always available to the model, or it was primed with external sources to make it available. Deployed against a much larger codebase, you'd need proportionally more computing resources to do likewise, and again your probability for error increases.
So yeah, still not great. I found a way to make it work, but doing so sucked ass.
I also wasn't kidding about Rust as basically a requirement. I would never in a million years attempt this with Python—which I love, by the way. But even with live LSP linting, the average Python code quality in the model's training corpora is going to affect output, and without the compile-time checks of Rust, I'd be very worried about hidden dragons.
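As a sketch of the "hidden dragons" the post alludes to, here is the kind of failure mode Python defers to runtime (the helper and its behavior are invented for illustration, not taken from the project):

```python
# Hypothetical helper of the kind a model might emit: one branch
# silently falls through and returns None.
def find_threshold(readings, limit):
    """Return the index of the first reading strictly above `limit`."""
    for i, r in enumerate(readings):
        if r > limit:
            return i
    # No match: Python happily returns None here with no warning.

idx = find_threshold([1, 2, 3], 10)  # None, not an index

# Nothing complains until the value is actually used, possibly far
# from the real bug:
try:
    idx + 1
except TypeError:
    print("blew up at runtime, far from the missing-case branch")
```

The Rust equivalent would return an `Option<usize>`, and the caller would not compile until the `None` case was handled, which is exactly the safety net you want when reviewing generated code.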
Thanks for this post. I've been an extreme skeptic of LLMs, but I'm seeing increasingly promising results for agentic coding. I'm not sure I'm on board with using it as regular practice, but I increasingly see the need to experiment with it to better understand how it'll impact my job, and to provide better-informed opinions on it.
@mttaggart That was a thin line to walk. Nicely done. Very even keeled.
I’ve experienced the echoes of what you wrote in my day job too. My team has decided it’s a tool, it will be used, and it will never be trusted. But writing SOC PowerShell scripts and KQL alerts isn’t the same as an ERP. I do not envy that team.
As for the place of agentics in a future world, I look to the automobile and the airplane as relevant comparisons. The modern world could not exist without either. I don’t think the next future can exist without neural nets and other agentics. The jury is still out for me on LLMs.
But the automobile and the airplane have wrecked this planet environmentally, though their inventors did not know that would happen. We know better now. Has our species matured enough not to make the same mistake? Perhaps. But not everyone has. And they turned on the hard sell big time, trying to get filthy rich from so-called “AI”, hoping no one would question it.
Fortunately (?), it’s starting to look like the economics are not sustainable. As evidence, I posit that Nadella may not have a job for much longer.
Perhaps we’ll collectively get a moment to pause, reassess and reset. Less hype and a more considered approach to this tech is gravely needed. That worked after the Internet bubble of the 1990s burst. It’ll work here if we can get the more even keeled among us to take charge. May that happen soon.
Wow, thanks for taking the time to write out your experience so completely. I think I’d have a similar complex reaction.
I recently developed a large multithreaded Python program implementing a complex PID-controlled system with lots of realtime I/O from sensors and encoders and to actuators, along with weak test coverage. Kids interacted with it at a science museum.
1/n
Late in the project I was encouraged to try using Claude for some refactoring and investigating/fixing some bugs related to new features.
The code changes made some sense … but didn’t fit well with the existing complexity.
I had a relatively extreme cognitive allergic response which surprised me. The whole system was in my head and it felt like I was losing control. Backed out most of the code.
In hindsight this was NOT a good way to start using AI.
2/2
@mttaggart a really thoughtful and interesting post I find matching my own thoughts on a lot of this */me gestures broadly*
So thanks for posting