The threat is comfortable drift toward not understanding what you're doing

https://ergosphere.blog/posts/the-machines-are-fine/

The machines are fine. I'm worried about us.

On AI agents, grunt work, and the part of science that isn't replaceable.

The thing is, agents aren’t going away. So if Bob can do things with agents, he can do things.

I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.

To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.

So I think this article is partly right: Bob is not learning the skills we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.

I don’t like it, but I’m trying to face up to it.

> So if Bob can do things with agents, he can do things.

The problem arises when Bob encounters a problem too complex or unique for agents to solve.

To me, it seems a bit like the difference between learning how to cook versus buying microwave dinners. Sure, a good microwave dinner can taste really good, and it will be a lot better than what a beginning cook will make. But imagine aspiring cooks just buying premade meals because "those aren't going anywhere". Over the span of years, eventually a real cook will be able to make way better meals than anything you can buy at a grocery store.

The market will always value the exact things LLMs cannot do, because if an LLM can do something, there is no reason to hire a person for that.

Precisely. The first 10 rungs of the ladder will be removed, but we still expect you to be able to get to the roof. The AI won't get you there and you won't have the knowledge you'd normally gain on those first 10 rungs to help you move past #10.
That’s a good analogy, but I think we’ve already gone from 0 to 10 rungs over the last couple of years. If we assume the models or harnesses will keep improving, more and more rungs will be removed. The vast majority of programmers aren’t doing novel, groundbreaking work.
But AI might actually get you there through superior pedagogy: personal Q&A that most individuals couldn't have afforded before.

People would have said the same about graphing calculators or calculators before that. Socrates said the same thing about the written word.

The determining factor is always "did I come up with this tool". Somehow, subsequent generations always manage to find their own competencies (which, to be fair, may be different).

This isn't guaranteed to play out, but it should be the default expectation until we actually see greatly diminishing outputs at the frontier of science, engineering, etc.

I think that's too easy an analogy, though.

Calculators are deterministically correct given the right input. Using one does not require expert judgement about whether the answer it gave is reasonable or not.

As someone who uses LLMs all day for coding, and who regularly bumps against the boundaries of what they're capable of, that's very much not the case. The only reason I can use them effectively is because I know what good software looks like and when to drop down to more explicit instructions.

Determinism just means you don't have to use statistics to approach the right answer. It's not some silver bullet that magically makes things understandable and it's not true that if it's missing from a system you can't possibly understand it.

That's not what I mean.

If I use a calculator to find a logarithm, and I know what a logarithm is, then the answer the calculator gives me is perfectly useful and 100% substitutable for what I would have found if I'd calculated the logarithm myself.

If I use Claude to "build a login page", it will definitely build me a login page. But there's a very real chance that what it generated contains a security issue. If I'm an experienced engineer I can take a quick look and validate whether it does or whether it doesn't, but if I'm not, I've introduced real risk to my application.
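The "quick look" an experienced engineer does can be made concrete. Below is a minimal, hypothetical sketch (the schema and function names are invented for illustration, not taken from any model's actual output) of the single most common flaw in generated database-backed lookup code, SQL injection via string interpolation, next to the parameterized fix:

```python
import sqlite3

def find_user_vulnerable(cur, name):
    # The shape of code a generator can plausibly emit: the untrusted
    # value is interpolated straight into the SQL text.
    return cur.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(cur, name):
    # Parameterized query: the driver treats the value as data, not SQL.
    return cur.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(find_user_vulnerable(cur, payload))  # [(1,), (2,)] -- every row leaks
print(find_user_safe(cur, payload))        # [] -- no such user
```

Both versions "work" on honest input, which is exactly why the difference is invisible to someone who doesn't already know what to look for. The `?` placeholder is standard DB-API; the same pattern applies to any driver.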

Those two tasks are just very different. In one world you have provided a complete specification, such as 1 + 1, for which the calculator responds with some answer and both you and the machine have a decidable procedure for judging answers. In another world you have engaged in a declaration for which there are many right and wrong answers, and thus even the boundaries of error are in question.

It's equivalent to asking your friend to pick you up, and they arrive in a big vs. small car. Maybe you needed a big car because you were going to move furniture, or maybe you don't care. Oops either way.

Yes. That is the point I was making.

Calculators provide a deterministic solution to a well-defined task. LLMs don't.

> Calculators are deterministically correct

Calculators are deterministic, but they are not necessarily correct. Consider 32-bit integer arithmetic:

30000000 * 1000 / 1000
30000000 / 1000 * 1000

Mathematically, they are identical. Computationally, each is deterministic, yet the computer produces different results for the two orderings. There are many other cases where the expected result differs from what a computer calculates.
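The difference is easy to reproduce. Here is a minimal Python sketch that simulates 32-bit wraparound (Python's own integers are arbitrary-precision, so the wrap has to be emulated; `wrap32` is an illustrative helper, not a standard function):

```python
def wrap32(x):
    """Reduce x to a signed 32-bit value, the way Java's `int` wraps.
    (In C, signed overflow is undefined behaviour, but wrapping is what
    most hardware actually does.)"""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# Multiply first: 30_000_000 * 1000 = 30_000_000_000 does not fit in
# 32 bits, so the intermediate wraps to -64_771_072.
a = wrap32(wrap32(30_000_000 * 1000) // 1000)

# Divide first: every intermediate fits, so the answer survives intact.
b = wrap32(wrap32(30_000_000 // 1000) * 1000)

print(a)  # -64772 (Python floor division; C-style truncation would give -64771)
print(b)  # 30000000
```

Deterministic, yes; but only someone who knows roughly what the answer should be will notice that the first result is nonsense.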

A good calculator will however do this correctly (as in: the way anyone would expect). Small cheap calculators resort to confusing syntax, but if you pay $30 for a decent handheld calculator, or use something decent like wolframalpha on your phone/laptop/desktop, you won't run into precision issues for reasonable numbers.
He’s not talking about order of operations; he’s talking about floating point error, which will accumulate in different ways in each case, because floating point is an imperfect representation of real numbers.
Good languages with proper numeric towers will handle both cases on equal terms.
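For the floating point reading, the same order sensitivity is real: IEEE 754 addition is not associative, because every intermediate sum is rounded. A minimal Python illustration:

```python
# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly, and
# each intermediate sum is rounded, so grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

The discrepancy is tiny here, but in a long-running accumulation (summing millions of measurements, say) the evaluation order can visibly change the total.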

If you hand a broken calculator to someone who knows how to do math, and they enter 123 + 765 and get an answer of 6789, they should instantly know something is wrong. Hand that calculator to someone who never understood what the tool actually did but just accepted whatever answer appeared, and they would likely think the answer was totally reasonable.

Catching an LLM hallucinating often takes a basic understanding of what the answer should look like before asking the question.
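The broken-calculator test above amounts to a crude ballpark check. As an illustrative sketch (the helper and its tolerance are invented for the example):

```python
def plausible_sum(a, b, reported, tolerance=0.5):
    """Crude ballpark test: is the reported sum within ~50% of the true
    value?  A stand-in for the rough "hundreds, not thousands" sense a
    person with arithmetic fluency applies without thinking."""
    expected = a + b
    return abs(reported - expected) <= tolerance * max(abs(expected), 1)

print(plausible_sum(123, 765, 888))   # True: 888 is the exact answer
print(plausible_sum(123, 765, 6789))  # False: off by nearly a factor of 8
```

The point is not the code, which is trivial, but that the check requires an independent model of the expected answer, which is precisely what the person who outsourced all their arithmetic never built.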

The calculator analogy is wrong for the same reason. Knowing and internalizing arithmetic, algebra, and the shape of curves, etc. are mathematical rungs to get to higher mathematics and becoming a mathematician or physicist. You can't plug-and-chug your way there with a calculator and no understanding.

The people who make the calculator analogy are already victims of the missing rung problem and they aren't even able to comprehend what they're lacking. That's where the future of LLM overuse will take us.

What do people mean exactly when they bring up “Socrates saying things about writing”? Phaedrus?

> “Most ingenious Theuth, one man has the ability to beget arts, but the ability to judge of their usefulness or harmfulness to their users belongs to another; [275a] and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess.

> "For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."

Sounds to me like he was spot on.

But did this grind humanity to a halt?

Yes - specific faculties atrophied - I wouldn't dispute it. But the (most) relevant faculties for human flourishing change as a function of our tools and institutions.

> People would have said the same about graphing calculators or calculators before that.

As it happens, we generally don't let people use calculators while learning arithmetic. We make children spend years using pencil and paper to do what a calculator could in seconds.

There are a lot of people in academia who are great at thinking about complex algorithms but can't write maintainable code if their life depended on it. There are ways to acquire those skills that don't go the junior developer route. Same with debugging and profiling skills

But we might see a lot more specialization as a result

They can’t write maintainable code because they don’t have real-world experience of getting your hands dirty in a company. The only way to get startup experience is to build a startup or work for one.
What. Are you saying maintainable code is specifically related to startups? I can accept companies as an answer (although there are other places to cut your teeth), but startups is a weird carveout.
Writing maintainable code is learned by writing large codebases. Working in an existing codebase doesn't teach you it, so most people working at large companies do not build the skill since they don't build many large new projects. Some do but most don't. But at startups you basically have to build a big new codebase.

Duh, the only way to get startup experience is indeed to get startup experience.

My point is that getting into the weeds of writing CRUD software is not the only way to gain the ability to write complex algorithms, or to debug complex issues, or do performance optimization. It's only common because the stuff you make on the journey used to be economically valuable

> write complex algorithms, or to debug complex issues, or do performance optimization

That’s the stuff that AI is eating. The stuff I’m talking about (scaling orgs, maintaining a project long term, deciding what features to build or not build, etc.) is stuff that is very hard for AI.

Do they need to write maintainable code? I think probably not, it's the research and discovering the new method that is important.
To me it feels more like learning to cook versus learning how to repair ovens and run a farm. Software engineering isn’t about writing code any more than it’s about writing machine code or designing CPUs. It’s about bringing great software into existence.
Or farming before and after agricultural machines. The principles are the same but the “tactical” stuff is different.

How many people who cook professionally are gourmet chefs? I think it ends up that gourmet cooking is so infrequently needed that we don’t require everyone who makes food to do it, just a small group of professionally trained people. Most people who make food for a living work somewhere like McDonald’s and Applebee’s where a high level of skill is not required.

There will still be programming specialists in the future — we still have assembly experts and COBOL experts, after all. We just won’t need very many of them and the vast majority of software engineers will use higher-level tools.

That's the problem though: programmers who become the equivalent of McDonald's workers will be paid poorly like McDonald's workers and be treated as disposable like McDonald's workers.

The correct distinction is: if you can't do something without the agent, then you can't do it.

The problem that the author describes is real. I have run into it hundreds of times now. I will know how to do something, I tell AI to do it, the AI does not actually know how to do it at a fundamental level and will create fake tests to prove that it is done, and you check the work and it is wrong.

You can describe to the AI to do X at a very high-level but if you don't know how to check the outcome then the AI isn't going to be useful.

The story about the cook is 100% right. McDonald's doesn't have "chefs", they have factory workers who assemble food. The argument with AI is that working in McDonald's means you are able to cook food as well as the best chef.

The issue with hiring is that companies won't be able to distinguish between AI-driven humans and people with knowledge until it is too late.

If you have knowledge and are using AI tools correctly (i.e. not trying to zero-shot work) then it is a huge multiplier. That the industry is moving towards agent-driven workflows indicates that the AI business is about selling fake expertise to the incompetent.

> The problem arrises when Bob encounters a problem too complex or unique for agents to solve.

It’s actually worse than that: the AI will not stop and say “too complex, try again in a month with the next SOTA model”. Rather, it will give Bob a plausible-looking solution that Bob cannot identify as right or wrong. If Bob is working on an instant-feedback problem, it’s ok: he can flag it, try again, ask for help. But if the error can’t be detected immediately, it can come back with a vengeance in a year. Perhaps Bob has already been promoted by then, and Bob’s replacement gets to deal with it. In either case, Bob cannot be trusted any more than the LLM itself.

They aren't going away but for some they may become prohibitively expensive after all the subsidies end.

I do think coding with local agents will keep improving to a good level but if deep thinking cloud tokens become too expensive you'll reach the limits of what your local, limited agent can do much more quickly (i.e. be even less able to do more complex work as other replies mention).

> They aren't going away but for some they may become prohibitively expensive after all the subsidies end.

Even if inference were subsidized (afaik it isn't when paying through API calls; subscription plans might indeed lose money on heavy users, but that's how any subscription model typically works, and it can still be profitable overall), models are still improving and getting cheaper, so that seems unlikely.

It probably is still subsidized, just not as much. We won't know whether these APIs are profitable unless these companies go public, and till then it's safe to bet they are underpriced to win market share.
Then we’ll likely know by the end of this year.
Anthropic has shared that API inference has a ~60% margin. OpenAI's margin might be slightly lower since they price aggressively but I would be surprised if it was much different.
Is that margin enough to cover the NRE of model development? Every pro-AI argument hinges on the models continuing to improve at a near-linear rate.

Yeah but the argument people make is that when the music stops cost of inference goes through the roof.

I could imagine that when the music stops, advancement of new frontier models slows or stops, but that doesn't remove any current capabilities.

(And to be fair, the way we duplicate effort on building new frontier models does look wasteful. Though maybe we reach a point later where progress is no longer started from scratch.)

Third-party AI inference with open models is widely available and cheap. You're paying as much as proprietary mini-models or even less for something far more capable, and that without any subsidies (other than the underlying capex and expense for training the model itself).

> The thing is, agents aren’t going away. So if Bob can do things with agents, he can do things.

"Being able to deliver using AI" wasn't the point of the article. If it was the point, your comment would make sense.

The point of the program referred to in the article is not to deliver results, but to deliver an Alice. Delivering a Bob is a failure of the program.

Whether you think that a Bob+AI delivers the same results is not relevant to the point of the article, because the goal is not to deliver the results, it's to deliver an Alice.

I am aware of that - I was adding something along the lines of: I don’t think people care if we deliver Alices any more.

> I am aware of that - I was adding something along the lines of: I don’t think people care if we deliver Alices any more.

That's irrelevant to the goal of the program - they care. Once they stop caring, they'd shut that program down.

Maybe it would be replaced with a new program that has the goal of delivering Bobs+AI, but what would be the point? I mean, the article explained in depth that there is no market for the results currently, so what would be the point of efficiently generating those results?

The market currently does not want the results, so replacing the current program with something that produces Bobs+AI would be for... what, exactly?

There’s no market for the results, but there was a market for Alices, because they were the only people who could produce similar results historically. Now maybe there’s less of a market for Alices. Yes, maybe that means the program disappears.
People never cared about delivering Alices; they were an implementation detail. I think the article argues that they're still an important one, but one that isn't produced automatically anymore
The article is talking about science research in the context of astrophysics, not coding sweatshops.
I was also talking about producing researchers for academia.

I'm glad you've posted this comment, because I strongly feel more people need to see this sentiment and push back against what many above want to become the new norm. I see capitulation and compliance in advance, and it makes me sad. I also see two very valid, antipodal responses to this phenomenon: exit from the industry, and malicious compliance through accelerationism.

To the reader and the casual passerby, I ask: Do you have to work at this pace, in this manner? I understand completely that mandates and pressure from above may instill a primal fear to comply, but would you be willing to summon enough courage to talk to maybe one other person you think would be sympathetic to these feelings? If you have ever cared about quality outcomes, if for no other reason than the sake of personal fulfillment, would it not be worth it to firmly but politely refuse purely metrics-focused mandates?

> So if Bob can do things with agents, he can do things.

Yes, but how does he know if it worked? If you have instant feedback, you can use LLMs and correct when things blow up. In fact, you can often try all options and see which works, which makes it “easy” in terms of knowledge work. If you have delayed feedback, costly iterations, or multiple variables changing underneath you at all times, understanding is the only way.

That’s why building features and fixing bugs is easy, and system-level technical decision making is hard. One has instant feedback, the other can take years. You could make the “soon” argument, but even with better models, they’re still subject to training data, which is minimal for year+ delayed feedback and multivariate problems.

I've just started a new role as a senior SWE after 5 months off. I've been using Claude a bit in my time off; it works really well. But now that I've started using it professionally, I keep running into a specific problem: I have nothing to hold onto in my own mind.

How this plays out:

I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.

Except I can't.

For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge, and this review was simply something I didn't know about.

But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.

And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.

But wait, I could just use Claude for this! But for now I don't, because I've seen this before. Just a few moments ago. Using Claude has just made it significantly slower when I need to use my own knowledge and skills.

I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.

Then you're using it more towards vibe coding than AI-assisted coding: I use AI to write the stuff the way I want it to be written. I give it information about how to structure files, coding style and the logic flow.

Then I spend time to read each file change and give feedback on things I'd do differently. Vastly saves me time and it's very close or even better than what I would have written.

If the result is something you can't explain, then slow down and follow the steps it takes as they are taken.

I agree that being further along the Vibe end of the spectrum is the issue. Some of the other ways I use Claude don't have the same problems.

> If the result is something you can't explain, then slow down and follow the steps it takes as they are taken.

The problem is I can explain it. But it's rote and not malleable. I didn't do the work to prove it to myself. Its primary form is on the page, not in my head, as it were.

I'm on the same path as you are it seems. I used to be able to explain every single variable name in a PR. I took a lot of pride in the structure of the code and the tests I wrote had strategy and tactics.

I still wrote bugs. I'd bet that my bugs/LoC has remained static if not decreased with AI usage.

What I do see is more bugs, because the LoC denominator has increased.

What I align myself towards is that becoming senior was never about knowing the entire standard library, it was about knowing when to use the standard library. I spent a decade building Taste by butting my head into walls. This new AI thing just requires more Taste. When to point Claude towards a bug report and tell it to auto-merge a PR and when to walk through code-gen function by function.

> I can explain it. But it's rote and not malleable.

The AI can help with that too. Ask it "How would one think about this issue, to prove that what was done here is correct?" and it will come up with something to help you ground that understanding intuitively.

AI-assisted coding makes you dumber, full stop. It's obvious as soon as you try it for the first time. Need a regex? No need to engage your brain: AI will do that for you. Is what it produced correct? Well, who knows? I didn't actually think about it. As current-gen seniors' brains atrophy over the next few years, the scarier thing is that juniors won't even be learning the fundamentals, because it is too easy to let AI handle it.

I agree. In the beginning, when I was starting out, I let the AI do all of the work and merely verified that it did what I wanted, but then I started running into token limits. In the first two weeks I was honestly just waiting for the limit to refresh. The low effort made it feel like I would be wasting my time writing code without the agent.

Starting with week three the overall structure of the code base is done, but the actual implementation is lacking. Whenever I run out of tokens I just started programming by hand again. As you keep doing this, the code base becomes ever more familiar to you until you're at a point where you tear down the AI scaffolding in the places where it is lacking and keep it where it makes no difference.

It's a spectrum and we don't have clear notches on the ruler letting us know when we're confidently steering the model and when we've wandered into vibe coding. For me, this position is easy to take when I am feeling well and am not feeling pressured to produce in a fixed (and likely short) time frame.

It also doesn't help that Claude ends every recommendation with "Would you like me to go ahead and do that for you?" Eventually people get tired and it's all too easy to just nod and say "yes".

That is indeed a very annoying part of many AI models. I wish I could turn it off.
For me it seems more or less similar to reviewing others' changes to a codebase. In any large organization codebase, most of the changes won't be our own.