We'll see how I feel in the morning, but for now I seem to have convinced myself to actually read that fuckin anthropic paper

I just

I'm not actually in the habit of reading academic research papers like this. Is it normal to begin these things by confidently asserting your priors as fact, unsupported by anything in the study?

I suppose I should do the same, because there's no way it's not going to inform my read on this

"AI" is not actually a technology, in the way people would commonly understand that term.

If you're feeling extremely generous, you could say that AI is a marketing term for a loose and shifting bundle of technologies that have specific useful applications.

I am not feeling so generous.

AI is a technocratic political project for the purpose of industrializing knowledge work. The details of how it works are a distant secondary concern to the effect it has, which is to enclose and capture all knowledge work and make it dependent on capital.

So, back to the paper.

"How AI Impacts Skill Formation"
https://arxiv.org/abs/2601.20245

The very first sentence of the abstract:

> AI assistance produces significant productivity gains across professional domains, particularly for novice workers.

1. The evidence for this is mixed, and the effect is small.
2. That's not even the purpose of this study. The design of the study doesn't support drawing conclusions in this area.

Of course, the authors will repeat this claim frequently. Which brings us back to MY priors, which is that this is largely a political document.

How AI Impacts Skill Formation

AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

arXiv.org
And now for a short break

I have eaten. I may be _slightly_ less cranky.

Ok! The results section! For the paper "How AI Impacts Skill Formation"

> we design a coding task and evaluation around a relatively new asynchronous Python library and conduct randomized experiments to understand the impact of AI assistance on task completion time and skill development

...

Task completion time. Right. So, unless the difference is large enough that it could change whether or not people can learn things at all in a given practice or instructional period, I don't know why we're concerned with task completion time.

Well, I mean, I have a theory. It's because "AI makes you more productive" is the central justification behind the political project, and this is largely a political document.

> We find that using AI assistance to complete tasks that involve this new library resulted in a reduction in the evaluation score by 17% or two grade points (Cohen’s d = 0.738, p = 0.010). Meanwhile, we did not find a statistically significant acceleration in completion time with AI assistance.

I mean, that's an enormous effect. I'm very interested in the methods section, now.

> Through an in-depth qualitative analysis where we watch the screen recordings of every participant in our main study, we explain the lack of AI productivity improvement through the additional time some participants invested in interacting with the AI assistant.

...

Is this about learning, or is it about productivity!? God.

> We attribute the gains in skill development of the control group to the process of encountering and subsequently resolving errors independently

Hm. Learning with instruction is generally more effective than learning through struggle. A surface level read would suggest that the stochastic chatbot actually has a counter-instructional effect. But again, we'll see what the methods actually are.

Edit: I should say, doing things with feedback from an instructor generally has better learning outcomes than doing things in isolation. I phrased that badly.

They reference these figures a lot, so I'll make sure to include them here.

> Figure 1: Overview of results: (Left) We find a significant decrease in library-specific skills (conceptual understanding, code reading, and debugging) among workers using AI assistance for completing tasks with a new python library. (Right) We categorize AI usage patterns and found three high skill development patterns where participants stay cognitively engaged when using AI assistance

> As AI development progresses, the problem of supervising more and more capable AI systems becomes more difficult if humans have weaker abilities to understand code [Bowman et al., 2022]. When complex software tasks require human-AI collaboration, humans still need to understand the basic concepts of code development even if their software skills are complementary to the strengths of AI [Wang et al., 2020].

Right, sure. Except, there is actually a third option. But it's one that seems inconceivable to the authors. That is to not use AI in this context. I'm not even necessarily arguing* that's better. But if this is supposed to be sincere scholarship, how is that not even under consideration?

*well, I am arguing that, in the context of AI as a political project. If you had similar programs that were developed and deployed in a way that empowers people, rather than disempowers them, this would be a very different conversation. Of course, I would also argue that very same political project is why it's inconceivable to the authors, soooo

And then we switch back to background context. We get 11 sentences of AI = productivity. Then 3 sentences on "cognitive offloading". 4 sentences on skill retention. And 4 on "over reliance". So, fully 50% of the background section of the "AI Impacts on Skill Formation" paper is about productivity.

Chapter 3. Framework.

Finally.

Paraphrasing a little: "the learning by doing" philosophy connects completing real-world tasks with learning new concepts and developing new skills. Experiential learning has also been explored to mimic solving real-world problems. We focus on settings where workers must acquire new skills to complete tasks. We seek to understand both the impact of AI on productivity and skill formation. We ask whether AI assistance presents a tradeoff between immediate productivity and longer-term skill development, or if AI assistance presents a shortcut to enhance both.

Right. There it is again: productivity. Even within this framing, there are at least 3 more possibilities. That AI does not actually increase productivity; that AI has no effect at all; or that AI improves learning only. I think it's very telling that the authors don't even conceive of these options. Particularly the last one.

But I'm becoming more and more convinced that the framing of productivity as an essential factor to measure and judge by is itself the whole purpose of this paper. And, specifically, productivity as defined by production output. But maybe I'm getting ahead of myself.

And now we have actual research questions! It feels like it shouldn't take this long to get these, but w/e

1. Does AI assistance improve task completion productivity when new skills are required?
2. How does using AI assistance affect the development of these new skills?

We'll learn how the authors propose to answer these questions in the next chapter: Methods.

But first, there is a 6 year old in here demanding I play minecraft, and I'd rather do that.

To be continued... probably

Chapter 4. Methods.

Let's go

First, the task. It's, uh. It's basically a shitty whiteboard coding interview. The assignment is to build a couple of demo projects for an async Python library. One is a non-blocking ticker. The other is some I/O ("record retrieval"; not clear if this is the local filesystem or what, but probably the local fs) with handling for missing files.
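For a sense of scale: the paper doesn't reproduce the tasks, and the thread doesn't name the library, so here's my own rough sketch of what a "non-blocking ticker" amounts to, using stdlib asyncio instead of whatever library the study actually used:

```python
import asyncio

async def ticker(interval: float, count: int) -> list[int]:
    """Emit an incrementing tick every `interval` seconds without blocking."""
    ticks = []
    for i in range(count):
        ticks.append(i)
        await asyncio.sleep(interval)  # yields control to the event loop instead of blocking
    return ticks

async def main() -> list[int]:
    # The ticker runs as a task, concurrent with any other work on the loop.
    tick_task = asyncio.create_task(ticker(0.01, 3))
    await asyncio.sleep(0)  # other work could happen here
    return await tick_task

if __name__ == "__main__":
    print(asyncio.run(main()))
```

That's the whole shape of the exercise: a handful of lines, most of whose difficulty is knowing the library's idioms rather than any algorithmic challenge.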

Both are implemented in a literal whiteboard coding interview tool. The test group gets an AI chatbot button, and encouragement to use it. The control group doesn't.

/sigh

I just. Come on. If you were serious about this, it would be pocket change to do an actual study

Found it! n=52. wtf. I reiterate: 20 billion dollars, just for this current funding round, and they only managed to do this study with 52 people.

But anyway, let's return to the methods themselves. They start with the design of the evaluation component, so I will too. It's organized around 4 evaluative practices they say are common in CS education. That seems fine, but their explanation for why these things are relevant is weird.

1. Debugging. According to them, "this skill is crucial for detecting when AI-generated code is incorrect and understanding why it fails."

Maybe their definition is more expansive than it seems here? But it's been my experience, professionally, that this is just not the case. The only even sort-of reliable mechanism for detecting and understanding the shit behavior of slop code is extensive validation suites.

2. Code Reading. "This skill enables humans to understand and verify AI-written code before deployment."

Again, not in my professional experience. It's just too voluminous and bland. And no one has time for that shit, even if they can make themselves do it. Plus, I haven't found anyone who can properly review slop code, because we can't operate without the assumptions of comprehension, intention, and good faith that simply do not hold in that case.

3. Code writing. Honestly, I don't get the impression they even understand what this means. They say "Low-level code writing, like remembering the syntax of functions, will be less important with further integration of AI coding tools than high-level system design."

Neither of those things is a meaningful facet of actually writing code. Writing code exists entirely in-between those two things. Code completion tools basically eliminate having to think about syntax (but we will return to this). And system design happens in the realm of abstract behaviors and responsibilities.

4. Conceptual. As they put it, "Conceptual understanding is critical to assess whether AI-generated code uses appropriate design patterns that adheres to how the library should be used."

IIIIIII guess. That's not wrong, exactly? But it's such a reverse centaur world view. I don't want to be the conceptual bounds checker for the code extruder. And I don't understand why they don't understand that.
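To make the earlier point about validation suites concrete: what I mean by the only semi-reliable mechanism is behavioral checks, pinned down in code. A miniature sketch (the function and all names are mine, not the paper's):

```python
import unittest

def retrieve_record(store: dict, key: str) -> str:
    """Hypothetical stand-in for the study's "record retrieval" task (name is mine)."""
    if key not in store:
        raise KeyError(f"missing record: {key}")
    return store[key]

class TestRetrieveRecord(unittest.TestCase):
    # Behavioral checks like these are the "extensive validation suite":
    # they pin down what the code must do, regardless of who (or what) wrote it.
    def test_returns_stored_value(self):
        self.assertEqual(retrieve_record({"a": "1"}, "a"), "1")

    def test_missing_key_raises(self):
        with self.assertRaises(KeyError):
            retrieve_record({}, "b")

# Run the suite programmatically so this works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRetrieveRecord)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point being: the suite verifies behavior without requiring anyone to reconstruct the author's intent from reading the code itself, which is exactly the thing you can't do with slop.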

So anyway, all of this is, apparently, in service to the "original motivation of developing and retaining the skills required for supervising automation."

Which would be cool, I'd like to read that study, because it isn't this one. This study is about whether the tools used to rapidly spit out meaningless code will impact one's ability to answer questions about the code that was spat. And even then, I'm not sure the design of the study can answer that question.

I guess this brings me to the study design. I'm struggling a little to figure out how to talk about this. The short version is that I don't think they're testing any of the effects they think they're testing.

So, they start with a warmup coding round, which seems to be mostly to let people become familiar with the tool. That's important, because the tool is commercial software for conducting coding interviews in a browser. They don't say which one, that I've seen.

Then they have two separate toy projects that the subjects should complete. 1 is a non-blocking ticker, using a specific async library. 2 is some async I/O record retrieval with basic error handling, using the same async library.

And then they take a quiz about that async library.

But there are some very important details. The coding portion and quiz are both timed. The subjects were instructed to complete them as fast as possible. And the testing platform did not seem to have code completion or, presumably, any other modern development affordance.

Given all of that, I don't actually think they measured the impact of the code extruding chatbots at all. On anything. What they measured was stress. This is a stress test.

And, to return to their notion of what "code writing" consists of: the control subjects didn't have code completion, and the test subjects did. I know this because they said so; it came up in their pilot studies. The control group kept running out of time because they struggled with the syntax for try/except, and for string formatting. They only stopped running out of time after the researchers added specific reminders for those two things to the project's instructions.
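For reference, these are the two bits of syntax in question (my own example, not the study's), and they're exactly the kind of thing completion tooling papers over:

```python
# The two syntax stumbling blocks the pilot-study reminders covered:
# exception handling (Python spells "catch" as "except") and string formatting.

def describe_lookup(records: dict, key: str) -> str:
    try:
        value = records[key]
    except KeyError:
        return f"record {key!r} not found"   # string formatting via f-string
    return f"record {key!r} = {value!r}"
```

Trivial stuff, but under a timer, with no editor affordances, forgetting it burns minutes. Which is my point about what was actually being measured.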

So. The test conditions were weirdly high stress, for no particular reason the study makes clear. Or even acknowledges. The stress was *higher* on the control group. And the control group had to use inferior tooling.

I don't see how this data can be used to support any quantitative conclusion at all.

Qualitatively, I suspect there is some value in the clusters of AI usage patterns they observed. But that's not what anyone is talking about when they talk about this study.

And then there's one more detail. I'm not sure how I should be thinking about this, but it feels very relevant. All of the study subjects were recruited through a crowd working platform. That adds a whole extra concern about the subjects' standing on the platform. It means that in some sense undertaking this study was their job, and the instruction given in the project brief was not just instruction to a participant in a study, but requirements given to a worker.

I know this kind of thing is not unusual in studies like this. But it feels like a complicating factor that I can't see the edges of.

@jenniferplusplus That paper is _extremely damning_ of the use of AI for all that it bends over backwards and ties itself into knots to try to find some way of making it seem less catastrophically bad.
@hrefna it certainly doesn't make them look good. But I'm honestly not sure we can draw *any* conclusion from this study. Which I'm getting into now
@jenniferplusplus Kind of a funny statement given that the whole point of abstraction, encapsulation, high level languages, etc. is to provide a formal basis for much of a program to be designed in terms of high level concepts

@jsbarretto That's not what people mean when they say system design.

They mean which way do dependencies flow. What is the scope of responsibility for this thing. How will it communicate with other things. How does the collection of things remain in a consistent state.

For example.

@jenniferplusplus Yeah, I get that. I've been around the block. I'm saying it's bizarre that the paper seems to be implying that AI might be a route towards automating the production of lower level code when this has been the goal of pretty much every form of developer tooling since forever.
@jenniferplusplus the latter part is especially true and i don't have any sort of strategy for handling it. i have to read every single line of LLM code because the space of possible mistakes it can make is so large. with humans, even if someone really doesn't know what they are doing, there are only so many kinds of things that could conceivably screw up.
@jenniferplusplus I agree; LLM-generated code (above a certain threshold of complexity) is like compiled C code with -O2 turned on. Hard to read, very hard to understand.
Code can get “compressed” quite a lot.

@jenniferplusplus
Why is that any different from reviewing slop code written by incompetents? Most foss maintainers have to deal with some of that too...

[Agreeing with most of what you're saying, btw]

@jenniferplusplus thank you so much for doing this. I skimmed and just couldn’t bring myself to read it all, and it’s nice to see someone doing a much deeper read but coming to largely the same conclusions.
@glyph i would do this more, but the format of academic papers is so cumbersome. The time I actually have available for it is on the couch, after the kid's in bed. But reading these things on a phone is basically impossible
@jenniferplusplus @glyph I had only read the anthropic summary. I was struck by how even if all their methods and study design were great (& a good sample etc) the results seemed to very much indicate LLM use isn't as transformative as the hype with major risks of deskilling impacts. I was surprised they published it just reading their own summary. I guess they had to make lemonade from lemons??

@r343l @glyph
As I've learned, they did some preregistration for the study. That might have influenced them.

And, a whole bunch of these ai researchers really do seem to think of themselves as serious scientists doing important work. Particularly at anthropic, as that's where a lot of the true believers ended up

@jenniferplusplus all the more reason I appreciate you putting the effort in!
@jenniferplusplus @glyph The industrial state of AI today is a milestone in progress, but it has a 60-year history. The Turing test, and Joseph Weizenbaum’s “Eliza” (the same kind of test as Turing’s), are passed easily on any machine. But the old myths about AI haven't changed for many people.
@jenniferplusplus @glyph I'm older too, and often compare AI with the moon landing of the 1960s, when AI also got its professional start at MIT in the USA. The most confounding question about Apollo’s success was: now that we've reached this goal that millions dreamed about, what do we want there? And what is our next stepping stone?

@jenniferplusplus @glyph my beloved fantasy and sci-fi book was and is Solaris by Stanislaw Lem (Poland, 1961)
https://en.wikipedia.org/wiki/Solaris_%28novel%29

A mystic ocean on a distant planet materializes the traumas in human minds. The astronauts there are confronted with a deceased child or partner, e.g. by suicide.
The upshot is that humanity tries to push its frontiers as far as possible to escape daily routine on Earth, and only ends up facing itself, as in a mirror of the mind.

@jenniferplusplus

There's a whole series of recent studies from MIT, CMU, Boston Consulting Group, BBC, and Oxford Economics arguing that AI/LLM assistants do NOT improve productivity.

Walk-through here:

https://www.someweekendreading.blog/ai-update-2026/

AI Update: 2026

What’s the state of play with LLM AI’s in early 2026? Looks really bad, frankly.

@mirabilos @jenniferplusplus

Yes, just the usual dreary confirmation of what we mostly know about "LLMs don't work for fact-based professions."

The surprise, if any, is just how *hard* managers fantasize otherwise.

@jenniferplusplus
Shouldn't the title there read:
"Impact of not forming mental models, due to trusting and outsourcing thinking to AI", in this case?
@jenniferplusplus I like the fact that their own research doesn't fit their lazy claim you reference, and they spend a lot of time trying to work out how the claim can be true, even though their own evidence is against it (and more in line with the mixed evidence in the literature, as you say).
@jenniferplusplus it reminds me a bit of the famous thing with the Flat Earth Society people who spent $20k on an expensive laser gyroscope to "prove" that the Earth was not a rotating sphere... and then spent a lot of time being very confused and upset when, of course, it measured precisely what you'd expect from a rotating spherical Earth.
@aoanla @jenniferplusplus I was baffled that Anthropic published this paper, let alone promoted it on their blog. Cos even their headline results say "AI coding bots are shit, don't use them, they're no faster and they make you stupid". But yeah, they thought they were saying things about productivity.
@jenniferplusplus No it is not. That kind of thing is left to the realm of "self-publishing". Was this thing peer reviewed?
@seanwbruno @jenniferplusplus
Will "is peer reviewed" change validity/or-lack of the paper?
Should it?
@mikalai @jenniferplusplus IMO, yes. However, reading the first sentence is enough for me to move on to spend my time on other things for the day.
@seanwbruno @jenniferplusplus
I must apologize for focusing on peer review and abstracting away from the article itself.
But this "force-fed GenAI and slop" moment asks us to consider how we assess statements, ideas, words.
If an article is in an area with only 50 people in it worldwide, "review" could be: 5 upvotes, 7 downvotes at moment x, and then you decide whether to spend the time to comprehend the article, or to wait. When this is more explicit, then we have better chances, as a civilization, imho

@mikalai @seanwbruno @jenniferplusplus the thing that is a positive signal is that it *survived* peer review, which implies that there are multiple, knowledgeable, independent scientists in the area of study of the paper that read it and came to the conclusion, "the conclusions stated by this paper are supported by the data and arguments presented in the paper".

This paper would not survive peer review.

It is a flawed system but it is not worthless.

@kevingranade @mikalai @seanwbruno @jenniferplusplus … one more anecdote: none of these 100 mistakes were caught by 3+ peer reviews each :’)
https://gptzero.me/news/neurips/
GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers

GPTZero's analysis of 4,841 papers accepted by NeurIPS 2025 shows there are at least 100 with confirmed hallucinations

AI Detection Resources | GPTZero
@fnwbr @mikalai @seanwbruno @jenniferplusplus wow if you overload a system it starts failing, who could have predicted that.
@kevingranade @mikalai @seanwbruno @jenniferplusplus … if you are overloaded maybe don’t put your signature under other people’s papers :)
@jenniferplusplus You have entirely more stamina than I have. I just read the first sentence of the abstract and emitted a guffaw and exclaimed, out loud for the spouse to hear, "Citation needed!".