Mastodawn

Jonathan Corbet

As the number of LLM-generated patches in my inbox increases, I am starting to experience the sort of maintainer stress that has long been predicted. But there's another aspect of this that has recently crossed my mind.

Just over a week ago, a new personality showed up with a whole pile of machine-generated patches claiming to fill in our memory-management documentation. A few reviewers had some sharp questions, the response to which has been ... silence. This person doesn't seem to have cared enough about that work to make an effort to get past the initial resistance.

Once upon a time, somebody who had produced many pages of MM documentation would be invested enough in that work to make at least a minimal attempt to defend it.

Kernel developers often worry that a patch submitter will not stick around to maintain the code they are trying to push upstream. Part of the gauntlet of getting kernel patches accepted can be seen as a sort of "are you serious?" test.

When somebody submits a big pile of machine-generated code, though, will they be *able* to maintain it? And will they be sufficiently invested in this code, which they didn't write and probably don't understand, to stick around and fix the inevitable problems that will arise? I rather fear not, and that does not bode well for the long-term maintainability of our software.

Jani Nikula 13h ago

@corbet It will be interesting to see how the use of the Assisted-by tag develops. I'm sure not everyone uses it. And if it changes the reception of a patch to be negative, surely people will be less forthcoming about LLM usage too. And, of course, a contribution based on a lie is not a great way to build trust either.

I also see trivial patches with Assisted-by that make me think, why? Couldn't you have done this yourself and learned something in the process?

Danny Boling ☮️ 11h ago

I learned something recently about Wikipedia that's along these lines. Their admins tend to not approve trivial edits/corrections from LLM-generated submissions for most pages //specifically// so that newbies can learn how to make edits and add content the "old-fashioned human way."

Jonathan Corbet 9h ago

@jani People are clearly not using the Assisted-by tag; I've seen a lot of examples of that in recent days. In many cases people seem to be unaware of the rules. The human inclination to not read our documentation continues, but it appears that the LLMs don't bother to read it either.

Peter H. Fröhlich 2h ago

@corbet And that's one reason for "idiots" like myself going back to kernels from before ChatGPT happened. @jani

Western Infidels 13h ago

@corbet It could be one more facet of our looming neo-feudal moment. Bespoke, tested code for the elites, slop-upon-slop code for the peasants.

I suppose some people will surely turn to LLMs to help navigate the social "are you serious" gauntlet.

Puppethead 11h ago

@WesternInfidels @corbet Mail servers' use of "greylisting" (initially denying suspect connections and then letting the retry through) helps avoid this "drive-by" kind of traffic. Maybe we need something similar for PRs?

https://en.wikipedia.org/wiki/Greylisting_(email)
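
To make the greylisting idea concrete, here is a minimal sketch of how it might carry over to submissions. Everything in it is an assumption for illustration: the `Greylist` class, the `min_delay` value, and keying on a submitter name are all hypothetical (mail servers typically key on the sending IP, envelope sender, and recipient instead), and nothing here describes an existing kernel or forge workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Hypothetical greylist: defer a first attempt, accept a later retry."""

    # Assumed waiting period in seconds before a retry is let through.
    min_delay: float = 300.0
    # Maps each submitter to the time of their first attempt.
    first_seen: dict = field(default_factory=dict)

    def check(self, submitter: str, now: float) -> str:
        """Return "defer" or "accept" for an attempt made at time `now`."""
        seen = self.first_seen.get(submitter)
        if seen is None:
            # First contact: remember it and ask the submitter to retry.
            self.first_seen[submitter] = now
            return "defer"
        if now - seen < self.min_delay:
            # Retried too quickly; still inside the waiting period.
            return "defer"
        # The submitter came back after the delay, so let them through.
        return "accept"


gl = Greylist(min_delay=300.0)
first = gl.check("new-contributor", now=0.0)
early = gl.check("new-contributor", now=60.0)
later = gl.check("new-contributor", now=600.0)
print(first, early, later)
```

The point of the design is that it costs a persistent submitter only a retry, while a drive-by submitter who never returns is never accepted.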

Jonathan Corbet 9h ago

@WesternInfidels I've seen stories of maintainers who have found themselves talking to a contributor who is just relaying questions to the LLM and feeding the answers back. Haven't been there myself, yet, so far as I know...

Colin Watson 50m ago

@corbet @WesternInfidels I definitely have (not in the kernel, but still). It's such an uncanny-valley feeling when you thought you were talking to a human but then realize you weren't really.

Yann Droneaud 12h ago

@corbet yes, it raises new questions, like: what happens when they run out of tokens?

@corbet it bodes well for proponents of outright banning LLM contributions to the kernel. just ban that shit, it's not that hard.

some people will disregard the policy. upon being caught out, ban them from all kernel spaces. simple as

@elle @corbet ^^ gosh, please do. that is already way overdue and would make you and us way more comfy

Eric Carroll 12h ago

@corbet
PR submitter blacklists are coming, if not here already.

Cassandrich 11h ago

@corbet Can we stop using the propaganda language "machine-generated code" for this? It's copyright-laundered code of unknown origin.

Howard Chu @ Symas 11h ago

@dalias @corbet exactly! https://mastodon.social/@hyc/116274100279140311

Pēteris Krišjānis 2h ago

@hyc @dalias @corbet basically this. This is not 💯 legal and no amount of mental gymnastics will convince me.

@corbet when someone new turns up with lots of new patches, it makes you wonder. Next you might see people pushing and demanding the same changes. Gives me an XZ Utils vibe.

Speed demon 🇪🇺 🇳🇴🇺🇦🇵🇸 10h ago

@corbet Like @pluralistic said, we are filling our walls with asbestos.

@corbet "Unwilling to defend" is a good test for the motivation of the submitter. And, unfortunately, motivation is a good test of the cost of accepting the submission.

If the motivation is self-aggrandisement (aka boosting your brand) then you know you are inheriting instant technical debt.

Jon Gerdes 9h ago

You can simply ignore stuff that looks a bit shifty for a while and see what happens.

I don't think you have any formal contractual obligations or SLAs to anyone at all when it comes to Linux. You might have some sort of community obligation (whatever that might mean).

You are our documentor-in-chief and other unlikely sobriquets and there has to be a point where you can say: "Please bugger off and go and boil your head" although I have noted you tend to a more conciliatory style.

The frontline troops at LWN have always managed to get the tone just right in the harshest of discussions and defuse them suitably.

@corbet In my eyes, this is just a continuation of an older trend of submitting patches with no maintainability consideration.

I’ve seen it many times: a discussion of a feature is underway, someone creates a patch with no tests and a barely implemented happy path, someone else asks a few months later why it’s not merged…

It’s automated now.

mahadevank 7h ago

@corbet sounds like you should use an LLM to generate an initial response to any PR, and then gauge if there's investment on the other side.

Another option - simply refuse to review large PRs.

Marco Molteni 3h ago

Thank you. This puts well into words one of my main concerns about LLM-generated or assisted code or documentation.

degenerating degenerate 3h ago

@corbet The kernel is kind of unique though in that it can offer life-changing opportunities to contributors.

The vast bulk of other FOSS projects can't. The idea that contributors will stick around or take responsibility for maintaining their work is meaningless.

For these less mighty projects, contributors are entirely drive-by and it only happens during the short window the project is on their radar.

The project maintainer has to take on responsibility for whatever was offered, AI or not.

Pēteris Krišjānis 3h ago

@corbet I remember that in my FOSS past, failure to communicate was a show stopper. As you say, you need someone responsible.
I personally treat everyone not doing their own code as not responsible. I don't care what you do to get started, but own it. Simple as that.