Lemmy may be heading down the path of LLMs

https://leminal.space/post/33312956

Sadly, it seems like Lemmy is going to integrate LLM code going forward: https://github.com/LemmyNet/lemmy/issues/6385 If you comment on the issue, please try to make sure it’s a productive and thoughtful comment and not pure hate brigading. Consider upvoting the issue to show community interest.

Edit: perhaps I should also mention this one here as a similar discussion: https://github.com/sashiko-dev/sashiko/issues/31 This one concerns the Linux kernel. I hope you’ll forgive me this slight tangent, but more eyes could benefit this one too.

Code written with the help of an LLM and openly reviewed is different from what happened with Lutris, where the developer decided to obfuscate their use of AI-generated code.

The approach you suggest, a total ban, is one I can agree with in principle and think is noble. But it could lead to people accusing each other of using AI code whether or not it actually happened, or to others simply hiding it and submitting anyway without the reviewers knowing, which is counter-productive.

I’ve followed Lemmy development for 3 years now; the devs’ approach is slow and steady, to a fault in some people’s views. I think it’s a better use of open source resources if we encourage candor and honesty. If the repo gets spammed with AI-generated PRs, then AI use will probably be blanket banned, but contributors accurately documenting and reporting their use of AI will help direct reviewers’ attention to ensuring the code is not slop quality or full of hallucinations.

In my opinion, this argument is exactly the same as saying “we can’t enforce people not stealing GPL-licensed code and copy-pasting it into our project, so we might as well allow it and ask them to disclose it.”

You can argue that AI is actually useful (which, by the way, seems to be what they did), and that would more fairly justify the policy in my opinion. I don’t think your argument does.

My argument is that a total ban on AI use is more comparable to saying “Code from any other coding project is not allowed”. It will start unproductive arguments over boilerplate, struct definitions and other commonly used code.

The broadness and vagueness of “no AI whatsoever” or “no code from any other projects whatsoever” will be more confusing than saying, “if you do copy any code from another project, let us know where from.” Then the PR can be evaluated, and rejected if it’s nonfree or just poor quality, rather than incentivizing people to pass off other people’s code as their own, risking bigger consequences for the whole project. People can be honest about getting inspiration from Stack Overflow, a reference book, or another project, if they are allowed to be.

I’m not saying AI should be blanket allowed; the submitter needs to understand the code well enough to revise it for errors themselves if the devs point something out. They can’t just say “I asked AI and it’s confident that the code does this and is bug free.”

Then the PR can be evaluated, rejected if it’s nonfree or just poor quality

I don’t get the difficulty of rejecting “if it’s nonfree or just poor quality or known LLM code”.

I don’t think it’s a vague criterion at all. And for many projects, if you tell them it’s from a Stack Overflow post, they will reject it as well unless you can show it’s not a direct copy. I don’t see the difference. Now, whether you think LLMs are worth the trouble to use is a different discussion, but your argument doesn’t convince me. Many bans aren’t easy to enforce; that doesn’t mean they’re bad ideas.

There is also a responsibility and liability question here. If something turns out to be a copyright issue and the contributor skirted a known rule, the moral judgement looks different than if the maintainers had known and included it anyway. (I can’t comment on the legal outcomes since I’m not a lawyer.)

To be specific, the jump you are making is likening LLM output to non-free code. While on the surface that makes sense, it’s much closer to writing something based on copied code. In the US at least, there’s clear legal precedent that LLM output is not copyrightable.

Blanket AI bans are enforceable; I’m not arguing against that. It’s just that I don’t think one is worth instituting, that it’s not a good fit for this project. My argument is that a Lemmy development policy of “please mark which parts of your code are AI-generated and how you used LLMs, and we will evaluate accordingly” is better than “if you indicate anywhere that your code is AI/LLM-generated, we will automatically reject it.”

Beyond memorization: Text generators may plagiarize beyond ‘copy and paste’ (Penn State News)

Language models, possibly including ChatGPT, paraphrase and reuse ideas from training data without citing the source, raising plagiarism concerns.

I don’t mean in any way to imply that your opinion isn’t sound, but simply that I don’t agree with it here, in the context of whether the Lemmy devs should accept PRs with any reported LLM usage.