Mastodawn

We should consider making a clear policy on AI contributions for the piefed repository

https://piefed.blahaj.zone/c/piefed_meta/p/656316/we-should-consider-making-a-clear-policy-on-ai-contributions-for-the-piefed-reposi

We should consider making a clear policy on AI contributions for the piefed repository

As I understand, there are currently no real guidelines for this, even though AI is currently a big topic in FOSS. In my opinion, AI can be quit…

Show thread

hendrik 2d ago

Not really digging down here, but for reference… Here’s the existing guidelines / ideals / morals outlined for the project contributions: https://codeberg.org/rimu/pyfedi/src/branch/main/docs/project_management/contributing.md

pyfedi/docs/project_management/contributing.md at main

pyfedi - Project background: https://join.piefed.social. Flagship instance: https://piefed.social

Codeberg.org

Show thread

lIlIlIlIlIlIl 2d ago

Unfortunately we can’t tell the difference between the two.

How do you plan on verifying without creating a No True Scotsman problem?

Show thread

hendrik 2d ago

I don’t think AI is as advanced yet so we’re not able to tell. I mean sure it’s problematic. And oftentimes down to circumstancial evidence.

But I regularly spot posts here which start with an affirmation, they’ll follow the rule of three, they contain em-dashes and have the tendency to be quite unnecessarily verbose for the amount of information in them. It’s not too hard to spot.

Similarly with PRs which contain a lot of bullet lists and emojis. And then Claude has some own style. You’ll spot it after you went through enough AI code. It also sometimes states the obvious. Like add a comment on how the main function is the main function. And sometimes you’ll see nonsensical code or it not following project sturcture or other things due to some limitations in the process.

So I wouldn’t say “we can’t tell the difference”. But it ain’t easy. And it’s time-consuming because AI output looks legit on a first glance. That’s the intent behind it.

Show thread

DeckPacker 2d ago

There is no way to verify it. Sometimes it is quite obvious though. It’s not really about eliminating it completely (although I wish I could). It’s more about taking a clear stance and maybe keep off a few people, that think they can “help” with AI. Maybe we could ban people, when it’s obvious. Although we should always strive to still create a welcoming culture and not create too much trouble for the people, that don’t use AI.

Show thread

OpenStars 2d ago

Autocomplete, spell-check, grammar check, style check, and the like I have no problems with, as they are mere tools enhancing the agency of the programmer.

But someone submitting code written or worse yet designed by a computer is not okay. Just leave it up to the actual devs in that case, or if you must then go ahead and create your own fork or even an entire PieFed replacement if you think you can do better by you pressing a handful of buttons than an actual programmer pressing those same buttons while also knowing what they are doing. If you are going to be a lazy fuck, then just stick with that and don’t submit PRs!?

(I’d love to hear if there is any dissent on this.)

Show thread

hendrik 2d ago

I believe we’ve discussed this several times already. How It’s rude to show AI output to people in general. I also don’t see any reason to just send in AI output, when the developers could do the same thing if they wanted. Just turn in your prompt instead 😁

It's rude to show AI output to people

..without informed consent.

Show thread

rabiezaater 2d ago

The problem with just sending the request to the developers is that often times requests for improvements are dismissed as an annoyance. I have discussed many issues with the way the fediverse works over many years, and many responses are “just build the feature yourself if you want to see it”. After years of never seeing these changes made, when you can just prompt an agent in an IDE to make the changes you want, it is hard not to take that route. Granted, if you are doing it for yourself or a small group of people who may be interested in using the changes you made, that is different from submitting AI generated code to an existing project. But I don’t see anything inherently wrong with vibe coding solutions to problems, as long as you have a general understanding of what you’re doing and how things work.

Show thread

hendrik 1d ago

Yeah, I had that as well. Some developers/maintainers are a bit elitist. Or have weird social skills. Or they’re overstrained or aren’t really motivated to include other people… And it tends to have a chilling effect and make people feel bad. I’ve also been told, just send a PR. And I know this would have been a job of 5 minutes for them to fix some easy thing to make many more users happy. And I now get to invest an hour in setting up some development environment, learn how the project is laid out… And it’s just annoying.

I think it’s a bit of a non-issue with PieFed. All I’ve seen here is constructive people. And a positive tone and atmosphere.

And I believe what you describe is a bit of a tricky situation. It’s what I’d call a “drive-by PR”. They’re not really familiar with the codebase so there’s likely issues in there. And they’re also not willing to maintain it, as it’s an one-off thing for them. So it ends up being one of two things. Either it’s a small and straightforward fix and you accept it. Or someone is customizing your software to their needs and they sent in some larger thing. Now you get a lot of stuff on your plate. See if it’s tied into the rest of the project. Check the code quality. See if they or someone ran the test cases, broke something else in the grand picture. You’re gonna be the one who will maintain that code in the future… And it takes quite some skill to stay calm, use your social skills and maintain focus on what’s important for the project.

I think it’s some delicate balance. It’s generally easier if you have some core developer base in your Free Software project and you know the people who wrote the PR. On the flipside I also often send in PRs when something is broken or I can contribute some smaller things to the various Free Software I use. I learned to make it as easy for them as possible. Read up on the coding standards before, test what I’m doing on my machine first. Add a good description of what I’m changing and why. And usually they’ll say thanks and bother with my requests.

Show thread

wjs018 2d ago

I am not going to name names or point to specific PRs so as not to shame anybody since that isn’t my intention, but we have had a couple AI-assisted PRs in the past and rimu has generally not been very receptive to them. I even really liked the functionality that one of them provided. However, they have generally been huge in that it is a ton of very verbose code that is difficult to review. I don’t believe that he has an official policy on AI-assisted contributions other than that it is easy enough to review and confirm that it is working as intended as well as not completely changing up coding style and conventions we have used elsewhere.

As a bit of disclosure, I have occasionally used very basic AI queries to help me understand something in a python library I haven’t used before or couldn’t find docs about. A specific case I remember was that I didn’t understand how to do something in the orjson library and I couldn’t find a good example in their docs or on stack overflow. Out of desperation I asked ChatGPT and it gave me a minimum viable example that I was able to adapt to what I needed. I have done similar a couple times when trying to craft regular expressions as well as dealing with some edge cases in the marshmallow python library that I couldn’t find answers for in their docs. I do make sure to test any code I write to make sure that I wasn’t just fed a hallucination or something that applied for an older version of the library but is out of date now.

Show thread

Rimu 1d ago

That’s a good summary of the different aspects of the issue, DeckPacker, thanks. Too often this kind of discussion focuses on just the code quality side of it - I guess that’s what developers are most comfortable thinking about.

The way you’ve framed it only looks at the negatives. There are benefits but we’ve heard quite enough about those.

The overarching goals of the fediverse, as I see it, are the liberation of humanity, realising the full potential of the internet and a new relationship between social media organisations and people. And so on, different things for different people but that’s the general idea. The really hard to answer question, which is less well explored in prior discussions, is how AI, as it is today (e.g. rented to us by fascists) fits in to that picture.

Let’s go through the framework you outlined.

Poor quality, insecure code - in the hands of a skilled developer with a bit of discernment, this doesn’t need to be the case. PRs can be evaluated for quality regardless. A 6000 line PR will not be acceptable whether it was LLM-generated or not (although LLMs make 6k line PRs more common!). One nice thing about coding by hand is it creates a barrier to entry so that only relatively committed people can get involved, so they’re more likely to stick around to clean up the fallout from their contributions and make future contributions. We can’t build a long-term sustainable developer community based on low-effort llm-generated one-off drive-by code drops.

So on this point my feeling is weakly negative on AI. It can be great but not every developer knows when to apply it.

Licensing issues - I am not a lawyer! But there was a court case recently which found that because AI-generated code has no author then it cannot be copyrighted and is therefore public domain. As far as I can tell this means if someone 100% vibe codes an app they don’t get to put the GPL on it because it’s not theirs to license to anyone else. Where it gets murky is if an existing GPL codebase exists and a chunk of AI-generated code is made and then altered somewhat by the developer THEN it becomes enough of their own work that it belongs to them and is licensable and can be included in the wider codebase. Exactly how much manual editing is needed, we don’t know. Whether the whole copyright legal architecture is still relevant after the AI companies got away with infringement on this scale, we don’t know.

As someone who regularly streams pirated movies and is only interested in copyright when it benefits we the people and not when it protects corporations, I’m kinda cynical about the whole thing.

Weakly negative. Of course we do want to preserve the integrity of the AGPL but perhaps that battle is lost already.

Legal trouble - If anyone goes looking for people to sue, they’ll come for Microsoft first and then work their way down. The chance of this ever effecting us seems remote. Although, if PieFed ever became a serious challenge to mainstream social media (we can dream) this could be used as a weapon against us. AFAIK this can be avoided by putting the legal responsibility on individual contributors (instead of the project as a whole) by using a Developer Certificate of Origin whereby the contributor asserts, at PR submission time, that they own the copyright. If they lie then they can be held accountable for that, theoretically.

Neutral.

Ethics - this is the hard one. The AI companies are in bed with the authoritarians, both in giving them donations, receiving support and investment from authoritarian governments, creating AI-based kill chains, mass surveillance, taking all our water, dumping their waste into our atmosphere, flooding our democratic processes with shit, and on and on. By paying money to those people to use their services, you’re telling the world “I am ok with this service, I am ok with the entity that provided it and the way it was provided. Keep doing what you’re doing”. Every token we use is strengthening omnicidal fascism. We cannot use the tools of fascists to beat them because every time we use those tools we make them stronger.

Even when we use ChatGPT for free (which drains their resources! Good!) we are normalizing dependence and usage in the industry. It’s like The Ring, in Lord of the Rings. Occasional usage can get you out of a tough spot but it becomes addictive and ultimately self-defeating. On the other hand total avoidance of using the ring would mean certain defeat.

Very negative. I feel like the other issues can be hand-waved away but this one I’m having a hard time getting past.

A lot of software projects don’t have a larger social goal so it’s easier for them to say “we’re not here to solve the world’s problems, we’re just making software”. I don’t think we get to use that escape hatch. But as a practical matter, StackOverflow is dead and Google searches are becoming less and less effective so total avoidance doesn’t seem realistic.

So that’s my general thinking. Where things ends up in terms of a concrete policy I don’t know yet. It’s not like we’re having any problems with how things are now so at present I’ll be aiming to codify existing practice in a way that keeps as many of us on board as possible.

We should consider making a clear policy on AI contributions for the piefed repository

pyfedi/docs/project_management/contributing.md at main

It's rude to show AI output to people

Developer Certificate of Origin - Wikipedia