Mastodawn

Ludovic Courtès Mar 13

It’s already tempting, notably for smallish projects, to resort to genAI:
https://toot.aquilenet.fr/@civodul/116132543248503962

But I think a race to the bottom has started in #FreeSoftware, with this rationale: if “we” don’t use genAI in our project, then we will lose to the competition, whether free slopware or proprietary.

Ludovic Courtès (@civodul@toot.aquilenet.fr)

I think these two factors—lack of humanpower and a “big” vision—coupled with the passion for technicalities typical of such projects make them particularly vulnerable to genAI. Because yes, “we” want SMP support in Mach and it’s not been happening until this contributor achieved something with the help of genAI.


Ludovic Courtès Mar 13

… which is short-sighted and loses track of the whole user empowerment goal that free software is supposedly about.

But the “economic” incentives are here.

Carlos O'Donell

@civodul I'm working on a glibc (and jointly a gcc) LLM policy which I'll propose for public review, and the difficulty is in threading the needle between technology that we could use ourselves, and user freedoms. My position ends up being that I want to define a policy that allows the projects to outright reject *or* accept such changes as they see fit, within certain constraints that support user freedom, e.g. either you understand the code or it is reproducible with a tool.

Siddhesh Poyarekar Mar 13

@codonell @civodul that speaks to the validity of the code, and maybe empowers/includes more people (maybe the opposite too, if LLM use discourages those ideologically opposed?) in the development community for the project. What does it do for the bigger software freedom picture though? What does it do for copyleft code bases? Is the move towards all FOSS code becoming public domain (given that US courts are leaning towards LLM generated code not being copyrightable) a net positive one?

Siddhesh Poyarekar Mar 13

@codonell @civodul I mean I know there are caveats and it's not necessary *today* that all LLM generated code is non-copyrightable (e.g. if a developer uses it for scaffolding and then injects their own creativity in there, making the code copyrightable) but it's something to think about when creating an LLM policy that doesn't just reject or quarantine legally significant contributions.

Olivier Mengué Mar 13

@siddhesh_p @codonell @civodul I expect that very soon AI tools will be available to rebuild sources from binaries: I don't see a particular reason why machine code would be harder to process than source code.
So, how will that change the value of proprietary software distributed as binaries?
Which hidden secrets will be revealed from closed firmwares?
I see a coming revolution in taking back control of hardware. Much earlier than AGI or quantum computing.

tusharhero Mar 13

@siddhesh_p @codonell @civodul Just because the code is public domain doesn't mean the companies won't still find ways to keep it proprietary. It will be asymmetrical: they will take our code because it is public, and just refuse to share their "public domain" code (public domain doesn't force you to share code).

So no, this won't be a net positive. There will be new legal mechanisms to shackle the users.

Siddhesh Poyarekar Mar 13

@tusharhero @codonell @civodul yes, that is the point I'm making.

Carlos O'Donell Mar 13

@siddhesh_p @civodul You can only control your own actions, and I would continue to contribute creatively to copyleft projects, and I would encourage others to do the same. Even if someone else, who I don't control, uses an LLM to create a clone, they could always have done that with a fork. They will still not have my time or my attention.

Siddhesh Poyarekar Mar 13

@codonell @civodul yes but someone having an LLM fork the project is not a concern when it comes to drafting LLM policies for projects. That's a separate dumpster fire.

Siddhesh Poyarekar Mar 13

@codonell @civodul as maintainer btw, you control not only your actions, but also the actions of your project community and an LLM policy is exactly that :)

Carlos O'Donell Mar 13

@siddhesh_p @civodul For clarity, I don't control anyone's actions except my own (and even then my body doesn't always comply). As a GNU Project maintainer I am responsible for a package, and I'll work to support that package in the best interest and the ideals of the project. People can fork. People can develop alternative projects. People can contribute to bionic. I see a path that, while it might not line up exactly ethically with what I believe, is maximally freedom respecting.

Carlos O'Donell Mar 13

@siddhesh_p @civodul The existence of public domain contributions in our projects does not directly weaken our copyleft positions. For example glibc considers all locale data to be public domain, and the FSF claims no copyright on that data. Yet we're still an LGPLv2+ project. We generate boilerplate all the time that is not novel or expressive, and it doesn't undermine our ideals. There are extremes here that carry risk, and I think a good policy should express those risks.

Siddhesh Poyarekar Mar 13

@codonell @civodul that's only because today, public domain contributions are quarantined to specific, strategic areas (like locales). LLM contributions will change that.

Carlos O'Donell Mar 13

@siddhesh_p @civodul Any contributor to the GNU Project can go through a disclaimer process putting their works in the public domain and contribute them to glibc. It is one of the currently valid processes. It's not the ideal case, and does not support my copyleft ideals, but I respect the wishes of the contributor, and they are furthering the project goals. We should not operate under the slipper slop fallacy that we are heading towards 100% public domain.

Carlos O'Donell Mar 13

@siddhesh_p @civodul LOL "slipper slop" ... I'll leave my typo there because it makes me laugh 😃

Siddhesh Poyarekar Mar 13

@codonell @civodul what I'm arguing is that it's not just a theoretical slippery slope, it's real this time. There's also the question of what the project goals are after all, are they simply to achieve technical goals and solve difficult computer science problems?

Carlos O'Donell Mar 13

@siddhesh_p @civodul I'd say the GNU Project, the GNU Toolchain, and glibc have broader free software goals that include collaboration with all FOSS projects, and supporting user freedoms. How is the slippery slope not theoretical? How does a single step of possibly accepting public domain code (however it is generated) in a Makefile trigger the eventual removal of my freedoms?

Siddhesh Poyarekar Mar 13

@codonell @civodul the pre-LLM possibility of copyleft code being replaced by public domain code relies on there being a set of motivated individuals who are prolific in their contributions to the project and at the same time, want their contributions to be under the public domain.

In contrast, with LLM usage, simply allowing LLM contributions has a tangible risk of any and all contributions that come in, to be in the public domain. The realm of possibility expands quite greatly.

Siddhesh Poyarekar Mar 13

@codonell @civodul of course with your example of "makefile patch" I assume you're thinking of the possibility of LLM use in a restricted area of sources, which is a different thing from someone coming along with optimized implementations of string functions for all architectures.

Carlos O'Donell Mar 13

@siddhesh_p @civodul While you write "tangible risk" this still follows a slippery slope fallacy. What is the risk exactly? We've always allowed public domain, and we still do today. We will always keep mixing in LGPLv2+ code in glibc, since we are adding features, fixing bugs, and refactoring as developers supporting a copyleft project. This resulting work remains LGPLv2+. The act of accepting these works does not in and of itself cause risks to the 4 freedoms except indirectly.

Siddhesh Poyarekar Mar 13

@codonell @civodul increased adoption of LLMs will drive up contributions that are public domain? Do you think it's not something that will happen?

Ludovic Courtès Mar 14

@siddhesh_p @codonell We don’t know yet if LLM output will be considered public domain, and in which jurisdictions.

If it turns out to be the case, will it be a win? Eventually all software would be public-domain?

My guess is that much software would be private. With fewer people mastering software development, the power in the hands of LLM-operating companies would be huge.

But this is pure speculation.

Carlos O'Donell Mar 14

@civodul @siddhesh_p LLM output is already being considered public domain in the U.S. and while other jurisdictions matter, the FSF is based there and for copyright assignment purposes U.S. law is relevant.

I have a case today with localedata where a contributor claims copyright and a license in the Netherlands for unique and novel expression, but the FSF in the U.S. does not, so the project files have a disclaimer.

There are LLM cases winding through the courts today... I'm curious 🤔

Siddhesh Poyarekar Mar 14

@codonell @civodul this is essentially why I'd like projects (at least the ones I'm personally involved in) to take a conservative position (disallow or quarantine LLM contributions) until there's a clearer picture and not try to "get in the game" for fear of missing out.

Carlos O'Donell Mar 14

@siddhesh_p @civodul My position is that the projects should default to rejecting LLM contributions unless they can meet a set of restrictions that reduce risk. For example I don't think we can accept an LLM contribution that implements a standards conforming feature. The likelihood we get a look-a-like from llvm or msvc is very high and that risk is too high. I want to see unique and novel implementations of standard features.

Carlos O'Donell Mar 14

@siddhesh_p @civodul If we can use the llvm version, then we do so by copying the sources, giving attribution, and maintaining a relationship with the project where we sync sources e.g. sanitizers, libffi, gnulib, etc.

Carlos O'Donell Mar 14

@siddhesh_p @civodul To that end we would automatically reject LLM contributions to anything in glibc's SHARED-FILES list (which is quite a lot), including CORE-MATH contributions where I expect an LLM would be unable to reason correctly.

Ludovic Courtès Mar 14

@codonell @siddhesh_p I’m aware of a report suggesting that LLM output be considered public domain in the US:
https://copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf

But it’s not the same as this being a settled matter, AIUI.

Also, there’s for instance this class action against Anthropic that could challenge this:
https://www.anthropiccopyrightsettlement.com

Ludovic Courtès Mar 14

@siddhesh_p @codonell Side note: when Copilot was released a couple of years ago, everyone in free software understood that it was trained on tons of copyleft source code and was thus infringing on “our” copyright.

How in so little time did we get to swallow that LLM output could be considered public domain, after all?

Carlos O'Donell Mar 14

@civodul @siddhesh_p You are mixing two distinct issues. Firstly there is a question of infringing the licenses during training, which today is being argued fair use, but I don't expect this is settled. Second is infringing output when regurgitation happens, and when it doesn't happen there is the legal and ethical question of copyrightability of the output. The questions asked are going to take time to answer. Today's answer can still meaningfully be that LLM outputs for now are public domain.

Carlos O'Donell Mar 14

@civodul @siddhesh_p I agree it is not a settled matter. What are the consequences upon our actions? My goal is to write the best possible policy today with the given knowledge, risks, and community goals in mind. If things change then I'll change the policy.

Ludovic Courtès Mar 14

@codonell @siddhesh_p The practical consequence is that accepting “legally significant” code in a project is risky.

Gnulib only accepts up to 5 lines of LLM output, citing the risk of LLMs regurgitating copyrighted material:
https://lists.gnu.org/archive/html/bug-gnulib/2026-02/msg00064.html


Carlos O'Donell Mar 14

@civodul @siddhesh_p Accepting only 5 lines is the equivalent of accepting nothing. I'd be willing to accept any number of Makefile lines generated by LLM because they are boilerplate for glibc, gcc, binutils and gdb. Likewise an LLM writing a glibc test that uses the "support/" framework to verify ISO C fprintf() compliance is very unique to glibc. However, implementing fprintf() runs a high risk of infringing on training data, and I'd reject LLM submissions for new standard features.

Carlos O'Donell Mar 14

@civodul @siddhesh_p Risk tolerances are per individual, per project, and subjective. We should be empathetic towards each other as we each feel these risks subjectively differently. One might keep me up at night, and you might sleep well.

I strongly agree with your opinion that the habits and behaviours we are encouraging here run the risk of isolating community members from each other. Policy won't solve that.

Carlos O'Donell Mar 14

@civodul @siddhesh_p To put a positive spin on things...

What will help:
* Physical meetups (GNU Tools Cauldron)
* Weekly virtual f2f project meetings (glibc patch queue review)
* Monthly virtual f2f project meetings (2x Office Hours for the GNU Toolchain across two timezones)

What needs to be done:
* Better volunteer onboarding discussing the values of the community.

Ludovic Courtès Mar 13

@siddhesh_p @codonell I think many discussions miss the social aspects of free software: knowledge sharing, mutual aid, building a community around a shared goal. Software for the people, by the people.

And also: Why bother talking to these glibc folks if I can pay 10k–20k to get the machine to produce a C library just for me?

Carlos O'Donell Mar 13

@civodul @siddhesh_p I agree there are "isolating" social issues. I am concerned about a new developer who finds it lower personal cost to ask the LLM to write something than to reach out to our community to learn, grow, and expand the FOSS ecosystem. Likewise writing new code with an LLM instead of growing the FOSS ecosystem. I have my doubts that a company can justify having a private C library because of the cost of compliance e.g. security, regulation (EU CRA, FIPS 140-2, SSDLCs) etc.

Carlos O'Donell Mar 13

@civodul @siddhesh_p My position is that policy won't solve these problems. These problems are foundational. Either you value collective action or you don't. Education is paramount. We retread age-old problems.

Kevin Granade Mar 13

@codonell @civodul can you elaborate on what the intersection between user freedoms and a project-scoped LLM policy is? Such a policy would seem to me to govern what changes the project accepts and their provenance. I'm not clear where that impinges on user freedoms.

Carlos O'Donell Mar 13

@kevingranade @civodul Two issues. There is a continuum between something a person can understand, and for which the 4 freedoms make sense, and something you can't understand. Consider https://www.sollya.org/, and the inputs used to automatically generate libm functions, and sufficiently edited LLM code no human has read or understands. My position is that user freedom requires we contribute something that can be understood, particularly without requiring proprietary tools or undue cost. 1/2


Carlos O'Donell Mar 13

@kevingranade @civodul Second. There are network and social effects. This is where I think Ludovic is correct. We are being isolated in ways that mean we are less likely to exercise our freedoms. Why read, edit, and remix copyleft code to create new derivative works if the LLM creates the code? Why reach out to other copyleft authors to learn and grow, a high friction high cost activity, when we can ask the LLM? Policy can address code sharing and collaboration. 2/2.

Kevin Granade Mar 13

@codonell @civodul oh I'm actually coming at it from the other side, in what way do they intersect in such a way that LLM use is remotely on the table? As far as I can see, in practice an LLM tool anywhere in the process for generating a change destroys its provenance and renders it ineligible for inclusion. Where's the other side of that?

Carlos O'Donell Mar 13

@kevingranade @civodul Is your position rooted in legal or ethical foundations? Do we consider the contributors' freedoms e.g. free for any purpose? What does it mean to contribute to the project vs. the community? I think the answer is different depending on the position you take on these questions. The GNU Project has a clear philosophical position on the 4 freedoms, and that doesn't include the ethics of the contributor. As individuals we can reject contributions based on our own ethics.

Kevin Granade Mar 13

@codonell @civodul Is this a policy for use by a project, a meta-policy for building a policy, or "guidance" rather than a policy and you're leaving it up to individual project members to make the call on their own?

This is concerning; most of these alternatives resolve to "yes please use LLMs" policy in practice, because a large number of participants in these projects are beholden to companies that are all-in on AI and unless each project presents a united front they WILL jam in LLM outputs.

Carlos O'Donell Mar 14

@kevingranade @civodul I'm working on an LLM policy for glibc to use.