GitHub Copilot adds several new wrinkles to the question of whether you should open source code and put it on GitHub. If your project addresses a specialised domain, will Copilot all of a sudden commodify your specialised work as reusable autocompletes without attribution?

Free/Open Source Software is a very specific bargain, one that looks increasingly unsustainable: you commodify your code, your skills, and your expertise, and in return you either get some form of recognition or get to collaborate with people you otherwise couldn't.

Or you are using free software to accomplish some societal or community goal.

You release code. The conditions you set and the attributions you require are what gives that code purpose.

GitHub Copilot, when it works, gives its user the benefits of using OSS but breaks the fundamental bargain that led to the code being OSS in the first place. The original coder gets no recognition; there is no attribution; your conditions are ignored; your license is disregarded.

If Copilot is legal and Copilot-generated code is fair use (which absolutely isn't certain) then it becomes an automated method for legal license-laundering: you get to use GPLed code in your proprietary work.

In this world, why would anybody release any open source code on GitHub?

Add GitHub's collaboration with ICE, and this feels like a tipping point in the Free/Open Source movements' use of GitHub. Those who have already switched to other platforms are looking increasingly prescient.

@baldur This argument doesn't really work, considering it's trivial to exclude projects using certain licenses. Also, GitHub aren't simply taking whatever is hosted on their servers; it is a selected corpus.

@opfez Copilot is explicitly using a lot of GPLed code.

"Once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License."

https://docs.github.com/en/github/copilot/research-recitation

@baldur Ah, I was not aware of this, thank you.

@baldur If I look at a piece of code and then re-implement the same functionality myself, I am not infringing any copyright (assuming I didn't make a verbatim copy of the original). Isn't that what Copilot is doing?

I may be infringing a patent, but that's a different matter.

(Not that I want to defend GitHub, I am just pondering!)

@loveisanalogue

Copilot seems to occasionally output verbatim copies and regularly near-verbatim copies with contextual alterations.

Also, Copilot itself is software that's derived from existing creative works and is a tool for creating derivative works. Software licenses don't cover what's inside your mind but they certainly do cover software.

The patent issue is interesting. We have no idea what the patent situation is for the code Copilot is trained on, since the license isn't passed on.

@loveisanalogue @baldur I believe you have to provide some attribution in that case. I know one author who deliberately avoided looking at another project so they wouldn't have to license their own project under the GPL.

Also, there have been cases like ReactOS, which somehow contained implementations of Windows code taken directly from source code provided to universities. Action was taken over this.

@oiyouyeahyou @baldur Ah, yes, you are right. Algorithms are not copyrightable, but re-interpreting someone else's code is considered a derivative work.

Looking into it more now, I read that large companies wary of that problem do something called "clean room implementation": one person reads the code and writes down the algorithm as a graph (with no implementation details), and someone else takes that and implements the algorithm.

@baldur What is GitHub Copilot? Thanks in advance!

GitHub Copilot · Your AI pair programmer

GitHub Copilot works alongside you directly in your editor, suggesting whole lines or entire functions for you.