Further to the last boosted post about the lawsuit against GitHub Copilot – https://fediscience.org/@riedl/109282064359790093
This case connects with something I’ve been thinking about regarding AI image generators like DALL·E, and it shows that the issue generalizes to any AI trained on data scraped from the web. There’s a presumption in AI development that data of any kind found on the public web are free for the taking. Developers treat those data as, in effect, unclaimed natural resources, the sort of thing that John Locke argued becomes yours once your labour improves or builds upon it to produce something new.
But this is false on its face. First, as decolonial thinkers have pointed out, no natural resources are “unclaimed”—what explorers found and declared to be terra nullius actually belonged to indigenous communities. Data on the web are no different: they don't just exist there waiting to be exploited; they belong to real people on the other side of the network. The resource-extraction mindset of AI development based on data scraped from the web is modelled after the plunder and pillage of colonization.
Second, and building on this, as the lawsuit against GitHub Copilot argues, these data are the intellectual property of their creators. Code uploaded to GitHub is rarely released into the public domain; it is often released under a libre or open source licence, which still sets conditions on reuse, and where no licence is included the presumption should be that it is protected by copyright. The lawsuit alleges that coders’ intellectual property rights have been infringed by the developers who used their code to train Copilot, because the terms of the various licences have not been respected.
Third, even if the lawsuit and similar legal arguments don't succeed, there’s an ethical argument about intellectual property that does. This brings us back to Locke: recall that he argues that things produced by your labour are yours by right. This argument has been used to justify intellectual property rights as well as physical property rights: the products of your labour belong to you, so long as what you transformed with your labour wasn’t itself stolen. This goes for both the labour of the body and the labour of the mind—creative and intellectual labour, such as that which goes into writing code or painting digital images. But Locke's argument is set up so that it doesn't depend on any particular legal framework of property rights, intellectual or otherwise. His account of labour and property is set in the state of nature, where there is no government or law to enforce anyone’s rights.
So the ethical point stands regardless of whether the lawsuit against GitHub Copilot succeeds or fails. Using code or images or whatever kind of data you can download from the web and encode for training AI, without seeking permission from the creators or respecting the terms under which they licensed their work, is theft. And it is not just theft of intellectual and creative property: it’s theft of labour and plunder of goods that the colonialist mindset frames as unowned.
There are plenty of unanswered questions here, of course, but I'm interested to hear what folks think of this argument. I'm currently writing it up as a paper, maybe for @facct. Am I missing anything? What objections do I need to answer?
Here’s the announcement of the lawsuit against GitHub Copilot: https://githubcopilotlitigation.com/
#aiEthics #ethicsOfComputing #artificialIntelligence #AI #ethics #philosophy #facct #responsibleComputing #techEthics #computerEthics #computerScience