Are there any ready-written Creative-Commons-style licenses yet that explicitly withhold permission for data harvesting for #GenerativeAI?

(boosts welcome)

Given that generative AI as it exists in the world today is incapable of citing its sources, common sense would suggest that prohibition of data harvesting for generative AI would be implied by something like https://creativecommons.org/licenses/by-sa/4.0/

However, given that Creative Commons itself seems to see generative AI as just another tool for sharing information (https://creativecommons.org/2023/02/06/better-sharing-for-generative-ai/), I'm not sure whether relying on their licenses for this could result in misunderstandings.

@dynamic My non-lawyer understanding of the clauses is that unless they cite who they're ingesting from, and release their training data under a similar license, they're in violation of the license terms and should probably be sued as soon as someone can definitively say their work was ingested into the training data sets without attribution.

@TheyOfHIShirts

That makes a lot of sense, but we live in a world where people are citing ChatGPT as a coauthor, and I kind of... don't trust people's common sense at all?

Like, is it sufficient "attribution" for a company like OpenAI to publish a list of every single website they harvested data from, or do they also have to ensure that the model's output appropriately attributes where the content came from?

It should be obvious that the latter is necessary, but I feel like it might not be.

@dynamic That's usually a question that the lawyers and the judges figure out when someone brings suit over the matter. Both of those are possible solutions to "attribution", and the court would examine the actual legal language and what it demands someone do to satisfy the attribution requirement. Or it might find that the legal language doesn't specify anything at all, in which case whatever the licensor has said counts as appropriate attribution has to be followed.

(Mind, there's still ShareAlike to be handled there.)

@TheyOfHIShirts

So in terms of communicating *intent*, do you think it would make sense to use something like CC BY-SA 4.0, but then add a note (perhaps in the "appropriate attribution" section of the document??) saying something like "the authors do not grant permission for this document to be used as input data for a generative AI"?

Is that kind of in-document information considered in the courtroom?

@dynamic I don't know. I'm not a lawyer, and that is absolutely a question that should be put before a lawyer.

Who might also answer "I don't know, nobody's tried it before" or who might have a better idea of appropriate case law and precedent for the jurisdiction of the case.

@dynamic The fallacy here is expecting them to read the licenses.

@gorplop

I *don't* expect them to.

If anyone knows any licensing lawyers who might be willing to provide their opinions on public domain licensing that would preclude use in generative AI, it'd be awesome if you could tag them.

@dynamic I'm not a lawyer (and I don't really know any on here), but I suspect that whatever lawyers you do talk to are going to say something about the output of an AI algorithm not counting as a derivative work of its input. Here's a somewhat random reference https://law.stackexchange.com/q/77363/4861 suggesting that various government agencies have taken the position that generative AI outputs are not derivative works of their inputs; instead, they're original creations of the algorithm, and thus not eligible for copyright protection. But that hasn't actually been tested in court, that I know of.

Anyway, the point is, if the output of an AI algorithm is not a derivative work of its input (and not copyrightable itself), then there might not be any way that you can use a license to legally prevent a work from being incorporated into generative AI. (Not without a change in the laws, of course.)

@dynamic I know that Peter Wang at Anaconda was discussing some licensing structures at #DWebCamp