Are there any ready-written Creative-Commons-style licenses yet that explicitly withhold permission for data harvesting for #GenerativeAI?

(boosts welcome)

Given that generative AI as it exists in the world today is incapable of citing its sources, common sense would suggest that prohibition of data harvesting for generative AI would be implied by something like https://creativecommons.org/licenses/by-sa/4.0/

However, given that Creative Commons itself seems to see generative AI as just another tool for sharing information (https://creativecommons.org/2023/02/06/better-sharing-for-generative-ai/), I'm not sure whether relying on their licenses for this could result in misunderstandings.

@dynamic My non-lawyer understanding of the clauses would have suggested that unless they cite who they're ingesting from, and release their training data under a similar license, they're in violation of the license terms and should probably be sued as soon as someone can definitively say their work was ingested into the training data sets without attribution.

@TheyOfHIShirts

That makes a lot of sense, but we live in a world where people are citing ChatGPT as a coauthor, and I kind of... don't trust people's common sense at all?

Like, is it sufficient "attribution" for a company like OpenAI to publish a list of every single website they harvested data from, or do they also have to ensure that the model's output appropriately attributes where the content came from?

It should be obvious that the latter is necessary, but I feel like it might not be.

@dynamic That's usually a question that the lawyers and the judges figure out when someone brings suit over the matter. Both of those are possible solutions to "attribution", and the court would examine the actual legal language and what it demands someone do to satisfy the attribution requirement. Or the court might find that the legal language doesn't say anything at all, in which case whatever someone has said is appropriate attribution has to be followed.

(Mind, there's still ShareAlike to be handled there.)

@TheyOfHIShirts

So in terms of communicating *intent*, do you think it would make sense to use something like CC BY-SA 4.0, but then add a note (perhaps in the "appropriate attribution" section of the document??) saying something like "the authors do not grant permission for this document to be used as input data for a generative AI"?

Is that kind of in-document information considered in the courtroom?

@dynamic I don't know. I'm not a lawyer, and that is absolutely a question that should be put before a lawyer.

Who might also answer "I don't know, nobody's tried it before" or who might have a better idea of appropriate case law and precedent for the jurisdiction of the case.

@dynamic The fallacy here is to expect they read the licenses

@gorplop

I *don't* expect them to.