WGA is bargaining to block use of their written work to train AI. (ht @harikunzru)
This is a smart move. Brief 🧵:
We've already seen tons of examples of AI-generated writing, art, and music made to sound or look similar to known human creators.
(Honestly IMO the tech is not quite there yet. Often what you get is a pale imitation. But tech is always improving!)
@tiffanycli amazing. I expected something like this to start happening but I did not expect it so soon.
Question: one way OpenAI et al try to evade this whole thing is by leaning into the "well it was made public on Teh Intertubes and we just scraped it 🤷‍♀️" defense. Who will be / should be liable if a script gets published on the open Web, and then sucked into an LLM training set?
Also, many everyday tools (office suites, for example) are starting to integrate "AI". Does the same question apply there?
@rysiek @tiffanycli "It was on the internet" is not a defence to copyright infringement, in any way. Copyright specifically applies to things that are published or performed publicly.
It's more of a grey area what qualifies as copying/infringement in relation to an LLM, since that isn't something specifically contemplated by the law. But nothing prevents multiple people involved in the copying, in different ways, from *all* being liable.
@TorontoWill @tiffanycli sure. This is complicated IMVHO because:
1. the EU Copyright Directive's text-and-data-mining exemption (which apparently covers LLM training)
2. the fact that copyright law "kicks in" when something gets published, not when something gets "ingested" so to speak.
@TorontoWill @tiffanycli regarding 2.:
IANAL, but as far as I understand copyright law, it kicks in when something gets published. You either are allowed to publish something somewhere, or you aren't.
But asking (quite reasonably in general!) for certain copyrighted work not to be used for training LLMs means asking to limit how certain material is used *post-publication*.
I.e. "you are allowed to publish this as long as no LLM will ever be trained on it". Which is… difficult to comply with?
@TorontoWill @tiffanycli yes, absolutely.
But my point is: if you have the right to publish something but are barred from letting LLMs be trained on it, do you actually have the right to publish?
Everything published (made public) might get sucked into an LLM training set, as we've seen with OpenAI etc. At this point anyone publishing anything already has to be aware of that…
@TorontoWill @tiffanycli we can ignore for now the legality of OpenAI sucking it into a training set for an LLM — I am specifically trying to wrap my head around the concept of someone having the right to publish something but not to let LLMs get trained on it.
And if WGA wants LLMs not to be trained on their scripts, they have to demand exactly that: not just "you shall not train LLMs on our scripts" but also "you will not allow any LLMs to be trained on our scripts."
@TorontoWill @tiffanycli otherwise that would be a loophole:
"Your honor, we did not ourselves train any LLMs on the material in question. We were allowed to publish and we did.
Third parties used the published materials to train LLMs on it, without our knowledge or consent.
Now, WGA's terms did not stipulate we are not allowed to use such LLMs that happened to be trained on this material without our involvement…"
@TorontoWill @tiffanycli you're right of course.
Let me re-phrase: if the WGA only asks studios not to feed the scripts into LLMs themselves, but studios still get full copyright to the scripts, there is a trivial loophole: studios are allowed to publish the scripts online; third parties train LLMs on them; studios then use those LLMs to replace writers.
@TorontoWill @tiffanycli so my point is that I think WGA needs to also ask for either or both of:
a) studios will ensure no LLMs are trained by anyone on the material
b) studios will refrain from using any LLMs possibly trained on these scripts by anyone
Do you see my point?