WGA is bargaining to block use of their written work to train AI. (ht @harikunzru)
This is a smart move. Brief 🧵:
We've already seen tons of examples of AI-generated writing, art, and music made to sound or look similar to known human creators.
(Honestly IMO the tech is not quite there yet. Often what you get is a pale imitation. But tech is always improving!)
Of course, this doesn’t address how much writing is publicly available through other means or writing that has already been used to train AI. And it doesn’t tell you how WGA members are meant to enforce their rights if they find a company has misused their works. (And how would they even find out in the first place?)
But still, a good start and definitely something to watch.
@tiffanycli akin to Jet Li saying no to the Matrix to prevent them stealing his techniques with mocap
@tiffanycli amazing. I expected something like this to start happening but I did not expect it so soon.
Question: one way OpenAI et al try to evade this whole thing is by leaning into the "well it was made public on Teh Intertubes and we just scraped it 🤷♀️" defense. Who will be / should be liable if a script gets published on the open Web, and then sucked into an LLM training set?
Also, many everyday tools (office suites, for example) are starting to integrate "AI". Similar question as above?
@rysiek @tiffanycli "It was on the internet" is not a defence to copyright infringement, in any way. Copyright specifically applies to things that are published or performed publicly.
What qualifies as copying/infringement in relation to an LLM is more of a grey area, since it isn't something specifically contemplated by the law. But nothing prevents having multiple people involved in the copying, in different ways, *all* be liable.
@TorontoWill @tiffanycli sure. This is complicated IMVHO because:
1. the EU Copyright Directive's datamining exemption (which apparently covers LLM training)
2. the fact that copyright law "kicks in" when something gets published, not when something gets "ingested" so to speak.
@TorontoWill @tiffanycli regarding 2.:
IANAL but as far as I understand copyright law, it kicks in when something gets published. You either are allowed to publish something somewhere, or not.
But asking (quite reasonably in general!) for certain copyrighted work not to be used for training LLMs means asking to limit how certain material is used *post-publication*.
I.e. "you are allowed to publish this as long as no LLM will ever be trained on it". Which is… difficult to comply with?
@TorontoWill @tiffanycli yes, absolutely.
But my point is: if you have the right to publish something but you are barred from letting LLMs being trained on that something — do you actually have a right to publish?
Everything published (made public) might get sucked into an LLM training set, as we've seen with OpenAI etc. At this point anyone publishing anything already has to be aware of that…
@TorontoWill @tiffanycli we can ignore for now the legality of OpenAI sucking it into a training set for an LLM — I am specifically trying to wrap my head around the concept of someone having the right to publish something but not to let LLMs get trained on it.
And if WGA wants LLMs not to be trained on their scripts, they have to demand exactly that: not just "you shall not train LLMs on our scripts" but also "you will not allow any LLMs to be trained on our scripts."
@TorontoWill @tiffanycli otherwise that would be a loophole:
"Your honor, we did not ourselves train any LLMs on the material in question. We were allowed to publish and we did.
Third parties used the published materials to train LLMs on them, without our knowledge or consent.
Now, WGA's terms did not stipulate we are not allowed to use such LLMs that happened to be trained on this material without our involvement…"
@TorontoWill @tiffanycli you're right of course.
Let me re-phrase: if the WGA only asks the studios not to feed the scripts into LLMs themselves, but studios still get full copyright to the scripts, this leaves a trivial loophole: studios are allowed to publish the scripts online; third parties train LLMs on them; studios use those LLMs to replace writers.
@TorontoWill @tiffanycli so my point is that I think WGA needs to also ask for either or both of:
a) studios will ensure no LLMs are trained by anyone on the material
b) studios will refrain from using any LLMs possibly trained on these scripts by anyone
Do you see my point?
OpenAI are trying to evade those things, but it doesn't mean that they have. We are seeing pushback on many fronts.
Samsung has already banned employees from using the tools after a leak of code.
And there's an ongoing class action suit against the image generators.
@rysiek @tiffanycli honestly, I think we'll start seeing the first big fights when people start publishing movies featuring Disney characters "tweaked" by AI.
It's going to take money on multiple sides of this issue to actually resolve it and it's going to take a few years for that to happen.
Tech workers should do the same.
Negotiate a hard stop to AI companies using Git repositories, Stack Overflow, and other code sources to train AI to take their jobs...
It's called GitHub Copilot. ( https://github.com/features/copilot )
Here are four others: https://www.forbes.com/sites/janakirammsv/2022/03/14/5-ai-tools-that-can-generate-code-to-help-programmers/?sh=1417aa555ee0