"(W)hat we are witnessing is the wealthiest companies in history (Microsoft, Apple, Google, Meta, Amazon …) unilaterally seizing the sum total of human knowledge that exists in digital, scrapable form and walling it off inside proprietary products, many of which will take direct aim at the humans whose lifetime of labor trained the machines without giving permission or consent."

https://www.theguardian.com/commentisfree/2023/may/08/ai-machines-hallucinating-naomi-klein

AI machines aren’t ‘hallucinating’. But their makers are

Tech CEOs want us to believe that generative AI will benefit humanity. They are kidding themselves

The Guardian

“AI art generators are trained on...millions of copyrighted images, harvested without their creators’ knowledge, let alone compensation or consent. This is effectively the greatest art heist in history."

"Why should a for-profit company be permitted to feed the [work] of living artists into a program...so it can then be used to generate doppelganger versions of those very artists’ work, with the benefits flowing to everyone but the artists themselves?" #generativeAI


@chavan So would you be okay with Disney training an AI on copyrighted images that Disney owns the rights to, or is it only a problem when new media companies do it?

You could not plausibly claim it was a "copyright infringement" for Disney to train an AI on images they own, yet it won't make a difference to the people who lose their jobs whether they are replaced by an open source AI made by a tiny startup or a proprietary AI owned by Disney.

@183231bcb @chavan Of course, it would be fine. Also, of course, the output from that dataset would be less valuable and interesting, which is why these AI models are being trained on much larger and more diverse sets of data.
@183231bcb It's definitely legal: they own the work, so they can do whatever they want with it. Unless they DON'T own the work, it's a legally white area.
Morally, however, it's fairly black.
@183231bcb @chavan Presumably the author would be fine with permissive AIs trained only on public domain content as well as openly licensed content. There are such AIs, like StarCoder, that perform well. https://twitter.com/BigCodeProject/status/1654174941976068119
BigCode on Twitter

“Introducing: 💫StarCoder StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant. Try it here: https://t.co/4XJ0tn4K1m Release thread🧵”

Twitter

@Wikisteff @chavan Yep, which shows the author isn't really concerned with people losing their jobs and being replaced by AI: just with whether the robots enrich traditional publishers.

For a person who loses their job to an AI, it doesn't make a difference whether the AI was trained on public domain text or on copyrighted text: their job is gone either way. But copyright expansionists only pretend to care about the person in one of those cases.