None of the so-called commitments by the seven AI companies addresses illegal scraping and theft of copyrighted content. Why? Because their business models would collapse if they had to pay people for the terabytes of content they are stealing.

@mattkressel my heart resonates with this comment, but I think we have to accept that #copyright law generally, both for training data and otherwise, is still a 20th century artifact that doesn't really know what to do with #AI, or for that matter, distinctions between machine uses and human uses of source material.

It's like the early days of trying to adjudicate steroid use in sports. We don't really know what to make of technologically-augmented capabilities that de-level the playing field.

@jamiexml Nah, that's a copout. We know exactly what these tools are doing: stealing copyrighted works. And copyright is still very much relevant and important.

@jamiexml @mattkressel so-called "AI" is just software running on a deterministic computer. It’s something between a lossy compressor/decompressor and a compiler, so it’s totally clear that its output is a derivative of its inputs, and to use it one has to honour all licences on the input.

This is easily seen in how it’s possible to write prompts that extract entire "training data" images in recognisable form, with about three quarters of the image as good as unchanged (including the watermark) and the last quarter filled in suitably.

@mattkressel And the Government is fine with this.
@mattkressel like all voluntary commitments from private enterprise, it's just an empty, meaningless PR fig leaf.

@mattkressel

The hardest part is always getting training data... and enough of it...

Børge A. Roum (@[email protected])

Where are all those people who were up in arms about students and any old Jill and Joe pirating music and movies, and how that was horrible theft that made it necessary to curtail freedom of speech, surveil everything on the internet, take away people's access to the web (= work, education, communication, etc.), and so much more, now that the richest companies in the history of the world are scraping every copyrighted piece of data on the web to earn even more money?

Tutoteket