Opinion | This Is How A.I. Ruins the Internet

Why the development of artificial intelligence might result in greater pollution of our digital public spaces.

The New York Times

@Julia You make really good points --- and we need a better metaphor than the apocryphal "tragedy of the commons". On its eugenicist origins, see:

https://blogs.scientificamerican.com/voices/the-tragedy-of-the-tragedy-of-the-commons/

The Tragedy of the Tragedy of the Commons

The man who wrote one of environmentalism’s most-cited essays was a racist, eugenicist, nativist and Islamophobe; plus, his argument was wrong

Scientific American Blog Network

@Julia And for some work on pollution of the ecosystem that rhymes with your points, see:

Shah, Chirag and Emily M. Bender. Under review. Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?

https://bit.ly/Env_IAS

@emilymbender @Julia
@datamyna and I have a talk for the @creativecommons #CCSummit2023 which also echoes these points -- and, of course, references your important work!
The slides and prerecorded video are at https://poritz.net/j/share/WGAIDDOS.
I'm particularly proud of our analysis of the failure of generative AInt to match CC's idea of #BetterSharing, and of our argument that, while copyright is not the best or only tool to fight the ills of AInt, it is almost certainly violated by all current AInt.
1/3
Will Generative AI DDOS The Commons?

@emilymbender @Julia @datamyna @creativecommons
Part of the copyright analysis is a new (maybe?) way to think about AInt: modern LLMs have billions of parameters, and these are just a highly compressed form of the terabytes of training data. After you've processed many images of faces, the additional bytes needed to include one more are far fewer than the raw data content of that new image; i.e., it's a compression scheme! And a compressed copy is a copy, so it violates copyright!
2/3
@emilymbender @Julia @datamyna @creativecommons
Which is also why they cannot be transparent about their training datasets: copyright violation!
Even if they use CC-licensed training data, they owe attributions; if they use public-domain or CC0 data, many jurisdictions still require acknowledgement. To be transparent is to admit they are in massive copyright violation -- the statutory damages alone would be huge! John Grisham's and George R.R. Martin's lawyers should use this!
3/3