Great article by the ABC's Jack Ryan talking about #generativeAI and its #copyright consequences - especially how the existing #law has failed to keep pace with #EmergingTech

My take is that we need to start thinking about #DataTrusts - where #data is kept in trust and released only for the purposes approved by data owners - and which could facilitate payment for that data ...

https://www.abc.net.au/news/science/2023-11-29/artificial-intelligence-ai-training-datasets-copyright-books3/103157980

ChatGPT and other AI models were trained on copyrighted books. Can they be 'untrained'?

Training AI models on copyrighted materials has led to an array of high-profile lawsuits against developers such as OpenAI and Meta. Can the models be "untrained" or is the genie out of the bottle?

ABC News
@KathyReid You mean, "IP rights" only matter when the right people are getting rich off of them, so "AI" has stolen everything it can get its cyber-hands on.

@KathyReid Side note, not on topic: If we could apply the “data trust” principle to PII as well, a few of the most offensive industries would disappear pretty damn quick.

I like the idea.

@drowsygeek yes! And it would allow us to build trust with companies over time as they prove themselves worthy of trust, or remove privileges if they prove themselves unworthy.

@KathyReid Surely "data" is not synonymous with created content? The article is clearly about the latter.
That said, I'd be the first to admit I haven't followed how copyright law has been applied to data. I'm probably assuming little has changed since the court case over the Australian White Pages some decades ago.
IIRC that established that collections of data could be copyrighted, but not the data within them.

@geraldew this is a really good point - the distinction between created content, data, and what counts as "tokens" for large language models and other generative AI.

At what granularity can works be copyrighted?

Thank you - more to ponder!
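For a concrete sense of the granularity question: language models don't ingest whole works, they ingest short subword fragments mapped to integers. A toy sketch below (this is not any real tokenizer - production models use learned byte-pair encodings - the 4-character chunking is purely illustrative):

```python
# Toy illustration only: real LLM tokenizers use learned subword
# vocabularies (e.g. byte-pair encoding), not fixed-length chunks.
def toy_tokenize(text):
    """Split text into word fragments of at most 4 characters each,
    mimicking the sub-word granularity at which models see text."""
    pieces = []
    for word in text.lower().split():
        while word:
            pieces.append(word[:4])
            word = word[4:]
    return pieces

print(toy_tokenize("Copyright protects expression"))
# → ['copy', 'righ', 't', 'prot', 'ects', 'expr', 'essi', 'on']
```

The point of the sketch: by the time a work reaches the model, it has been shredded into fragments far below the granularity at which copyright normally attaches, which is part of why "untraining" is so hard to reason about.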