Mastodawn

misk Aug 5, 2024

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

https://archive.is/2024.08.05-162750/https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/

MonkderVierte Aug 6, 2024

Properly following licensing, right?

lemmyvore Aug 6, 2024

No, see, because it’s “learning like a human”, and everybody knows that you’re allowed to bypass any licensing for learning. /s

But seriously I don’t know how they make the jump to these conclusions either.

areyouevenreal Aug 6, 2024

This is a massive strawman argument. No one is saying you shouldn’t have a license to view the content in order to train an AI. Most of the information used to train these models is publicly available and licensed for public viewing.

lemmyvore Aug 6, 2024

Just because something is available for public viewing does not mean it’s licensed for anything except personal use.

The strawman here is that since physical people benefit from personal use exceptions in the law, machine learning software should too. But why should it? Since when is a piece of software run by a corporation equivalent to an individual person?

VoterFrog Aug 6, 2024

Copyright licensing allows the owner to control how a work is distributed, not how it’s consumed. “Personal use” just means that you can’t turn around and redistribute a work that you’ve obtained. Not that you’re not allowed to consume it in a corporate setting.

Captain Poofter Aug 6, 2024

Consuming is not the same thing as training.

VoterFrog Aug 6, 2024

Training literally is consuming. A copyright license doesn’t get to dictate what computer programs the work is allowed to be used with. There’s a ton of entertainment megacorps that would love for that to be the case, though. You’re saying that you’re not allowed to do a statistical analysis on a copyrighted work. It’s nonsense. It’s well-established that copyright does not prevent that kind of use.

Captain Poofter Aug 6, 2024

What makes you think copyright law doesn’t apply to companies using copyrighted data? That is not the case.

Captain Poofter

I remain unconvinced by your points.

The law should reflect that these companies need to be first granted permission to use datasets by the rights holders, and Creative Commons licenses need to be given an opportunity to opt out of being crawled for these datasets. Anything else is wrong. Machines are not humans. Creative Commons copyright law was not written with the concept of machines being “consumers”. These companies took advantage of the sudden emergence of these models and the delay of law in holding their hunger for data in check. They need to be held accountable.