Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

https://sopuli.xyz/post/15590084


https://archive.is/2024.08.05-162750/https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/

There are only a handful of video datasets, and all of them are owned either by Google through YouTube or by big Hollywood companies like Disney and Netflix.

These companies are foaming at the mouth with rage thinking about what generative AI will do to their industry and how much it will help the currently nonexistent indie one. They will do whatever it takes to fence in the playbox and make sure they get to be the toll man.

This was never about whether AI gets to live, but about who gets to own it. 404media is essentially a mouthpiece for these corporations, willingly or not. Strengthening copyright laws will not help consumers or small-time creators. The only exception would be laws that force copyleft licenses onto models, but that's not what is being pushed right now.

Nvidia does not have a strong history of open-sourcing things, to say the least. That last bit sounds like pure hopium.

Their Nemotron 340B model was released on what is essentially an open source licence (available for commercial use unless you are doing shady things like spamming or collecting biometric data).

Having a robust open source ecosystem directly benefits Nvidia, since it lets them sell more high-end consumer GPUs.

Obviously, there's a real chance that this won't be open sourced, since it's a video model and there's huge money involved. That doesn't change the fact that having YouTube and Netflix dictate who gets to make video models, and at what cost, is a bad idea.