OpenAI suspends ByteDance’s account after it used GPT to train its own AI model: Isn't it hypocritical to use the copyrighted work of others without permission but deny others the same opportunity? OpenAI scraped the entire Internet but is now acting holier than thou.
https://www.theverge.com/2023/12/15/24003542/openai-suspends-bytedances-account-after-it-used-gpt-to-train-its-own-ai-model OpenAI can train on various data types such as text, images, books & videos without permission, but competitors cannot access that data once it's in OpenAI's system. So, yes, it is a problematic version of copyright.
The Verge
@nixCraft scrapping the internet might be a good idea in some ways but I'm pretty sure you meant "scraped"
@kahomono yes, I fixed it. Sorry about that.
@nixCraft ngl. In this case, I'm slightly in favor of OpenAI. Deciding which data to collect can be their thing, right? (Even though it's quite questionable.) Learning directly from GPT draws on the data itself PLUS the choice of which data was collected, and that selection is something they can legitimately claim as theirs.
@nixCraft in the near future AI models feeding off each other's hallucinations will go completely nuts
@nixCraft You have some really good takes 👍

@nixCraft "Open" "A" "I" lives up to its "name" once again.

The name was chosen using the oldest trick in the book: take the product's greatest weakness and advertise its opposite.

@nixCraft When entering OpenAI, the data comes from the left side and copyleft is valid. When leaving OpenAI, the data comes from the right side and copyright is valid. This is the situation. 😋
@nixCraft all violations down to the base, no matter what the stinking OSI pretends
@nixCraft Do as I say, not as I do.
@nixCraft Most of what OpenAI uses comes from https://commoncrawl.org. I wonder why so few talk about that one. You can download the data (if you have the storage...)
Common Crawl - Open Repository of Web Crawl Data

We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.

@nixCraft this is called a circular reference, so in the end, AI feeds itself...
@nixCraft ByteDance can do the same, right? Build their own AI from the Internet. No one is stopping that. I don’t think this is fair to OpenAI.
@nixCraft If you have to train AI with AI, isn't it pointless? A clone of a copy is just more fake than the forgery.

@nixCraft

Silly you...

Trying to apply reason, thinking, and anything resembling rationality to such things...

😁😁

@nixCraft I don’t think so, necessarily. It’s one thing to scrape content: so much of the Internet depends on spiders, and if AI is actually going to be AI, it needs to be able to “read” just like you and I do. It’s another to actually use a program. I can read anything public on the web freely, but I have to sign a license agreement to use GPT or Excel or Photoshop.
@nixCraft Corporations built on exploiting the internet for profit and power are now eating each other, huh...