People continue to be upset about the idea that public data on the internet is crawled to train AIs.

This time people are upset with the Brave browser for crawling data from sites like Wikipedia, then charging companies for API access to that data, which those companies use to train AI.

This is how the web has worked since the dawn of the internet. What’s new is that OpenAI and ChatGPT called attention to it.

https://stackdiary.com/brave-selling-copyrighted-data-for-ai-training/

The shady world of Brave selling copyrighted data for AI training

I'm fairly certain that I was not the only person in the world who thought to himself, "Did they just yoink the entire Internet and bundle it together into a

Stack Diary

@carnage4life The big difference here is that the crawlers ended up acting as referrers back to the traffic source. Google, for example, would crawl The New York Times, and then when you searched, it would surface The New York Times. It was arguably a symbiotic relationship. OpenAI crawls The New York Times, and ChatGPT responds with data it learned there but doesn’t refer the user back to The New York Times. It’s now a one-sided relationship.