People continue to be upset about the idea that public data on the internet is crawled to train AIs.

This time, people are upset with the Brave browser for crawling data from sites like Wikipedia and then charging companies for API access to that data, which those companies use to train AI.

This is how the web has worked since the dawn of the internet. What’s new is that OpenAI and ChatGPT called attention to it.

https://stackdiary.com/brave-selling-copyrighted-data-for-ai-training/

The shady world of Brave selling copyrighted data for AI training

Stack Diary
@carnage4life The web is about linking to other documents. What these LLMs do is actually the opposite: they provide no links to their sources, so there is no web anymore.