Mastodawn

Dare Obasanjo Jul 15, 2023

People continue to be upset about the idea that public data on the internet is crawled to train AIs.

This time people are upset with the Brave browser for crawling data from sites like Wikipedia, then charging companies for API access to that data, which those companies then use to train AI.

This is how the web has worked since the dawn of the internet. What’s new is that OpenAI and ChatGPT called attention to it.

https://stackdiary.com/brave-selling-copyrighted-data-for-ai-training/

The shady world of Brave selling copyrighted data for AI training

Stack Diary

Josh Collinsworth Jul 15, 2023

@carnage4life This...is a bad take, IMO. True, people have long provided content for web platforms, but they've always done it in exchange for something, like SEO benefits, or growing their following on that platform.

"This is how the Internet works" would be a very dismissive explanation (and a very convenient one for digital colonists) even if it were true. It's not, though. Nobody ever worked land for free so a corporation could build a fence around it and charge admission.

Stephen

@collinsworth @carnage4life I have to agree with Josh - there was an exchange of value. AI training is one-way.