We found:
- the copyright symbol appears >200M times
- pirated sites, 1 for e-books
- half of the top 10 were news sites
https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning
@nitashatiku If you’re going to do this, why don’t you retrieve the ‘robots.txt’ from each site? See how many of them (1) have one, (2) don’t disallow bots, and (3) have _explicit sitemaps_ pointing to their content.
The bots were invited in. You may hate it now, but they were invited.
That’s because _this is how it works_. Folks wanted SEO, so they offered up their content to be found.
I get the frustration, I really do, but it’s super-clear to me: we invited the bots in to read our content and…they did.
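For anyone who wants to run that check, here is a rough sketch using Python’s standard `urllib.robotparser`. The helper names (`fetch_robots`, `summarize_robots`) and the idea of testing only the site root for the wildcard agent are my own assumptions for illustration, not anything from the Washington Post analysis. "Allows bots" here just means the `*` user agent may fetch `/`, which is a simplification.

```python
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots(site_root, timeout=10.0):
    """Return the robots.txt body for a site, or None if it can't be retrieved.

    Hypothetical helper: treats any network or HTTP error as "no robots.txt".
    """
    url = urljoin(site_root, "/robots.txt")
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return None


def summarize_robots(robots_text, site_root, user_agent="*"):
    """Answer questions (2) and (3) for an already-fetched robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        # (2) may this user agent fetch the site root? (simplified test)
        "allows_root": parser.can_fetch(user_agent, urljoin(site_root, "/")),
        # (3) explicit Sitemap: lines; site_maps() returns None when there are none
        "sitemaps": parser.site_maps() or [],
    }
```

To survey a list of sites, call `fetch_robots(site)` for each one (a `None` result answers question (1)), then pass the text to `summarize_robots(text, site)` and tally the results.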
@Quisley @nitashatiku I understand that that’s how you feel, but I don’t think that’s how it’ll play out in court. And… in what way are search engines not a commercial purpose?
They run ads next to your site links in search results. They have a money-printing press, for goodness sake. 🤣
Opting in to indexing is definitely opting in to a commercial use. That LLMs are not the commercial use you had in mind…well, that’ll be a fascinating argument to watch, but I wouldn’t put money on either side.