If you've been blocking AI from scraping your website, there's another one to add. This time it's Google.

I've updated my post on the subject.

https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

#ThisShouldBeOptIn

Block the Bots that Feed “AI” Models by Scraping Your Website – Neil Clarke

@clarkesworld

Great post, thanks for all the research. Maybe add FacebookBot to block Meta’s efforts?

“FacebookBot crawls public web pages to improve language models for our speech recognition technology.”

https://developers.facebook.com/docs/sharing/bot
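For anyone following along, the corresponding robots.txt rule would look like this (assuming Meta honors the token name from its own documentation):

```
User-agent: FacebookBot
Disallow: /
```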

@clarkesworld Also I added a link from my somewhat popular #Django robots.txt post:

https://adamj.eu/tech/2020/02/10/robots-txt/

How to add a robots.txt to your Django site - Adam Johnson

robots.txt is a standard file to communicate to “robot” crawlers, such as Google’s Googlebot, which pages they should not crawl. You serve it on your site at the root URL /robots.txt, for example https://example.com/robots.txt.