If you've been blocking AI from scraping your website, there's another one to add. This time it's Google.
I've updated my post on the subject.
https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
Thank you for this. I'm not entirely sure it's necessary for me, since I use GitHub Pages, but I added the file nonetheless.
@eklem @robertoqs @clarkesworld
For GitHub Pages, add robots.txt to a repo called <username>.github.io and then it will appear at <username>.github.io/robots.txt
For example:
https://github.com/hugovk/hugovk.github.io/commit/79a14a01d37d574e2a76127722cdaf25cc1b9293
https://github.com/hugovk/hugovk.github.io
https://hugovk.github.io/robots.txt
More info:
https://stackoverflow.com/a/47652485/724176
https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages
#GitHub #GitHubPages #robotstxt
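To sanity-check a robots.txt before pushing it, something like the following rough sketch may help. It uses Python's standard `urllib.robotparser`. The file contents, the `is_allowed` helper and the example.github.io URL are illustrative assumptions, not anything taken from the linked post, and the user-agent tokens are only examples (see the post above for the current list).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the user-agent tokens are examples only.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.github.io/index.html"  # placeholder URL
    for agent in ("Google-Extended", "GPTBot", "SomeBrowser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Keep in mind that robots.txt is only a request: this checks what the file says, not whether a given crawler actually honours it.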
I think it has to be in the root directory.
Well, if you were already using GitHub Pages, you already had a repository named [username].github.io, if I'm not mistaken. The point about robots.txt is that it has to sit in the root directory of the site, which on GitHub usually means the top level of the main branch.
@robertoqs @eklem @clarkesworld
I didn't have a [username].github.io repo until I created it this morning. But I did have other repos using GitHub Pages, and they are served like [username].github.io/other-repo
But until I created [username].github.io with robots.txt, there was nothing at [username].github.io/robots.txt for the others. As you say, it must be at the root.
Some links to docs: https://mastodon.social/@hugovk/111146631566244808
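To illustrate why the file has to live in the [username].github.io repo: crawlers look for robots.txt at the root of the host, not under a project path. Here is a minimal sketch of that lookup, assuming a hypothetical `robots_url` helper and placeholder URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that applies to `page_url`.

    Keeps only the scheme and host, then points at /robots.txt,
    dropping any project path, query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A project site served from a sub-path (placeholder names).
    print(robots_url("https://username.github.io/other-repo/index.html"))
    # The user site itself.
    print(robots_url("https://username.github.io/"))
```

Both calls resolve to the same https://username.github.io/robots.txt, so a robots.txt placed inside other-repo would end up at /other-repo/robots.txt, where crawlers don't look.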
Ah, I see. Then a [username].github.io repo is only required when you're not using a custom domain. My website uses one, which is the case I was thinking about.
So what I did was simply to put robots.txt next to my HTML files. Also, thanks for the documentation. I love Stack Overflow, by the way.