#Business #Reports
Anthropic details how Claude crawls sites · How to block the three separate user agents https://ilo.im/16ax7y

_____
#AI #Claude #Crawlers #UserAgents #RobotsTxt #Content #Website #WebDev #Frontend #Backend

Anthropic clarifies how Claude bots crawl sites and how to block them

Anthropic explains how its bots handle AI training, live queries, and search results, and what opting out means for visibility.

Search Engine Land
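
For readers who want to try the blocking described above, here is a minimal sketch in Python. It assumes the three user-agent tokens are `ClaudeBot`, `Claude-User`, and `Claude-SearchBot`; these names are not stated in the post, so check them against Anthropic's current documentation before relying on them. The helper names `build_robots_txt` and `is_allowed` and the `example.com` URL are made up for illustration. The standard library's `urllib.robotparser` is used only to sanity-check the generated rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens (not given in the post); verify before use.
CLAUDE_AGENTS = ["ClaudeBot", "Claude-User", "Claude-SearchBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, agent, url):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(CLAUDE_AGENTS)
print(robots_txt)

# Hypothetical URL, used only to exercise the rules.
for agent in CLAUDE_AGENTS + ["SomeOtherBot"]:
    print(agent, is_allowed(robots_txt, agent, "https://example.com/page"))
```

Each agent gets its own group so that one of them can be allowed again later without touching the others. As the post notes, opting out can affect visibility, so blocking all three may not be what every site wants.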
🎉🎂 Wikipedia turns 25! 🎂🎉 And to celebrate, they're kindly reminding us to set user-agents and follow robot policies! 🤖 Because nothing says "Happy Birthday" like a slap on the wrist from your friendly neighborhood encyclopedia! 📜🥳
https://wikipedia25.org #Wikipedia25 #UserAgents #RobotPolicies #HappyBirthday #TechNews #Celebration #HackerNews #ngated
25 years of Wikipedia

Of the bigger browsers, I think Safari is probably the closest to that right now, for all its flaws. It may be that Waterfox or some other Firefox fork with the AI garbage ripped out of it is better; I haven't delved into that.

https://www.waterfox.com

#Browsers #userAgents

Waterfox - Open source web browser

The web browser that respects your privacy

Waterfox

It would be nice if somebody made a user agent for the web.

You know, software that actually works on behalf, and in the interests, of the user, rather than the maker.

Chrome has been adware for years now. Edge was actually pretty good while it was a fairly vanilla Chromium fork, but it seems MS is intent on stuffing Copilot into it too.

https://www.w3.org/WAI/UA/work/wiki/Definition_of_User_Agent

#browsers #userAgents

Definition of User Agent - WAI UA Wiki

🐧 Ah, the elite club of “please-be-our-friend-agents” has a new bouncer: Finnix! 🤖 Apparently, the secret handshake involves setting a user agent and worshipping at the altar of the robot policy. 🎩 Because, let's face it, who doesn't want to endlessly browse https://w.wiki/4wJS and decipher phabricator hieroglyphics for fun? 🙄
https://en.wikipedia.org/wiki/Finnix #eliteclub #useragents #Finnix #robotpolicy #HackerNews #ngated
Finnix - Wikipedia

🐱🚀 In an epic saga of the internet's least thrilling adventure, we journey through the mystical realm of user agents and robot policies, because why address the 'cat gap' when there's a thrilling T400119 quest awaiting? 🤖✨ Spoiler: it's less about cats and more about tech folks pretending to have a sense of humor. 🙄
https://en.wikipedia.org/wiki/Cat_gap #internetadventures #useragents #techhumor #robotpolicies #catgap #HackerNews #ngated
Cat gap - Wikipedia

@koteisaev @craignewmark not necessarily.

The problems re: @delta / #deltaChat and/or @thunderbird may be caused by #eMail providers either actively blocking #PGP/MIME and/or inline-#PGP, having extremely tight quotas and/or filtering #UserAgents / #Clients.

  • At least from my experience...
🤡 Ah yes, because what the internet really needed was another thrilling tale about respecting robot policies and setting user agents. 🎩🤖 Clearly, the pinnacle of human achievement: writing a riveting wiki page that nobody asked for! 🥱📄
https://en.wikipedia.org/wiki/Leatherman_(vagabond) #robotpolicies #useragents #internetdrama #wikiwriting #techhumor #HackerNews #ngated
Leatherman (vagabond) - Wikipedia

#Development #Findings
AI bots and robots.txt · How websites use robots.txt to set AI crawling rules https://ilo.im/166la4

_____
#AI #Bots #Content #Website #UserAgents #RobotsTxt #Business #SEO #WebDev #Backend

AI Bots and Robots.txt

There’s been a lot of discussion lately around AI crawlers and bots, which are used to train LLMs and/or fetch content on behalf of their users. In the past few weeks I’ve seen blog posts about the amount of traffic from these crawlers, techniques and products to control how and what they can crawl, reports of misbehaving crawlers and more. Ironically, there are even AI-based services to mitigate AI crawler bots! Given how much interest there is, I thought I’d explore some HTTP Archive data to see how sites are using robots.txt to state their preferences on AI crawling.

Paul Calvano
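
To give a rough feel for this kind of survey, here is a small Python sketch that tallies which user-agent tokens are fully blocked across a handful of robots.txt files. It is not the author's method and does not touch HTTP Archive data. The sample files, the `.example` site names, and the helper names `fully_blocked_agents` and `tally_blocks` are all invented for illustration. It also simplifies: a group counts as blocked if it contains `Disallow: /`, and any `Allow` overrides are ignored.

```python
from collections import Counter


def fully_blocked_agents(robots_txt):
    """Return lowercase user-agent tokens whose group contains 'Disallow: /'.

    Simplification: Allow lines that might re-open paths are not considered.
    """
    blocked = set()
    current_agents = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(current_agents)
    return blocked


# Invented sample data standing in for crawled robots.txt files.
SAMPLE_FILES = {
    "site-a.example": "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n",
    "site-b.example": "User-agent: *\nDisallow: /private/\n\nUser-agent: GPTBot\nDisallow: /\n",
    "site-c.example": "# no AI rules\nUser-agent: *\nAllow: /\n",
}


def tally_blocks(files):
    """Count, per user-agent token, how many files fully block it."""
    counts = Counter()
    for text in files.values():
        counts.update(fully_blocked_agents(text))
    return counts


counts = tally_blocks(SAMPLE_FILES)
for agent, n in counts.most_common():
    print(f"{agent}: fully blocked on {n} of {len(SAMPLE_FILES)} sample sites")
```

A real analysis would need to handle redirects, malformed files, and wildcard groups more carefully than this sketch does.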
😂 Oh, look! Another riveting tale of tech etiquette—because everyone was just DYING to know about robot policies and user agents. 🙄 Spoiler: The staff devoured it later, proving once again that even the dullest topics can become a snack! 🍴
https://en.wikipedia.org/wiki/The_staff_ate_it_later #techetiquette #robotpolicies #useragents #humor #snacktime #HackerNews #ngated
The staff ate it later - Wikipedia