#Business #Reports
Anthropic details how Claude crawls sites · How to block the three separate user agents https://ilo.im/16ax7y
_____
#AI #Claude #Crawlers #UserAgents #RobotsTxt #Content #Website #WebDev #Frontend #Backend
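For reference, a minimal robots.txt sketch that blocks all three agents Anthropic documents (ClaudeBot for training, Claude-User for user-initiated fetches, Claude-SearchBot for search); a site-wide Disallow is just one possible policy:

```
# Block all three of Anthropic's documented Claude crawlers:
# ClaudeBot (training), Claude-User (user-initiated fetches),
# Claude-SearchBot (search indexing).
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /
```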
It would be nice if somebody made a user agent for the web.
You know, software that actually works on behalf, and in the interests, of the user, rather than the maker.
Of the bigger browsers, I think Safari is probably the closest to that right now, for all its flaws. It may be that Waterfox or some other Firefox fork with the AI garbage ripped out of it is better; I haven't delved into that.
Chrome has been adware for years now. Edge was actually pretty good while it was a fairly vanilla Chromium fork, but it seems MS is intent on stuffing Copilot into it too.
https://www.w3.org/WAI/UA/work/wiki/Definition_of_User_Agent
@koteisaev @craignewmark Not necessarily.
The problems re: @delta / #DeltaChat and/or @thunderbird may be caused by #eMail providers actively blocking #PGP/MIME or inline-#PGP, enforcing extremely tight quotas, or filtering #UserAgents / #Clients.
#Development #Findings
AI bots and robots.txt · How websites use robots.txt to set AI crawling rules https://ilo.im/166la4
_____
#AI #Bots #Content #Website #UserAgents #RobotsTxt #Business #SEO #WebDev #Backend
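As an illustration of the kind of rules such posts describe, here is a hypothetical robots.txt that blocks a few AI training crawlers while leaving the rest of the site open; the agent tokens (GPTBot, Google-Extended, CCBot) come from the vendors' own documentation, but the policy itself is only an example:

```
# Disallow documented AI training crawlers...
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# ...while leaving the site open to everyone else.
# (An empty Disallow means "nothing is disallowed".)
User-agent: *
Disallow:
```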
There’s been a lot of discussion lately around AI crawlers and bots, which are used to train LLMs and/or fetch content on behalf of their users. In the past few weeks I’ve seen blog posts about the volume of traffic from these crawlers, techniques and products to control how and what they can crawl, reports of misbehaving crawlers, and more. Ironically, there are even AI-based services to mitigate AI crawler bots! Given how much interest there is, I thought I’d explore some HTTP Archive data to see how sites are using robots.txt to state their preferences on AI crawling.
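As a rough sketch of the kind of check such an analysis involves (not the author's actual HTTP Archive queries, which would run over the crawl's stored robots.txt responses rather than live fetches), Python's standard urllib.robotparser can test which agents a given robots.txt blocks; the agent list below is an illustrative assumption:

```python
"""Check which AI crawler user agents a site's robots.txt disallows."""
from urllib import robotparser

# Publicly documented AI crawler tokens (a non-exhaustive sample).
AI_AGENTS = ["ClaudeBot", "Claude-User", "Claude-SearchBot",
             "GPTBot", "Google-Extended", "CCBot"]

def check_site(robots_url: str) -> dict[str, bool]:
    """Return {agent: allowed to fetch the site root} for each AI agent."""
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # fetches and parses the robots.txt file
    site_root = robots_url.rsplit("/robots.txt", 1)[0] + "/"
    return {agent: rp.can_fetch(agent, site_root) for agent in AI_AGENTS}

if __name__ == "__main__":
    # example.com is a placeholder; point this at a real site to test.
    for agent, allowed in check_site("https://example.com/robots.txt").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```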