Blocking #AppleBot made my #Sharkey instance uptime go from 1-3 minutes to two days. I think it's fixed!
Thanks @ozzy for the tip!
RE: https://fedi.social/notes/alpf7y22r5q903xb
Oh, this is #fun.
#Applebot - Apple's web crawler, used for various things - is ignoring robots.txt rules governing crawling of websites.
I have Applebot (and Applebot-Extended, which isn't really a crawler) in my robots.txt files, set to disallow all access. Has been that way for #yonks.
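For anyone wanting to sanity-check rules like these, here's a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text below is an assumed reconstruction of "disallow all access" for both agents, not my actual file, and `example.com` and the `is_allowed` helper are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block Applebot and Applebot-Extended,
# allow everyone else. Not copied from any real site.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Applebot", "Applebot-Extended", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/some/page"))
```

A well-behaved crawler would make the same check before every request and skip anything that comes back `False`.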
And Applebot is consistently the highest-traffic crawler to my sites - at least of ones that actually bother to fetch robots.txt. Yesterday, for example, Applebot fetched robots.txt from one of my websites almost 800 times.
Yes, it's really Apple, not someone faking the user-agent identifier. It's coming from the networks that Apple says can be used to identify Applebot access. DNS matches, everything.
e.g. https://support.apple.com/en-ca/119829
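The "DNS matches" check can be sketched as a forward-confirmed reverse DNS lookup. This is a rough sketch, not Apple's official procedure: the `.applebot.apple.com` suffix is what I understand Apple's page to document, and the function name, the injectable resolver parameters and the IP in the demo are all illustrative.

```python
import socket
from typing import Callable, List

# Assumed hostname suffix for genuine Applebot hosts.
APPLEBOT_SUFFIX = ".applebot.apple.com"


def _reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_applebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Check that ip reverse-resolves into the Applebot domain and that
    the resulting hostname resolves back to the same ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(APPLEBOT_SUFFIX):
            return False
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, so this should print False.
    print(is_verified_applebot("192.0.2.1"))
```

The forward step matters: anyone who controls reverse DNS for their own address block can claim any hostname, but they can't make Apple's zone point back at their IP.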
So: legendary Apple software quality. Documented to do the right thing, but actually doing the wrong thing. And completely failing to cache content, fetching the same file 800 times a day when it hasn't changed in years.
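For the caching half of that complaint, here's a minimal sketch of what "fetch it once, then reuse it" could look like on the crawler side. I obviously don't know how Applebot is built; `RobotsCache`, its parameters and the 24-hour default are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Keep one robots.txt body per host and refetch only after the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 24 * 60 * 60,  # assumed default: one day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        """Return the cached body for host, fetching only when stale."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body


if __name__ == "__main__":
    fetch_count = 0

    def fake_fetch(host: str) -> str:
        # Stand-in for a real HTTP GET of https://<host>/robots.txt.
        global fetch_count
        fetch_count += 1
        return "User-agent: Applebot\nDisallow: /\n"

    cache = RobotsCache(fake_fetch)
    for _ in range(800):
        cache.get("example.com")
    print(fetch_count)  # 800 lookups, a single fetch
```

A real crawler could go further and revalidate with a conditional request instead of a blind refetch, but even this much turns 800 daily requests into one.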
Hey, Apple! Need a software engineer who's actually, you know, good at it? I'm available.
#Apple #AppleInc #TimApple #WebCrawler #RobotsTxt #quality #WeveHeardOfIt #qwality #AppleQwality #legendary #TwoHardThings #caching #fail #engineer #software #SoftwareEngineer
Behold the AI bots that Cloudflare blocked from this blog
I don’t like writing for free–social media blatantly excepted–so when I watched a panel at Web Summit in mid-November about the effect of AI-model crawlers on news-site revenue and the Pay Per Crawl initiative that Cloudflare was proposing as a solution, I had to take notes.
Then a few weeks after I got home from Lisbon, I realized I could take action: While Pay Per Crawl remains in an invitation-only beta test, Cloudflare’s AI Crawl Control is open to the public and included in that Internet infrastructure firm’s free tier. Then I learned that it’s shockingly easy to add Cloudflare’s services to a WordPress.com blog.
Crawl Control comes with a preset list of bots to block and bots to allow, grouped by type: “AI Assistant” bots that take action in response to user requests are fine; “AI Search” bots that support “AI-driven search experiences” are also okay (contrary to Cloudflare CEO Matthew Prince’s discussion of them in that Web Summit panel); “AI Crawler” bots that collect content for training AI models are not.
I took a screenshot of this part of my Cloudflare dashboard at almost the same time each afternoon this week, and these are my totals:
To put this in context, the top two search-engine crawlers had far higher numbers. Google’s Googlebot somehow racked up a little over 20,000 requests, more than 30 times the presumably-human traffic I see in my WordPress dashboard here for the last five days, plus 23 failed requests. Microsoft’s Bingbot came in second with 3,003 allowed requests and two unsuccessful ones.
As Cloudflare’s CEO complained in that Web Summit panel, Googlebot feeds into both Google’s traditional search and the AI Overview search results that Web publishers now blame for dangerous declines in their search traffic. There’s nothing I can do about that from this side of the screen except hope that Cloudflare’s Pay Per Crawl efforts and other advocacy efforts stir some rethinking at Google.
But I can’t tell you how well Pay Per Crawl works, because almost three weeks after applying to join the private beta I’m still waiting for my invitation. I imagine I’ll be waiting much longer before an AI-crawler operator decides that my tiny contribution to the Web’s collective content is worth sending me some money.
#AI #AIBot #AICrawlControl #AICrawler #Amazon #Applebot #Bingbot #ChatGPT #Cloudflare #Huawei #OpenAI #PayPerCrawl #Petalbot
Apple updates its Applebot documentation explaining what it means to block Applebot-Extended vs Applebot https://www.seroundtable.com/apple-updates-applebot-docs-39310.html
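Apple has published the system requirements for "Apple Intelligence," the AI feature that becomes available in the US in October 2024 and in Japan next year. It requires an iPhone 15 Pro or later running iOS 18.1 with 4GB of free storage.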
https://applech2.com/archives/20240910-apple-intelligence-requirements-for-iphone.html
#applech2 #ChatGPT_AI #AI #Apple #Apple_Intelligence #Applebot
Apple updates its Applebot documentation with Applebot-Extended, Reverse DNS, more user agents and so much more https://www.seroundtable.com/apple-updates-applebot-documentation-37571.html hat tip @glenngabe
Apple's personal intelligence system "Apple Intelligence" is trained on information collected by the web crawler Applebot, used in a privacy-conscious way, and sites can opt out via Applebot-Extended.
https://applech2.com/archives/20240612-applebot-apple-intelligence.html
#applech2 #ChatGPT_AI #Apple #Applebot #Google #News #Siri #Spotlight #検索エンジン
Applebot is the web crawler for Apple. Products like Siri & Spotlight Suggestions use Applebot.