Oh, this is #fun.

#Applebot - Apple's web crawler, used for various things - is ignoring robots.txt rules governing crawling of websites.

I have Applebot (and Applebot-Extended, which isn't really a crawler) in my robots.txt files, set to disallow all access. Has been that way for #yonks.
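For reference, the relevant stanza looks something like this (layout illustrative, not my exact file):

```
User-agent: Applebot
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```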

And Applebot is consistently the highest-traffic crawler hitting my sites - at least among crawlers that bother to fetch robots.txt at all. Yesterday, for example, Applebot fetched robots.txt from one of my websites almost 800 times.

Yes, it's really Apple, not someone faking the user-agent identifier. It's coming from the networks that Apple says can be used to identify Applebot access. DNS matches, everything.
e.g. https://support.apple.com/en-ca/119829
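The check Apple documents is the standard double-lookup: the IP's reverse DNS should end in .applebot.apple.com, and that hostname must resolve back to the same IP. A quick sketch of that verification in Python (the double-lookup logic is the usual pattern; the hostname suffix is the one Apple's page documents):

```python
import socket

def verify_applebot(ip: str) -> bool:
    """Return True only if `ip` passes Apple's documented reverse-DNS
    check: PTR record ends in .applebot.apple.com, and the claimed
    hostname forward-resolves back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup (PTR)
        if not host.endswith(".applebot.apple.com"):
            return False
        # forward-confirm: the hostname must map back to the IP
        forward_ips = socket.gethostbyname_ex(host)[2]
        return ip in forward_ips
    except OSError:  # covers socket.herror / socket.gaierror
        return False
```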

So: legendary Apple software quality. Documented to do the right thing, but actually doing the wrong thing. And completely failing to cache content, fetching the same file 800 times a day when it hasn't changed in years.
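A crawler that cached properly would revalidate instead of re-downloading: send back the ETag or Last-Modified from the previous response and let the server answer 304 Not Modified. A sketch of building such a conditional request with Python's stdlib (URL and header values are illustrative):

```python
import urllib.request

def conditional_request(url, etag=None, last_modified=None):
    """Build a GET that revalidates a cached copy instead of
    re-fetching it; an unchanged resource gets a 304 reply."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req

# urllib surfaces 304 as an HTTPError; a caller would do roughly:
#   try:
#       with urllib.request.urlopen(req) as resp:
#           body = resp.read()        # changed: store the new copy
#   except urllib.error.HTTPError as e:
#       if e.code != 304:
#           raise                     # a real error
#       # 304: robots.txt unchanged, reuse the cached copy
```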

Hey, Apple! Need a software engineer who's actually, you know, good at it? I'm available.

#Apple #AppleInc #TimApple #WebCrawler #RobotsTxt #quality #WeveHeardOfIt #qwality #AppleQwality #legendary #TwoHardThings #caching #fail #engineer #software #SoftwareEngineer


It occurs to me that it might not be deliberately ignoring the directives in robots.txt; it might simply be that they can't write correct code to parse robots.txt. That would be extremely on-brand for Apple software. Incompetence, Occam, etc.
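For what it's worth, parsing robots.txt correctly isn't hard; Python ships a parser in its standard library. A minimal sketch (the rules and example.com URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Applebot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler asks before every fetch:
print(rp.can_fetch("Applebot", "https://example.com/any/page"))   # False
print(rp.can_fetch("Googlebot", "https://example.com/any/page"))  # True
```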

Like my final-gen iPod touch, for example, which can't start playing music when you hit "play" without a complicated dance and a 30-second wait.

#jwz has documented a ton of such "omg how did you even release this software" issues on his blog over the years. Reading them always evokes that "I tell you the things I do so you don't have to experience them" vibe.

#incompetence #Apple #OnBrand