I don't understand #Google... Are they blaming me because a page is blocked by robots.txt, even though they indexed it anyway?
Looking it up in the docs, you even have to scroll a bit to find that warning.
Their suggestion: just put noindex as a meta tag on the page.
Spoiler: exactly that tag is already there...
Instead of just sticking to conventions and simply forgetting the page directly...
There's a reason I don't want certain pages reachable directly on the web, only via links...
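For reference, the tag in question (the one Google's docs suggest, and which was already on the page) is the standard robots meta tag:

```html
<!-- Standard robots meta tag: tells compliant crawlers not to index this page.
     Only works if the crawler is allowed to fetch the page and see the tag -
     which is exactly the conflict with a robots.txt Disallow. -->
<meta name="robots" content="noindex">
```

That conflict is the root of the warning: if robots.txt blocks the fetch, Google never sees the noindex and may still index the URL from links alone.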
The docs page linked says they honour robots.txt, which would appear to be rubbish: the file generated by WordPress contains a couple of lines that should, I think, cover the requests they're making:
Disallow: /*?add-to-cart=
Disallow: /*?*add-to-cart=
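Those two rules use robots.txt wildcard syntax: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a literal prefix match. A minimal sketch of that matching logic (my own illustration, not any crawler's actual implementation), showing the add-to-cart URLs are indeed covered:

```python
import re

def robots_rule_matches(rule: str, path: str) -> bool:
    """Sketch of robots.txt path matching: '*' is a wildcard,
    a trailing '$' anchors the end, otherwise it's a prefix match."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literals, turn each '*' into '.*'
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Both WordPress rules catch typical add-to-cart URLs:
robots_rule_matches("/*?add-to-cart=", "/product?add-to-cart=5")        # True
robots_rule_matches("/*?*add-to-cart=", "/shop/?foo=1&add-to-cart=2")   # True
robots_rule_matches("/*?add-to-cart=", "/shop/page")                    # False
```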
I might just grab the (long) list of source IPs (they document how to pull it from Whois) and block the lot with Caddy.
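A hedged sketch of what that Caddy block could look like; the site name and CIDR range here are placeholders, with the real ranges pulled from Whois as described:

```
example.com {
	# Hypothetical range - substitute the actual source ranges from Whois
	@blocked_crawlers remote_ip 192.0.2.0/24
	respond @blocked_crawlers "Forbidden" 403
}
```

Caddy's `remote_ip` named matcher plus a `respond` directive keeps the block cheap: the request is refused before it ever reaches the site.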
#Development #Explainers
Inside Googlebot · How Google’s crawl system decides which content gets indexed https://ilo.im/16btho
_____
#Business #Google #SearchEngine #SEO #Crawlers #Content #RobotsTxt #Development #WebDev #Frontend
Oh, this is #fun.
#Applebot - Apple's web crawler, used for various things - is ignoring robots.txt rules governing crawling of websites.
I have Applebot (and Applebot-Extended, which isn't really a crawler) in my robots.txt files, set to disallow all access. Has been that way for #yonks.
And Applebot is consistently the highest-traffic crawler to my sites - at least of ones that actually bother to fetch robots.txt. Yesterday, for example, Applebot fetched robots.txt from one of my websites almost 800 times.
Yes, it's really Apple, not someone faking the user-agent identifier. It's coming from the networks that Apple says can be used to identify Applebot access. DNS matches, everything.
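That verification (reverse DNS into Apple's documented crawler domain, then forward DNS back to the same IP) can be sketched like this; the IP in the example is hypothetical, and the `applebot.apple.com` suffix is what Apple's support page describes:

```python
import socket

# Domain Apple documents for Applebot reverse-DNS names
APPLEBOT_SUFFIX = ".applebot.apple.com"

def looks_like_applebot(ptr_name: str) -> bool:
    """Check that a reverse-DNS (PTR) name falls under Apple's Applebot domain."""
    return ptr_name.rstrip(".").endswith(APPLEBOT_SUFFIX)

def verify_applebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve
    the PTR name and confirm it maps back to the same IP."""
    try:
        ptr, _aliases, _addrs = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not looks_like_applebot(ptr):
        return False
    try:
        _name, _aliases, addrs = socket.gethostbyname_ex(ptr)
    except socket.gaierror:
        return False
    return ip in addrs
```

The suffix check alone isn't enough (anyone can fake a User-Agent, but not a PTR record they don't control plus the matching forward lookup), which is why both directions are needed.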
e.g. https://support.apple.com/en-ca/119829
So: legendary Apple software quality. Documented to do the right thing, but actually doing the wrong thing. And completely failing to cache content, fetching the same file 800 times a day when it hasn't changed in years.
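The caching failure is the easy part to get right: HTTP has conditional requests for exactly this. A minimal sketch of what a polite crawler would do with the validators from its cached copy of robots.txt (my illustration of standard HTTP revalidation, not Applebot's actual code):

```python
def conditional_headers(cached_response: dict) -> dict:
    """Build conditional-request headers from a cached response's validators.
    A 304 Not Modified reply then costs almost nothing on both ends."""
    headers = {}
    if "etag" in cached_response:
        headers["If-None-Match"] = cached_response["etag"]
    if "last-modified" in cached_response:
        headers["If-Modified-Since"] = cached_response["last-modified"]
    return headers

# A file unchanged for years would answer 304 to this 800 times a day:
conditional_headers({"etag": '"abc123"',
                     "last-modified": "Tue, 01 Mar 2022 10:00:00 GMT"})
```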
Hey, Apple! Need a software engineer who's actually, you know, good at it? I'm available.
#Apple #AppleInc #TimApple #WebCrawler #RobotsTxt #quality #WeveHeardOfIt #qwality #AppleQwality #legendary #TwoHardThings #caching #fail #engineer #software #SoftwareEngineer
The Dark Side of AI No One Talks About, by @jammer_volts (@mozseo.bsky.social):