FYI: Only 7.4% of Fortune 500 have an llms.txt file, study finds: ProGEO.ai research reveals just 7.4% of Fortune 500 companies have implemented llms.txt, while 92.8% use robots.txt and 53.8% use JSON-LD for AI visibility. https://ppc.land/only-7-4-of-fortune-500-have-an-llms-txt-file-study-finds/ #LLMSTXT #Fortune500 #AIVisibility #RobotsTxt #JSONLD
Only 7.4% of Fortune 500 have an llms.txt file, study finds

ProGEO.ai research reveals just 7.4% of Fortune 500 companies have implemented llms.txt, while 92.8% use robots.txt and 53.8% use JSON-LD for AI visibility.

PPC Land

I don't understand #Google... Are they shifting the blame onto me because a page is blocked by robots.txt and they indexed it anyway?

If you actually look it up in the docs, you even have to scroll a fair bit to find the warning.

Their suggestion: just put noindex into the page as a meta tag.

Spoiler: exactly that tag is already there...

Instead of simply sticking to conventions and just forgetting the page outright...

There's a reason I don't want individual pages directly discoverable on the web, only reachable through links...
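For context on the conflict described above: a noindex meta tag is only honoured if the crawler is allowed to fetch the page, so a robots.txt Disallow on the same URL means Google never sees the tag and may still index the URL from inbound links. A minimal sketch of the conflicting setup (paths are illustrative, not from the post):

```
# robots.txt — blocks crawling of /private/, so the page below is never fetched
User-agent: *
Disallow: /private/

<!-- /private/page.html — this tag only takes effect if the page can be
     crawled, which the Disallow above prevents -->
<meta name="robots" content="noindex">
```

Google's documented guidance is accordingly to allow crawling of any page that should carry noindex, rather than blocking it in robots.txt.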

#GoogleSearchConsole #robotstxt #FuckBigTech

The Meta docs page linked says their crawlers honour robots.txt, which would appear to be rubbish: the robots.txt generated by WordPress contains a couple of lines that should, I think, cover the requests they're making:

Disallow: /*?add-to-cart=
Disallow: /*?*add-to-cart=

I might just pull the (long) list of source IPs they show how to retrieve via Whois and block the lot with Caddy.
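A sketch of that approach, assuming Meta still publishes its crawler ranges under AS32934 (their long-documented ASN) and that the RADB whois mirror keeps its current output format:

```shell
# Sketch only, not verified against live whois output: list the CIDR
# ranges Meta announces for AS32934, one per line, deduplicated —
# ready to paste into a server-side IP block list.
whois -h whois.radb.net -- '-i origin AS32934' \
  | awk '/^route6?:/ {print $2}' \
  | sort -u
```

The resulting CIDR list could then go into a Caddyfile matcher such as `@meta { remote_ip <ranges> }` paired with `abort @meta` — check Caddy's request-matcher docs for the exact syntax on your version.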

#wordpress #bots #robotstxt

#Development #Explainers
Inside Googlebot · How Google’s crawl system decides which content gets indexed https://ilo.im/16btho

_____
#Business #Google #SearchEngine #SEO #Crawlers #Content #RobotsTxt #Development #WebDev #Frontend

Inside Googlebot: demystifying crawling, fetching, and the bytes we process  |  Google Search Central Blog  |  Google for Developers

Google for Developers

Oh, this is #fun.

#Applebot - Apple's web crawler, used for various things - is ignoring robots.txt rules governing crawling of websites.

I have Applebot (and Applebot-Extended, which isn't really a crawler) in my robots.txt files, set to disallow all access. Has been that way for #yonks.

And Applebot is consistently the highest-traffic crawler to my sites - at least of ones that actually bother to fetch robots.txt. Yesterday, for example, Applebot fetched robots.txt from one of my websites almost 800 times.

Yes, it's really Apple, not someone faking the user-agent identifier. It's coming from the networks that Apple says can be used to identify Applebot access. DNS matches, everything.
e.g. https://support.apple.com/en-ca/119829
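The check the post describes (the "DNS matches, everything" part) is forward-confirmed reverse DNS, which Apple's support page recommends for verifying Applebot. A sketch of that check in Python — the `.applebot.apple.com` hostname suffix is taken from Apple's published guidance, but verify it against the current doc before relying on it:

```python
# Sketch of forward-confirmed reverse DNS verification for Applebot:
# the IP's PTR record must be in Applebot's domain AND resolve back
# to the same IP.
import socket

APPLEBOT_SUFFIX = ".applebot.apple.com"  # assumed from Apple's support page

def hostname_is_applebot(hostname: str) -> bool:
    """Pure check: does a reverse-DNS hostname sit in Applebot's domain?"""
    return hostname.rstrip(".").lower().endswith(APPLEBOT_SUFFIX)

def verify_applebot(ip: str) -> bool:
    """Full check (needs network): reverse lookup, suffix check,
    then forward lookup confirming the hostname maps back to the IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse (PTR) lookup
        if not hostname_is_applebot(hostname):
            return False
        _, _, addrs = socket.gethostbyname_ex(hostname)  # forward lookup
        return ip in addrs
    except (socket.herror, socket.gaierror):             # no PTR / lookup failed
        return False
```

The suffix check alone is not enough — without the forward confirmation, anyone controlling reverse DNS for their own IPs could claim an `applebot.apple.com` hostname.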

So: legendary Apple software quality. Documented to do the right thing, but actually doing the wrong thing. And completely failing to cache content, fetching the same file 800 times a day when it hasn't changed in years.

Hey, Apple! Need a software engineer who's actually, you know, good at it? I'm available.

#Apple #AppleInc #TimApple #WebCrawler #RobotsTxt #quality #WeveHeardOfIt #qwality #AppleQwality #legendary #TwoHardThings #caching #fail #engineer #software #SoftwareEngineer

About Applebot - Apple Support (CA)

Learn about Applebot, the web crawler for Apple.

Apple Support
FYI: Czech publishers get new robots.txt shield against AI scrapers: SPIR on March 19 updated its standard for Czech online publishers to opt out of AI text and data mining, adding real-time response crawlers to the scope of the robots.txt framework. https://ppc.land/czech-publishers-get-new-robots-txt-shield-against-ai-scrapers/ #AI #robotstxt #datautajení #česképublikace #ochranadat
Czech publishers get new robots.txt shield against AI scrapers

SPIR on March 19 updated its standard for Czech online publishers to opt out of AI text and data mining, adding real-time response crawlers to the scope of the robots.txt framework.

PPC Land

The Dark Side of AI No One Talks About, by @jammer_volts (@mozseo.bsky.social):

https://moz.com/blog/dark-side-of-ai

#ai #seo #robotstxt

The Dark Side of AI No One Talks About

Is AI helping your SEO or sabotaging it? Discover the hidden risks of LLMs and the practical strategies to protect your brand visibility.

Moz