@asjo I've been seeing the same pattern for months: #OpenAI's crawlers are slurping anything they can lay their clammy hands on, no matter what /robots.txt says.

So now I regularly grab the IP addresses from the JSON blobs mentioned on https://platform.openai.com/docs/bots/ and add them to my #iptables.

/cc #ChatGPT, #GPTBot, #OAI, #SearchBot
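
For anyone wanting to script the step described above, here is a minimal sketch, not the poster's actual setup. It assumes each JSON blob has a top-level `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key; that layout is an assumption and should be checked against the real files. The function names `extract_prefixes` and `drop_rules` are made up for illustration, and the sample data uses documentation-only address ranges. The script only prints the rules, it does not apply them.

```python
import ipaddress
import json


def extract_prefixes(blob: str) -> list[str]:
    """Return the validated CIDR prefixes found in one JSON blob."""
    data = json.loads(blob)
    prefixes = []
    for entry in data.get("prefixes", []):
        # Assumed key names; adjust if the published layout differs.
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            # ip_network raises ValueError on malformed input, so bad
            # data never ends up in a firewall rule.
            prefixes.append(str(ipaddress.ip_network(cidr, strict=False)))
    return prefixes


def drop_rules(prefixes: list[str]) -> list[str]:
    """Build one DROP rule per prefix, picking the tool by IP version."""
    rules = []
    for cidr in prefixes:
        version = ipaddress.ip_network(cidr).version
        tool = "ip6tables" if version == 6 else "iptables"
        rules.append(f"{tool} -A INPUT -s {cidr} -j DROP")
    return rules


# Hypothetical sample using documentation-only ranges (RFC 5737 / RFC 3849).
sample = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/28"}, {"ipv6Prefix": "2001:db8::/32"}]}'

for rule in drop_rules(extract_prefixes(sample)):
    print(rule)
```

Printing the commands instead of running them leaves room to review the list before anything touches the firewall.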

#Development #Explorations
The overlap between search bots and AI scrapers · Why robots.txt alone won’t keep AI off your website https://ilo.im/15z9al

_____
#Business #SEO #AI #SearchEngine #SearchBot #AiBot #Website #WebDev #UserAgent #RobotsTxt

The overlap between search bots and AI scrapers

This is a bit of a technical blog post, but I’ve tried to provide explanations of technical jargon where possible. It starts off with some summaries of what people have discovered over the past few…

The Lazarus Corporation

@blaine @andrew The elegant thing about @anildash's proposal is that your content would only be searchable if you were following this hypothetical #searchbot at the moment you publicly posted it, effectively opting in on a per-post basis.

Likewise, any flavor of boosted content (including quotes, if available) would presumably only be indexed if both accounts involved were opted in (via this mechanism) at the relevant posting time(s).
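
The opt-in rule described in these two posts can be sketched roughly as follows. This is only an illustration of the idea, not anything from the actual proposal: the `Account` class, the follow-interval representation, and the reading of "relevant posting time(s)" as "the original post time for the author and the boost time for the booster" are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Account:
    name: str
    # Hypothetical record of when this account followed the search bot:
    # (start, end) pairs, with end=None while the follow is still active.
    follows: list[tuple[datetime, Optional[datetime]]] = field(default_factory=list)

    def opted_in_at(self, when: datetime) -> bool:
        """True if the account was following the bot at the given moment."""
        return any(
            start <= when and (end is None or when < end)
            for start, end in self.follows
        )


def post_indexable(author: Account, posted_at: datetime) -> bool:
    # A post is searchable only if its author was opted in when it was posted.
    return author.opted_in_at(posted_at)


def boost_indexable(author: Account, posted_at: datetime,
                    booster: Account, boosted_at: datetime) -> bool:
    # A boost is indexed only if both accounts were opted in at their
    # respective posting times.
    return author.opted_in_at(posted_at) and booster.opted_in_at(boosted_at)


alice = Account("alice", [(datetime(2024, 1, 1), datetime(2024, 3, 1))])
bob = Account("bob", [(datetime(2024, 2, 1), None)])
```

Because the check is tied to the moment of posting, unfollowing the bot later would not retroactively change what was opted in, and following it later would not expose older posts.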

Imagine being able to easily search through thousands of customer records, articles, Helpdesk tickets, and all the things from within #Slack. Learn how to build a #Searchbot with @Elastic #AppSearch and @Slack using @Zapier. No coding required! https://go.es.io/2EU3jEI
Building a Searchbot using Slack, Zapier, and Elastic App Search

Build your very own Searchbot using Slack, Zapier, and Elastic App Search. No coding required! Search through thousands of documents right from Slack.

Robots.txt | The Associated Worlds