Who do you think you are?

47.128.32.0 - - [18/Mar/2026:00:48:01 +0100] "GET /robots.txt HTTP/1.1" 403 239 "-" "-" 1650 4269

#Amazon #AWS Singapore.

Good on you that #CrowdSec won't immediately block on a missing user-agent, but my httpd-ACL does.
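For anyone who wants to see how many of these show up in their own logs, here is a minimal sketch. It assumes a combined-style access log like the line above, where the user agent is the third quoted field; the function name `empty_ua_clients` and the log path in the usage comment are made up for illustration.

```shell
# Print the client IP of every request whose User-Agent field is empty or "-".
# Assumption: combined-style log format, so splitting on double quotes puts
# the user agent in field 6. Requests containing escaped quotes would break this.
empty_ua_clients() {
  awk -F'"' 'NF >= 6 && ($6 == "-" || $6 == "") {
    split($1, head, " ")   # head[1] is the client address
    print head[1]
  }' "$@"
}

# Usage example (the log path is an assumption):
# empty_ua_clients /var/log/httpd/access_log | sort | uniq -c | sort -rn
```

With no file argument the function reads from standard input, so it can also sit at the end of a pipeline.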

#DarkVisitors #AI #Crawler #GenAI #SocialPermissionToBurnEnergy

The gaslighting! #DarkVisitors just sent an email announcing a name change, and by the way, we have always been at war with Eurasia!

“We are not an "anti-bot" company. Bots are just code. They can be used for "bad" things, like using IP in a way that the creator doesn't want, or "good" things, like making a purchase on behalf of a real customer. We are not a security company, and never have been. Our specialty is helping website owners navigate this new reality in a way that's best for their business, within a trusted ecosystem that provides mutual benefit to all sides.”

@juergen_hubert Crawlers and scrapers and fetchers! Oh my! - Dorothy (allegedly)

got dark visitors ? #RobotsTXT #DarkVisitors https://darkvisitors.com/

Track, control, and optimize your website for AI agents and bots

Use Dark Visitors to turn the rising wave of AI agents, LLM assistants, and other bots crawling your website into a new growth channel for your business

Dark Visitors

Hey #AI companies: how about I give you my content in a machine-readable, token-optimised format on an extra endpoint, and you pay me? You save crawling and parsing costs and I get something back.
Deal? #DarkVisitors

got robots.txt ?

Dark Visitors - list of known AI (and other) agents on the internet : ‘the hidden ecosystem of autonomous chatbots and data scrapers’ https://darkvisitors.com/ #DarkVisitors #WebCrawlers #WorldWideWeb

"It’s pretty crazy that not only a) these bots shamelessly harvest all your data without asking for permission and b) they do it in such a brute-force manner.
My coworker and security expert António pointed me to #DarkVisitors, and I’ll probably be installing their #WordPressPlugin on all my sites. For what it’s worth."
@john_fisherman on #AIscraping
https://fred-rocha.medium.com/ai-crawler-bots-on-the-hunt-caf5a59ff478
AI crawler bots on the hunt - Fred Rocha - Medium

I was perusing fumaca.pt, one of the websites I’m responsible for, and felt it was dragging. It felt heavy and slow, and usually it’s snappy and fast. I logged into our server using SSH and noticed…

Medium

The automatic #robots.txt generation from #darkvisitors only creates a 23-record file. What about all the other dozens, or hundreds, from the #agents list?

```
curl -qs -X POST https://api.darkvisitors.com/robots-txts \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "agent_types": [
      "AI Assistant",
      "AI Data Scraper",
      "AI Search Crawler",
      "Undocumented AI Agent"
    ],
    "disallow": "/"
  }'
```

Anyone else seen that behaviour?
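To make the count easy to reproduce, here is a rough sketch that counts the `User-agent:` lines in a robots.txt. It assumes "record" means one `User-agent:` line per agent; the helper name `count_robots_agents` is hypothetical.

```shell
# Count the User-agent lines in a robots.txt (case-insensitive match).
# Assumption: one "User-agent:" line per agent record.
count_robots_agents() {
  grep -ci '^user-agent:' "$@"
}

# Usage example: append "| count_robots_agents" to the curl call above,
# or pass a saved file: count_robots_agents robots.txt
```

Note that `grep -c` still prints `0` when nothing matches, but exits with a non-zero status in that case.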

So I added the Dark Visitors plugin to my website this weekend.

What’s neat is seeing all the different bots/agents visiting the site that I wasn’t seeing in other analytics tools.

#DarkVisitors #AiTheft

You might be familiar with what I'm terming the "Token Wars" - in which #LLM and #GenAI companies seek to ingest text, image, audio and video content to create their #ML models. Tokens are the basic unit of data input into these models - meaning that #scraping of web content is widespread.

In retaliation, many sites - such as Reddit, Inc. and Stack Overflow - are entering into content sharing deals with companies like OpenAI, or making their sites subscription only.

Another solution that has emerged recently is content blocking based on user agent. In web programming, the client requesting a web page identifies itself through its user agent string - usually as a browser or a bot.

User agents can be blocked by a website's robots.txt file - but only if the user agent respects the robots.txt protocol. Many web scrapers do not. Taking this a step further, network providers like Cloudflare are now offering solutions which block known token scraper bots at a network level.
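As a rough illustration of what user-agent-based blocking boils down to, the sketch below checks a user agent string against a list of tokens. The function name `is_blocked_agent` is made up, and the tokens in the usage comment are only examples; a real setup would do this matching in the web server or at the network edge, and it only works against bots that identify themselves honestly.

```shell
# Return success (0) if the user agent string contains any of the given tokens.
# The match is a case-sensitive substring test; the token list is supplied by the caller.
is_blocked_agent() {
  ua=$1
  shift
  for token in "$@"; do
    case "$ua" in
      *"$token"*) return 0 ;;
    esac
  done
  return 1
}

# Usage example (tokens are illustrative):
# is_blocked_agent "Mozilla/5.0 (compatible; GPTBot/1.0)" GPTBot CCBot && echo "blocked"
```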

I've been playing with one of these solutions called #DarkVisitors for a couple of weeks after learning about it on The Sizzle and was **amazed** at how much of the traffic to my websites was bots, crawlers and content scrapers.

https://darkvisitors.com

(No backhanders here, it's just a very insightful tool)

#TokenWars #tokenization #scraping #bots #scrapy #WebScraping

WTF happened last night? Dark Visitors recorded nearly 1,200 hits from Mastodon instances fetching Open Graph data on my blog 🤯
#Mastodon #DarkVisitors #OpenGraph