Who do you think you are?

47.128.32.0 - - [18/Mar/2026:00:48:01 +0100] "GET /robots.txt HTTP/1.1" 403 239 "-" "-" 1650 4269

#Amazon #AWS Singapore.

Good on you that #CrowdSec won't immediately block on a missing user-agent, but my httpd-ACL does.

#DarkVisitors #AI #Crawler #GenAI #SocialPermissionToBurnEnergy
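For anyone wanting to spot such requests in their own logs, here is a minimal sketch that checks a log line like the one above for a missing user-agent. It assumes the line follows Apache's combined log format with two extra trailing fields (their meaning is not stated in the post, so they are ignored). The names `LOG_PATTERN` and `has_missing_user_agent` are made up for illustration; this is not the poster's actual httpd ACL.

```python
import re

# Assumption: Apache "combined" log format; anything after the quoted
# user-agent (here "1650 4269") is ignored because its meaning is unknown.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def has_missing_user_agent(line: str) -> bool:
    """Return True if the log line parses and its user-agent is empty or '-'."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return False  # unparseable lines are not flagged
    return match.group("user_agent") in ("", "-")


line = (
    '47.128.32.0 - - [18/Mar/2026:00:48:01 +0100] '
    '"GET /robots.txt HTTP/1.1" 403 239 "-" "-" 1650 4269'
)
print(has_missing_user_agent(line))
```

A check like this only reports what the log already shows; the actual blocking in the post happens in the web server's access rules.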

I have been running iocaine on my server for a week now. During this time, 7,076,701 requests have passed through iocaine, 3,312,318 of which were identified as AI crawlers/bots. 3,741,577 requests came from crawlers/bots that got stuck in iocaine's deadly maze, consuming an infinite amount of poisoned garbage. Furthermore, 972 crawlers/bots were detected that were routed into the maze via major browsers.

All of this is managed by iocaine with just ~80 MB of memory and ~0.1% direct CPU usage. Now that’s what I call efficient! Well done, @algernon.

Let's fight back against AI crawlers and bots. Thanks to projects like iocaine, this is entirely possible, not just theory.

#iocaine #ai #llm #FckAI #FckLLMs #selfhosting #crawler #bots

Maybe AI is the response to WYSIWYG: what you see is what you claim, i.e. WYSIWYC?
Of course, the AI industry will offer solutions, which only it can create, to the flood of fraud it has helped generate so far by dumping models into a legal void. Who will pay an AI premium for protection against endless AI scams?
Just remember it all started as "for research".

#transparency #omnibus #GDPR #EU #AI #insurance #fraud #crawler #webdev

©️ Nicolas Mouart, 2026

I have just installed iocaine 3.2.0 by @algernon and have already started successfully serving poisoned garbage to the AI agents. I love it! I especially like how simple the setup was, and how easy it was to expand my existing Caddyfile. My monthly donation is set up too. What a great project!

#iocaine #ai #llm #FckAI #FckLLMs #bot #crawler

Hello, dear Fedinauts here on anonsys.net. Effective immediately, this instance is protected against AI crawlers. They are blocked or banned.

Thanks, @rainer, for the tip. I have now enabled it on anonsys.net and have the filter updated once a day.

What is damn interesting is that about 10 minutes after enabling the filter, 128 AI crawlers had already been banned:

Status for the jail: apache-ai-crawler
|- Filter
|  |- Currently failed: 0
|  |- Total failed: 33
|  `- File list: /var/log/apache2/useragent.log
`- Actions
   |- Currently banned: 128
   |- Total banned: 128
   `- Banned IP list: 100.28.204.82 100.29.160.53 107.20.181.148 119.28.140.106 18.207.89.138 18.214.124.6 18.215.24.66 18.215.49.176 18.232.11.247 18.235.158.19 184.73.167.217 184.73.239.35 216.73.216.43 23.21.179.120 23.21.225.190 23.21.227.240 23.21.228.180 23.23.99.55 3.209.174.110 3.212.205.90 3.212.86.97 3.220.148.166 3.221.244.28 3.222.190.107 3.93.211.16 3.93.253.174 34.192.67.98 34.195.248.30 34.205.163.103 34.225.138.57 34.226.89.140 34.227.234.246 34.230.124.21 34.231.45.47 35.169.102.85 35.169.119.108 35.171.117.160 43.130.101.151 43.130.116.87 43.130.26.3 43.134.186.61 43.135.115.233 43.153.192.98 43.154.140.188 43.154.250.181 43.155.157.239 43.157.20.63 43.157.46.118 43.164.195.17 43.164.196.57 43.164.197.224 43.165.135.242 43.165.189.206 43.166.128.86 43.166.242.189 43.166.244.66 44.194.134.53 44.205.74.196 44.209.35.147 44.210.213.220 44.213.202.136 44.217.255.167 44.220.2.97 44.221.105.234 44.223.116.180 47.128.112.235 47.128.112.241 47.128.63.217 49.51.166.228 50.19.102.70 52.0.63.151 52.2.4.213 52.201.155.215 52.203.237.170 52.4.229.9 52.5.232.250 52.54.157.23 52.6.97.88 52.70.123.241 54.145.82.217 54.147.80.137 54.157.84.74 54.159.18.27 54.235.172.108 54.83.23.103 54.83.240.58 54.83.56.1 66.249.68.128 66.249.68.130 98.82.38.120 98.82.63.147 98.82.66.172 98.83.10.183 98.83.8.142 98.84.60.17 18.208.11.93 18.214.238.178 3.218.35.239 44.212.131.50 54.157.99.244 3.230.69.161 18.235.81.246 52.203.152.231 35.173.38.202 3.232.82.72 34.193.2.57 54.166.126.132 3.225.9.97 98.82.39.241 98.84.200.43 3.94.156.104 44.223.115.10 43.163.104.54 43.157.22.109 43.130.131.18 43.131.26.226 49.51.132.100 50.16.248.61 43.155.162.41 52.203.68.145 54.89.90.224 34.236.185.101 52.200.251.20 43.166.224.244 98.82.107.102 129.226.174.80 18.205.213.231 34.204.150.196

More are added every minute. This is truly insane! 😳

Source: rainer.sokoll.com/?p=8353

#anonsys.net #friendica #fedinauten #ai #ki #crawler

Howto block AI bots with fail2ban (Apache) - Rainers kleine Welt

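The cost of crawling a billion pages in 2025

An article I saw last week: the author crawled 1B pages on AWS in just 25.5 hours, at a cost of $462 after performance tuning: "Crawling a billion web pages in just over 24 hours, in 2025 (via)". The author also mentions a 2012 article, "How to crawl a quarter billion webpages in 40 hou...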

Gea-Suan Lin's BLOG
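The three figures quoted in the post above (1B pages, 25.5 hours, $462) imply an average throughput and a unit cost. A quick back-of-the-envelope sketch, using only those numbers; the variable names are mine, and the result is an average, not a claim about the crawl's peak rate:

```python
# Figures taken from the post: 1B pages, 25.5 hours, $462 after tuning.
PAGES = 1_000_000_000
HOURS = 25.5
COST_USD = 462.0

# Average throughput over the whole run.
pages_per_second = PAGES / (HOURS * 3600)

# Cost normalised to one million pages.
cost_per_million = COST_USD / (PAGES / 1_000_000)

print(f"{pages_per_second:,.0f} pages/s")             # 10,893 pages/s
print(f"${cost_per_million:.3f} per million pages")   # $0.462 per million pages
```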
A better way to crawl websites with PHP - Freek Van der Herten's blog on Laravel, PHP and AI https://links.shikiryu.com/shaare/CC0c5g A crawler for whole sites (so not just a single page) in PHP that looks clear as well as complete. Keeping it handy.
#php #dev #crawler

I am looking for a nice tool that I could run on my home server to poison my internet usage pattern.

So far I could only find some outdated projects...

Do you have any recommendations?

#diday #crawler #selfhosting #anonymity #advertising

How AI is redefining the way we find content

The way people find information online is changing fast. With Artificial Intelligence (AI) becoming a core part of how users discover content, your content has to work harder and smarter to be seen.

https://clearleft.com/thinking/how-ai-is-redefining-the-way-we-find-content

#Crawler #Information #Inhalt #KI #KIBots #KünstlicheIntelligenz #SEO #GEO #Suche #Suchmaschine #Optimieren

How AI is redefining the way we find content

The way people find information online is changing fast. With Artificial Intelligence (AI) becoming a core part of how users discover content, your…

Clearleft

It won't be long before I can celebrate 1,000,000 crawler visits to my website since mid-December 2025.

Thanks for making me feel important!!!

#smallweb #neocities #crawler #analytics #stats