Mozilla/5.0 (compatible; Thinkbot/0.5.8; +In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)

😂🙈

https://boston.conman.org/2025/08/21.1

#Thinkbot #tencent

“Bro, ban me at the IP level if you don't like me!” - The Boston Diaries - Captain Napalm

Ban me at the IP level if you don't like me | Hacker News

Link: “Bro, ban me at the IP level if you don't like me!”
https://boston.conman.org/2025/08/21

📌 Summary:
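The author describes web-crawler (bot) traffic observed from many IP addresses, in particular a bot identifying itself as "Thinkbot". It does not honor robots.txt, and its user-agent string says, in effect, "if I cause you trouble, block me at the IP level." The author's investigation found that these IPs came from 41 different network blocks, all owned by the large Chinese internet company Tencent. The author speculates that the Chinese government may be using such widely dispersed IPs to externalize the cost of the "Great Firewall": whether the content gets scraped or the crawler gets blocked, the strategy works either way. The author therefore added the relevant ranges to a set of "bad bot" firewall rules. The article also cites the Hacker News discussion, where many site administrators report reducing malicious traffic by blocking the ASes (Autonomous Systems) of certain countries such as China, and also blocking specific cloud providers and IP groups under various policies. Because crawler IP sources change so much, IP-level blocking works but has limits: it is easy to get around and can affect legitimate users. Taken together, the piece explains how blocking IP ranges reduces server load and the practical context in which it is applied.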

🎯 Key Points:
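→ [Thinkbot and IP analysis]
★ Thinkbot says it is in a test phase and, instead of following robots.txt, asks to be blocked by IP. It was seen using 74 distinct IP addresses spread across 41 network blocks, all owned by Tencent, which together contain 476,590 IP addresses.
★ The author lists the Tencent network ranges and blocks them with firewall rules to stop the crawler's traffic.

→ [Chinese crawlers and political speculation]
★ The author speculates that the Chinese government (CCP) may encourage crawling from a huge pool of IPs as a way to externalize the indirect costs of the "Great Firewall".
★ Whether or not a site blocks them, the crawlers can keep scraping from dispersed sources, which is one of the core difficulties in fighting crawlers.

→ [Community discussion and practical experience]
★ Many site administrators report that blocking the ASes of China, Russia, and certain cloud providers significantly reduces malicious traffic and attacks.
★ Some small countries (e.g. Seychelles, Cyprus) serve as relay points for crawler traffic because of favorable tax or regulatory conditions, effectively acting as proxies for larger countries behind them.
★ IP blocking helps reduce server load and malicious attacks, but it can also lock out legitimate users (international travelers or VPN users), so it is a cost-benefit trade-off.
★ The community also weighed allowlists against blocklists; in practice several techniques and policies have to be combined to balance fighting malicious bots against serving legitimate visitors.
★ Blocking cloud-provider IPs is effective but also hits legitimate users; relying on IP addresses alone is no silver bullet.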
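The blocking approach summarized above (CIDR blocklists, with an allowlist so legitimate users are not caught) can be sketched with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions, not the author's actual firewall rules: the ranges are RFC 5737 documentation networks standing in for the Tencent blocks, and `is_blocked` and `total_addresses` are hypothetical helper names. `total_addresses` shows one way a total such as 476,590 addresses could be derived from a list of network blocks: merge overlapping or adjacent ranges, then sum their sizes.

```python
import ipaddress

# Hypothetical example ranges: RFC 5737 documentation networks,
# NOT the actual Tencent blocks from the article.
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("198.51.100.128/25"),
]

# Hypothetical allowlist: addresses that must never be blocked,
# even if they sit inside a blocked range.
ALLOWLIST = [ipaddress.ip_network("198.51.100.10/32")]


def is_blocked(addr: str) -> bool:
    """Return True if addr is inside a blocked range and not explicitly allowed."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in ALLOWLIST):
        return False  # allowlist wins over blocklist
    return any(ip in net for net in BLOCKLIST)


def total_addresses(nets) -> int:
    """Count the distinct addresses covered, merging adjacent/overlapping ranges first."""
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
```

Here the two /25 halves collapse into one /24, so `total_addresses(BLOCKLIST)` counts 512 addresses rather than double-counting anything. In a real deployment the check would live in the firewall (for example as a set of ranges in nftables or ipset) rather than in application code; the Python version only illustrates the logic. Checking the allowlist first reflects the trade-off raised above: range blocks are coarse, so explicit exceptions need to take priority.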

🔖 Keywords:
#Thinkbot #IP_blocking #Tencent #Great_Firewall #malicious_crawler_botnet

All has gone very quiet on the #botnet front. But that has exposed a new botnet, stealthily crawling away.

I've started giving them names - is this a bad sign? Am I getting too attached?!!

Anyway, more analysis, stats, graphs here:

https://evilgeniusrobot.uk/botnet-reports/all-quiet-for-now-and-a-new-player-20250815.html

#bravnar #hugh #thinkbot

Let's test my theory about the bots harvesting links from here on Mastodon.

These links have not been seen anywhere so far, they're unique to this post.

https://another.evilgeniusrobot.uk/test-on-mastodon-only

https://an.evilgeniusrobot.uk/test-on-mastodon-exclusive

I suspect these will start to show up in the logs at some point. I exclude the #MastoDDoS effect from the stats; that doesn't count.

#Thinkbot has been pretty fast on the uptake but it could be that those posts got boosted widely, I dunno.

[Hmm.. "clever hand sparrow", indeed :) ]

#botnet

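The experiment in this post relies on canary links: URLs published in exactly one place, so any request for one reveals where a crawler harvested it. A minimal sketch, assuming a path-based scheme rather than the subdomains used above; `make_canary`, `first_hits`, and the `/canary/` prefix are hypothetical, and log entries are assumed to be already parsed into `(timestamp, path, user_agent)` tuples sorted by time.

```python
import secrets


def make_canary(channel: str, registry: dict) -> str:
    """Create an unguessable path and record the single channel it is published to."""
    # Hypothetical scheme: /canary/<channel>/<random hex>
    path = f"/canary/{channel}/{secrets.token_hex(8)}"
    registry[path] = channel
    return path


def first_hits(log_entries, registry):
    """Return {path: (channel, timestamp, user_agent)} for the first request to each canary.

    log_entries: iterable of (timestamp, path, user_agent) tuples, sorted by time.
    """
    result = {}
    for ts, path, ua in log_entries:
        if path in registry and path not in result:
            result[path] = (registry[path], ts, ua)
    return result
```

Because the random part is never published anywhere else, a hit on a Mastodon-only canary is strong evidence the link was scraped from Mastodon; the timestamp of the first hit also shows how quickly each bot picked it up.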

Some other preliminary analysis makes me think that both botnets, and also my old friend #Thinkbot, got the initial links from this Mastodon account.

All crawls seem to start from the /bonk-wave and /not-bonk-wave URLs I posted here when I first launched the site, rather than the root of the home page which I've posted elsewhere. I haven't posted those links anywhere but here.

Thinkbot also found my "uncooked" releases very quickly which backs up that theory.

#botnet
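Checking where each crawl starts, as described in this post, amounts to finding the earliest request per client and tallying those first paths. A minimal sketch, assuming pre-parsed `(timestamp, client, path)` tuples; `entry_points` is a hypothetical name. Because Thinkbot spreads its requests over many IP addresses, grouping by user agent rather than by IP may give a clearer picture, so `client` can be either.

```python
from collections import Counter


def entry_points(log_entries):
    """Tally the first path each client requested.

    log_entries: iterable of (timestamp, client, path) tuples in any order,
    where client is an IP address or a user-agent string.
    """
    first = {}  # client -> (earliest timestamp seen, path requested at that time)
    for ts, client, path in log_entries:
        if client not in first or ts < first[client][0]:
            first[client] = (ts, path)
    return Counter(path for _, path in first.values())
```

If most crawlers' first requests land on `/bonk-wave` or `/not-bonk-wave` rather than `/`, that supports the theory that the links came from the Mastodon posts where only those URLs appeared.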

@simon

#Thinkbot does seem to space out requests to roughly one per minute or so - I'm not seeing great bursts of activity - just a constant trickle.

But as it will download anything it finds, that could include big zip files (e.g. on #Faircamp sites it will find and download any FLAC archives).

This could easily rack up quite a lot of bandwidth on bigger sites, which might cost someone money.

It's a particularly antisocial bot. Just rude, I would say, rather than actively malicious.
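The pacing and bandwidth concerns in this post can be checked from access logs by sorting one bot's request times, taking the median gap, and summing the bytes sent. A minimal sketch, assuming the entries for a single bot have already been extracted as `(unix_timestamp, bytes_sent)` pairs; `pacing_and_volume` is a hypothetical helper.

```python
from statistics import median


def pacing_and_volume(entries):
    """Return (median seconds between requests, total bytes sent) for one bot.

    entries: list of (unix_timestamp, bytes_sent) pairs, in any order.
    The median is None when there are fewer than two requests.
    """
    times = sorted(t for t, _ in entries)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    med = median(gaps) if gaps else None
    total = sum(size for _, size in entries)
    return med, total
```

A median gap near 60 seconds matches the "roughly one per minute" observation, which works out to about 86,400 / 60 = 1,440 requests per day. The median is used rather than the mean so a few long pauses do not hide the steady trickle; the byte total, by contrast, is dominated by whichever requests hit large archives.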

@keefmarshall I've spotted Thinkbot in some of our clients' httpd logs. I could sense trouble in the wind... #thinkbot

#Thinkbot's user-agent says:

"if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you."

In the last 24 hours or so, I've seen 1700+ requests from Thinkbot to my robot, from 74 different IP addresses. They're not in an easily blockable CIDR range.

I'll keep track of the IPs and publish them somewhere in case it's useful but they're coming from all over the place.

The few I looked up are at least all linked to Tencent in some way, but spread across the globe.
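Counting requests and distinct IPs for one user agent, and checking whether those IPs fall into any compact CIDR range, can be sketched with the standard library. A minimal sketch, assuming the server writes the common Apache/nginx "combined" log format; `LINE_RE`, `bot_summary`, and `covering_ranges` are hypothetical names, and only IPv4 addresses are collapsed here.

```python
import ipaddress
import re

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_summary(lines, needle="Thinkbot"):
    """Count requests whose user agent contains needle, and collect their source IPs."""
    requests = 0
    ips = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or needle not in m.group("agent"):
            continue  # unparseable line or a different client
        requests += 1
        ips.add(m.group("ip"))
    return requests, ips


def covering_ranges(ips):
    """Collapse IPv4 addresses into the fewest CIDR blocks covering exactly those addresses."""
    nets = [ipaddress.ip_network(ip) for ip in ips
            if ipaddress.ip_address(ip).version == 4]
    return list(ipaddress.collapse_addresses(nets))
```

If `covering_ranges` returns nearly as many blocks as there are addresses, the sources are scattered in the way described above, and there is no single small range to block.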

I got a little bit annoyed at a bot hoovering up all the large zip files of lossless music from various websites I run. I may have slightly overreacted...

https://evilgeniusrobot.uk/posts/an-evil-genius-robot.html

Still it's finally a good use of that domain name I've had for a few years now!

#AlgorithmicSabotage #NoAI #ThinkBot

An Evil Genius Robot

A honey trap for annoying web crawlers, especially AI bots
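A honey trap of this kind can be sketched as a deterministic "link maze": every generated page links to more generated pages, so a crawler that ignores robots.txt keeps fetching worthless content. This is a hypothetical illustration, not how the linked site is implemented; `should_trap`, `maze_page`, and the `/maze/` prefix are assumptions.

```python
import hashlib

TRAP_PREFIX = "/maze/"  # hypothetical URL space reserved for the trap


def should_trap(user_agent: str, bad_agents=("Thinkbot",)) -> bool:
    """Decide whether a request goes into the trap, by user-agent substring."""
    return any(name in user_agent for name in bad_agents)


def maze_page(path: str, links: int = 5) -> str:
    """Build a deterministic HTML page whose links lead to further generated pages.

    The same path always yields the same page, so no state is stored,
    yet the link graph is effectively endless.
    """
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    items = []
    for i in range(links):
        # Child names are derived from this page's hash plus the link index.
        child = hashlib.sha256(f"{digest}:{i}".encode("utf-8")).hexdigest()[:12]
        items.append(f'<li><a href="{TRAP_PREFIX}{child}">{child}</a></li>')
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"
```

Pages are derived from a hash of the path, so nothing has to be stored and revisits return identical content. Listing `/maze/` under `Disallow` in robots.txt keeps compliant crawlers out, so only bots that ignore it end up inside.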