Alright my web friends! 👋 Hands up who has experienced a surge in (LLM) bot traffic recently and maybe even had to take steps against them? I’m writing a blog post about this atm and it would be great to hear whether others are experiencing the same with their #blogs and personal #websites. #RT == 💚

@matthiasott This is what I currently do https://rknight.me/blog/blocking-bots-with-nginx/

But also yes, I’m seeing more of it. Specifically Chinese bots that go over every single page on my site one by one.

Blocking Bots with Nginx

How I've automated updating the bot list to block access to my site
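For reference, the core of the linked nginx technique is a `map` on the user agent. This is a minimal sketch, not the exact config from the post; the bot names and domain are illustrative:

```nginx
# Map known bot user agents to a flag; the UA substrings here are
# examples, not the full list the linked post automates.
map $http_user_agent $is_blocked_bot {
    default      0;
    ~*GPTBot     1;
    ~*CCBot      1;
    ~*Bytespider 1;
}

server {
    listen 80;
    server_name example.com;  # placeholder domain

    # Refuse any request whose user agent matched above.
    if ($is_blocked_bot) {
        return 403;
    }
}
```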

@matthiasott Hard to tell for my personal sites b/c I don't monitor the traffic/stats at all. But in client projects that have trackers there's been a huge increase in bot-ish visits in recent months, and even those that claim to filter known bots out of their traffic stats have seen a jump in suspicious-looking visits
Denial

The best of the web is under continuous attack from the technology that powers your generative “AI” tools.

@matthiasott Yes, a while ago 99% of our load came from AI crawlers. After updating the robots.txt it became less, but there are still a lot of bad actors out there, and those are just the ones that at least set a user agent to identify themselves. As user-agent blocking becomes more common, I really wonder how bad they'll get
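The `robots.txt` approach mentioned here only works against crawlers that honor it, which is exactly the caveat in the post above. A minimal sketch, with a few commonly seen AI crawler names as examples:

```
# robots.txt — only helps against crawlers that actually honor it.
# The crawler names below are common AI bots, listed as examples.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```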

@matthiasott hell yes! This is a terrible problem on my site codepoints.net, even with CloudFlare in front of it.

On single code point pages I deep-link to my site search for similar code points. Over the last year it got worse and worse: "users" from China on Chrome follow those links and bring the site down with excessive DB load from the search.

I hated having to add rate limiting etc, and I know of at least one legitimate user who was bitten by it.

Such a pest on the open web!

@matthiasott I only heard stories until this week, when I was hit. I have a pet project that's fetching a lot of data and is a bit slow, but perfectly fine for human use. However, I suddenly had both Baidu and Meta's crawlers loading several pages at once, slowing everything down dramatically. Blocking both bots from that section of my site with robots.txt solved the problem... at least for now!
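Scoping the `Disallow` rules to just the slow section, as described above, might look like this. The `/data/` path is a hypothetical stand-in for the slow pages, and the user-agent tokens are the commonly documented ones for Baidu and Meta:

```
# Block only the expensive section, not the whole site.
User-agent: Baiduspider
Disallow: /data/

User-agent: meta-externalagent
Disallow: /data/
```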
@matthiasott Yep. Not as bad as others, but most recently I’ve seen a surge in traffic from Singapore. Will be updating my `robots.txt` list and looking into blocking soon.
@matthiasott As of a few minutes ago I'm now blocking known bots via `.htaccess` (not just `robots.txt`) 🤞
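A minimal sketch of what blocking via `.htaccess` can look like on Apache with mod_rewrite; the UA substrings are illustrative, not the poster's actual list:

```
# .htaccess — deny matching user agents before anything else is served.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider) [NC]
RewriteRule ^ - [F,L]
```

The `[F]` flag returns 403 Forbidden, so matched bots get no content at all, unlike `robots.txt`, which relies on their cooperation.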
@matthiasott Following up, doesn’t seem to have made any difference
@matthiasott yeah… and it's one of the big reasons i stopped blogging.
@matthiasott 👋 yes, after a few times of an unexpectedly larger Netlify bill due to astronomical traffic (I think due to LLM bots) I switched hosts so that I can have a larger plan at lower cost.
@matthiasott thanks this reminded me to update my AI statement
@matthiasott I'm listing AI (and other poorly behaved crawlers) in my robots.txt while also sending them a 403 for anything else they request should they not honor that. I also had to block all traffic from China due to bots fetching endless pages sequentially. https://www.coryd.dev/posts/2026/blocking-entire-countries-because-of-scrapers
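The combination described here (list crawlers in `robots.txt`, 403 everything else they request) can be sketched in nginx like this. This is an illustrative sketch under assumed names and paths, not the author's actual config:

```nginx
map $http_user_agent $ai_bot {
    default     0;
    ~*GPTBot    1;
    ~*ClaudeBot 1;
}

server {
    listen 80;
    server_name example.com;  # placeholder
    root /var/www/site;       # placeholder

    # Still serve robots.txt so the bot can read the Disallow rules.
    location = /robots.txt { }

    # Everything else gets a 403 if the user agent matched.
    location / {
        if ($ai_bot) {
            return 403;
        }
    }
}
```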
Blocking entire countries because of scrapers

Cory Dransfeldt

OSM mentioned this as a problem in https://en.osm.town/@osm_tech/115968544599864782 and other messages.

@matthiasott

OpenStreetMap Ops Team (@[email protected])

If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse

OSM Town | Mapstodon for OpenStreetMap
@matthiasott I get this once or twice a week and have now started geo blocking, currently Singapore. There were even some bots saying "sorry, we are beta and break things, block us if you want" in their user agent string... so I blocked some user agents too. But most of the requests look like normal desktop browsers, so I needed geo blocking. The requests came in so rapidly that I couldn't use the Kirby panel anymore, even though I already had caching enabled.
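Geo blocking as described above can be sketched in nginx with the `geo` module. The CIDR ranges below are documentation placeholders; in practice the list would come from a GeoIP database (e.g. via the ngx_http_geoip2 module) or be enforced at the firewall:

```nginx
# Map client IPs to a "blocked" flag; these CIDRs are placeholders,
# standing in for a country's real address ranges.
geo $blocked_country {
    default         0;
    203.0.113.0/24  1;
    198.51.100.0/24 1;
}

server {
    listen 80;
    server_name example.com;  # placeholder

    if ($blocked_country) {
        return 403;
    }
}
```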
@matthiasott I have some stats, and the max was an increase in "visits" of about 1000% when those crawlers hit.

@matthiasott I had a short but interesting conversation with the owner of a hosting company.

Here is the gist of it:

Part of the explanation for the surge in traffic can be an endpoint that takes variables via HTTP GET, because the bots then try all possible combinations of those variables. By making content available via only one URL per piece (so that, say, articles cannot be linked to with a …?related=tag1,tag2,tag3…), you should be able to reduce the load.

@stairjoke That’s really interesting – because this is actually part of what I did to reduce the load a bit! My notes page used to work with multiple tags as URL params. And I indeed saw a lot of requests by bots trying all kinds of combinations. I now reduced this to one tag, which already helped a bit. Although I also did a lot of other stuff, so I can’t say for sure how much which step helped exactly. 😅
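One way to enforce "one URL per piece" at the server level, sketched in nginx. The `related` parameter name is taken from the example in this thread; this is an illustrative sketch, not either poster's actual setup:

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder

    # If the request carries a "related" query parameter, redirect to
    # the bare path ($uri excludes the query string), so crawlers
    # can't enumerate parameter combinations as distinct pages.
    if ($arg_related) {
        return 301 $uri;
    }
}
```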