@matthiasott This is what I currently do https://rknight.me/blog/blocking-bots-with-nginx/
But also yes, I’m seeing more of it. Specifically, Chinese bots that go over every single page on my site one by one.
Check the more or less recent posts from:
@ [email protected]
@ [email protected]
@ [email protected]
@matthiasott hell yes! This is a terrible problem on my site codepoints.net, even with CloudFlare in front of it.
On single code point pages I deep-link to my site search for similar code points. Over the last year it got worse and worse: “users” from China with Chrome follow those links and bring the site down through excessive DB load from the search.
I hated having to add rate limiting etc, and I know of at least one legitimate user who was bitten by it.
Such a pest on the open web!
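(For readers wondering what such rate limiting looks like: here is a minimal, hypothetical nginx sketch using the `limit_req` module. The zone name, rate, and `/search` path are made up for illustration; the post above doesn’t say what codepoints.net actually uses.)

```nginx
# Define a shared zone keyed on client IP (goes in the http {} context).
# 10m of shared memory, at most 5 requests per second per IP.
limit_req_zone $binary_remote_addr zone=search:10m rate=5r/s;

server {
    location /search {
        # Allow short bursts of up to 10 extra requests, served without
        # delay; anything beyond that gets a 503 (or limit_req_status).
        limit_req zone=search burst=10 nodelay;
    }
}
```

The downside is exactly the one mentioned above: keying on IP is blunt, so a legitimate heavy user (or several users behind one NAT) can get caught too.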
OSM mentioned that as a problem in https://en.osm.town/@osm_tech/115968544599864782 and other messages.
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@matthiasott I had a short but interesting conversation with the owner of a hosting company.
Here is the gist of it:
Part of the explanation for the surge in traffic can be an endpoint that takes variables via HTTP GET, because the bots then try all possible combinations of those variables. By making each piece of content available via only one URL (so that, say, articles cannot be linked to with a …?related=tag1,tag2,tag3… suffix), you should be able to reduce the load.
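One hypothetical way to enforce that one-URL-per-piece rule at the edge is to redirect any request that carries a query string back to the bare canonical URL, so permuting GET parameters never produces a new crawlable page. A minimal nginx sketch (the `/articles/` path is an assumption, not from the conversation above):

```nginx
# If a request under /articles/ arrives with any query string
# (e.g. ?related=tag1,tag2,tag3), send a permanent redirect to the
# same path without it. $args is the raw query string; $uri is the
# normalized path with no query string attached.
location /articles/ {
    if ($args) {
        return 301 $uri;
    }
}
```

This only makes sense for pages whose content genuinely doesn’t depend on query parameters; otherwise you’d break real functionality along with the bots.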