OK, I'm tired of this shit. I'm starting to build a thing that bans LLM bots by the whole AS (inspired by the corresponding @Soblow threads), since the npf table can't grow indefinitely.

#AInt #LLMbots #homelab #selfhosting

Okay, after 15 minutes of waiting, I got a list of ASes sorted from the most bot-populated to the least. The top 20 records aren't surprising: just Microsoft and Google networks, plus a bit of DigitalOcean and Alibaba.
I'm starting to shoot unwanted bots from unwanted ASNs. We'll see how it works…
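
For anyone who wants to reproduce the "ASes sorted by bot count" step, here is a minimal Python sketch. It is not the script used in this thread. It assumes you already have a prefix-to-ASN table from somewhere (a BGP dump, whois lookups, etc.); the prefixes and AS numbers below are documentation ranges and private-use ASNs chosen purely for illustration, and the function names `build_lookup`, `asn_for` and `rank_asns` are hypothetical.

```python
import ipaddress
from collections import Counter


def build_lookup(prefix_to_asn):
    """Turn {"prefix": asn} into (network, asn) pairs, longest prefix first."""
    nets = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]
    nets.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return nets


def asn_for(ip, lookup):
    """Return the ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for net, asn in lookup:
        if addr.version == net.version and addr in net:
            return asn
    return None


def rank_asns(banned_ips, lookup):
    """Count unique banned IPs per ASN, most populated ASN first."""
    counts = Counter(asn_for(ip, lookup) for ip in set(banned_ips))
    counts.pop(None, None)  # drop IPs that matched no known prefix
    return counts.most_common()


if __name__ == "__main__":
    # Illustrative data only: RFC 5737 prefixes and private-use ASNs.
    lookup = build_lookup({"192.0.2.0/24": 64512, "198.51.100.0/24": 64513})
    banned = ["192.0.2.1", "192.0.2.2", "198.51.100.7"]
    for asn, hits in rank_asns(banned, lookup):
        print(f"AS{asn}\t{hits}")
```

The linear scan over prefixes is fine for a few thousand banned IPs; a full routing table would want a radix tree or a dedicated lookup library instead.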

It works! 

The line of blocked bots in the daily munin graph is almost horizontal 

@evgandr
Awk continues to be quietly brilliant
@dogriley Definitely!  I'm a big fan of it
@evgandr i see more poison from the do asn than i do from other areas lately. their trust and safety system used to require a dna sample. seems things have changed

@evgandr @Soblow

If you have a list, it would be cool to compare with one I generate daily from a tarpit with 500-600k/24h crawler hits https://scienceispoetry.net/files/parasites.txt

@JulianOliver @Soblow Huh, between my fail2ban list (4308 records) and your list (2185 records) there are only 23 matches. Most of them are from ASN 8075 (Microsoft network); there are also IPs from ASN 15169 and ASN 32934.

Looks pretty bad: possibly the number of bot IPs is so big that the sets seen by two random hosts barely intersect.

List of matched IPs:

57.141.20.0
57.141.20.14
57.141.20.59
57.141.20.6
66.249.66.162
74.7.175.185
74.7.227.11
74.7.227.135
74.7.227.156
74.7.227.175
74.7.227.42
74.7.228.29
74.7.230.15
74.7.230.34
74.7.230.47
74.7.241.16
74.7.241.18
74.7.241.21
74.7.242.14
74.7.242.3
74.7.242.47
74.7.243.216
74.7.244.30
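
The comparison of the two lists can be sketched in a few lines of Python. This is only an illustration, not the command actually used above. It assumes both lists hold one IP address per line (the real file formats may differ); the names `load_ips` and `common_ips` are hypothetical.

```python
import ipaddress


def load_ips(lines):
    """Parse one address per line; skip blanks, '#' comments and junk."""
    ips = set()
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        try:
            ips.add(ipaddress.ip_address(text))
        except ValueError:
            continue  # not a valid IP address, ignore the line
    return ips


def common_ips(list_a, list_b):
    """Return addresses present in both lists, sorted numerically."""
    shared = load_ips(list_a) & load_ips(list_b)
    ordered = sorted(shared, key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in ordered]


if __name__ == "__main__":
    # Tiny inline samples; real use would pass open file handles instead.
    mine = ["57.141.20.6", "57.141.20.14", "10.0.0.1"]
    theirs = ["57.141.20.14", "57.141.20.6", "192.0.2.1"]
    print("\n".join(common_ips(mine, theirs)))
```

Parsing into address objects rather than comparing raw strings means stray whitespace does not cause a missed match, and the output sorts by numeric value instead of lexically.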

@evgandr @Soblow Very interesting. Yes I suspect it is gigantic, spread out over ranges they've bought up. Swathes of the already exhausted v4 space clearly dedicated to this. I saw big swings across subnets from the same UA, so I guess they're just pointing their wasp swarms based on some combination of network distance, load balancing and/or harvest metric.