OK, found a more aggressive scraping defense mechanism that has managed to catch over 9000 distinct IPs. Is there a way to semi-automatically analyze this, collect the relevant subnets and find who they are assigned to, to see what the downsides would be of a subnet-wide ban?
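For a first look at how concentrated the hits are, before any whois work, one option is to bucket the collected IPs by prefix with Python's stdlib `ipaddress` module. This is just a sketch under the assumption that the IPs sit one per line in a text file; the `/24` granularity is an arbitrary starting point, not anything the defense mechanism itself reports.

```python
import ipaddress
from collections import Counter

def group_by_prefix(ips, prefixlen=24):
    """Count how many of the given IPv4 addresses fall into each /prefixlen subnet."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us derive the containing network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)
        counts[str(net)] += 1
    return counts

# e.g. feed it the collected list and eyeball the top offenders:
# for net, n in group_by_prefix(open("banned_ips.txt").read().split()).most_common(20):
#     print(n, net)
```

Subnets with a high count are candidates for a subnet-wide ban; ones with a single hit are where that ban would mostly catch bystanders.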
#ShieldsUp
This is turning out to be an EXCELLENT collector for scraper IPs. But I really need to make sense of it somehow. I'm already at ~30K IPs in approx. 4.5 hours.
16 hours in, we're at ~125K IPs, so the rate is holding at around 2 attempts per second. I'm still waiting for recommendations on tools that would let me wade through this huge collection of IPs and get statistics on who they belong to, whether there's an actual botnet in it (including residential addresses taken over by it), and/or which datacenters are involved. Any
#recommendations?
#askFedi #fediHelp #networking

I mean, I can cook up a script that iterates over the first IP, runs a whois query to get its route, finds all collected IPs that match that route, and then moves on to the next uncovered IP, but I can't believe nobody has done something like that already.
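The loop described above can be sketched in a few lines of Python. This is a hedged sketch, not the script I actually ran: it assumes a system `whois` client is on the PATH, that its output contains a `route:` or `CIDR:` line (formats vary between RIRs), and the `lookup` parameter is injectable only so the covering logic can be exercised without hitting the network.

```python
import ipaddress
import subprocess
import time

def lookup_route(ip):
    """Query whois for one IP and pull out the 'route:' (or 'CIDR:') value.
    Assumption: system `whois` binary exists; RIR output formats differ."""
    out = subprocess.run(["whois", ip], capture_output=True, text=True,
                         timeout=30).stdout
    for line in out.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("route", "cidr"):
            return value.strip().split(",")[0]
    return None

def collect_routes(ips, lookup=lookup_route, delay=2.0):
    """Walk the IP list; only query IPs not already covered by a known route."""
    routes = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in routes):
            continue  # already explained by a previously found route
        cidr = lookup(ip)
        if cidr:
            routes.append(ipaddress.ip_network(cidr, strict=False))
        time.sleep(delay)  # rate-limit queries to the WHOIS servers
    return routes
```

With a couple of seconds between queries this stays polite to the WHOIS servers, at the cost of a long run over a list this size.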
I ended up cooking my own script. Of course, the problem with processing the WHOIS information of 175K IPs (and growing) is that queries to the WHOIS database have to be rate limited. I wrote a trivial Python script that does what I mentioned in the previous post, limiting queries to IPs for which no covering range has been found yet, but it turns out the inetnum ranges returned by whois are often very tight, so the reduction isn't that impressive (some ranges cover only 1 or 2 of the collected IPs).
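One way to squeeze a bit more out of those tight inetnum ranges, at least when they sit next to each other, is to merge adjacent and overlapping networks with `ipaddress.collapse_addresses` from the stdlib. A minimal sketch (the input CIDRs here are documentation prefixes, not real scraper ranges):

```python
import ipaddress

def collapse(cidrs):
    """Merge adjacent/overlapping IPv4 networks into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Two adjacent /25s collapse into their parent /24:
# collapse(["192.0.2.0/25", "192.0.2.128/25"]) -> ["192.0.2.0/24"]
```

This only helps when the tight ranges are actually contiguous and belong to the same operator, so it's worth checking the org on the merged prefix before banning it wholesale.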