Got an important lesson in selection bias (is that the right term?) today.
To reduce the load on #GameSieve, I've been firewall-blocking recurring IP ranges of one of the more obnoxious AI scrapers when they use data centers rather than their residential proxies (those just get soft-blocks, which humans can bypass). One of the more frequent sources, also for a bunch of annoying vulnerability scanners, is datacamp.
I also monitor what gets past the soft-block, since I initially made it trivial to bypass and want to see whether I warrant (and can then waste) human attention from the organizations behind those AI scrapers.
Today I saw a bypass from a datacamp IP-address.
Because I check the source of all evil traffic, and have seen datacamp a lot in that traffic, I immediately concluded that the datacamp source meant that it was an evil party bypassing my soft-block.
The thing is, I have no basis for comparison! I have no idea how much of my human traffic originates from there! Even though I'm aware of the fallacy, and even though the actual behaviour wasn't indicative of scraping, I still nearly firewalled it before realizing that hey, this was probably a legit human using a VPN!
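
To make the missing comparison concrete, here's a minimal sketch using Bayes' rule. Every number in it is made up for illustration, not measured from any real logs, and `posterior_evil` is a hypothetical helper name. The point is only that "datacamp shows up a lot in evil traffic" says nothing by itself until you also know how often it shows up in human traffic, and how rare evil visitors are among those who get past the soft-block.

```python
def posterior_evil(p_source_given_evil: float,
                   p_source_given_human: float,
                   p_evil: float) -> float:
    """Probability that a visitor is evil, given that it came from the source.

    p_source_given_evil:  share of evil traffic that comes from the source
    p_source_given_human: share of human traffic that comes from the source
    p_evil:               share of all considered traffic that is evil
    """
    evil_and_source = p_source_given_evil * p_evil
    human_and_source = p_source_given_human * (1.0 - p_evil)
    total = evil_and_source + human_and_source
    if total == 0:
        raise ValueError("no traffic from this source at all")
    return evil_and_source / total


# Hypothetical numbers, purely for illustration:
# 30% of evil traffic comes from the data center, only 2% of human traffic
# does, and 5% of the visitors that pass the soft-block are evil.
p = posterior_evil(p_source_given_evil=0.30,
                   p_source_given_human=0.02,
                   p_evil=0.05)
print(f"P(evil | data center source) = {p:.2f}")
```

With these invented inputs the result comes out below 50%, even though the source is fifteen times more common in evil traffic than in human traffic. Without an estimate for the middle number (the human share), the calculation can't be done at all, which is exactly the gap described above.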