so far, when I see that a specific IP address has visited my websites a lot, I've been looking up its hostname just to make sure it's not something important, and then blocking it when it almost inevitably isn't (most return "host not found", and the others are clearly crawlers)
but one I looked up just now resolves to compute.amazonaws.com, and I'm aware that AWS powers a large share of the internet these days. does anyone know reasons I should or shouldn't block this IP?

Edit: it was *only* scraping robots.txt 🤦
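
For the lookup step described above (checking which hostname an IP resolves to), here is a minimal sketch using only Python's standard `socket` and `re` modules. It is an illustration, not the original poster's method: `reverse_lookup`, `forward_confirms`, and `is_ec2_hostname` are hypothetical helper names, and the two EC2 hostname patterns in the regex are assumed to be AWS's default public DNS naming. The forward-confirmation step is there because whoever controls an IP block also controls its reverse (PTR) record, so a hostname on its own proves little.

```python
import re
import socket

# Assumed default EC2 public DNS patterns:
#   us-east-1:     ec2-203-0-113-5.compute-1.amazonaws.com
#   other regions: ec2-203-0-113-5.us-west-2.compute.amazonaws.com
EC2_HOST_RE = re.compile(r"\.compute(-1)?\.amazonaws\.com$", re.IGNORECASE)


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for `ip`, or None when there isn't one."""
    try:
        hostname, _aliases, _addrs = resolver(ip)
    except (socket.herror, socket.gaierror):
        return None  # the "host not found" case
    return hostname


def forward_confirms(hostname, ip, resolver=socket.gethostbyname_ex):
    """True if `hostname` resolves back to `ip` (IPv4 only with the default resolver)."""
    try:
        _name, _aliases, addrs = resolver(hostname)
    except (socket.herror, socket.gaierror):
        return False
    return ip in addrs


def is_ec2_hostname(hostname):
    """True if the hostname matches the assumed EC2 naming patterns above."""
    return bool(EC2_HOST_RE.search(hostname.rstrip(".")))
```

Usage: `host = reverse_lookup(ip)`, then check `host and forward_confirms(host, ip) and is_ec2_hostname(host)`. The `resolver` parameters exist only so the helpers can be exercised without network access.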

@raphaelmorgan You may run the risk of blocking traffic from actual users who are on Amazon WorkSpaces (AWS's virtual desktop service), and perhaps real users whose company routes their traffic out through an AWS VPC.

Also, you might be blocking humans who are doing something like using an EC2 instance as a personal VPN exit node (e.g. Tailscale / Twingate).

It might be worth doing a deeper behavioral analysis of that traffic to confirm whether it’s more likely automated or human, but depending on your intended audience… I’d probably still block it.
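
One cheap form of the behavioral analysis mentioned above is grouping the access log by client IP and looking at what each one requested. A minimal sketch, assuming the "combined" log format that nginx and Apache commonly use; `profile_ips` and the asset-suffix heuristic (real browsers usually fetch CSS/JS/images along with pages, while many scrapers don't) are illustrative assumptions, not a reliable bot detector.

```python
import re
from collections import Counter, defaultdict

# Assumes the "combined" access-log format (nginx/Apache).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical heuristic: browsers usually load page assets too.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff2")


def profile_ips(lines):
    """Group parsed log lines by client IP and compute simple behavior stats."""
    paths = defaultdict(Counter)
    agents = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in other formats
        paths[m["ip"]][m["path"]] += 1
        agents[m["ip"]].add(m["ua"])

    profiles = {}
    for ip, counter in paths.items():
        profiles[ip] = {
            "requests": sum(counter.values()),
            # e.g. the client in this thread that only ever asked for robots.txt
            "robots_only": set(counter) == {"/robots.txt"},
            # ignore query strings when checking for asset requests
            "fetched_assets": any(
                p.lower().split("?")[0].endswith(ASSET_SUFFIXES) for p in counter
            ),
            "user_agents": sorted(agents[ip]),
        }
    return profiles
```

Usage: `profile_ips(open(path))` on your access log (the path depends on your server setup), then sort by `requests` and look at the heavy hitters that are `robots_only` or never fetch assets.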

@sudonem I ended up blocking it because it turns out all it was visiting was robots.txt, so it was like "well, seems pretty harmless, but I might as well save us both some resources" 😂

@raphaelmorgan ha. Yeah - good call.

Depending on what you’re using for DNS & domain hosting, it’s worth looking at Cloudflare’s AI bot blocking implementation (which is free if you’re already hosting anything there).

I can make pro & con arguments about Cloudflare, but it’s a pretty neat thing they’ve built, and it’s also neat that they offer it for free, because then you don’t need to spend time chasing down specific subnets / domains to block by hand.

Linked article: "How to Use Free Cloudflare to Block AI Bots & Scrapers" (Up & Running Inc - Tech How Tos), covering Cloudflare’s free tools such as AI Labyrinth and Bot Fight Mode.

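If you'd rather not chase AWS subnets by hand, AWS publishes its address ranges as JSON at `https://ip-ranges.amazonaws.com/ip-ranges.json`, with IPv4 entries under `prefixes` (`ip_prefix`, `service`, ...) and IPv6 entries under `ipv6_prefixes` (`ipv6_prefix`, ...). The sketch below uses hypothetical helper names, not an official tool: it filters that document by `service` and reports which published network contains a given IP, using Python's `ipaddress` module. A match only tells you the traffic comes from AWS infrastructure; per the earlier replies, that can still be a human.

```python
import ipaddress
import json
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_aws_ranges(url=AWS_RANGES_URL):
    """Download AWS's published IP-range document (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def aws_networks(ranges, service="EC2"):
    """Return ip_network objects for every prefix listed under `service`."""
    nets = []
    for entry in ranges.get("prefixes", []):
        if entry.get("service") == service:
            nets.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in ranges.get("ipv6_prefixes", []):
        if entry.get("service") == service:
            nets.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return nets


def find_network(ip, networks):
    """Return the first published network containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for net in networks:
        if addr in net:  # an IPv4 address is never "in" an IPv6 network, and vice versa
            return net
    return None
```

Usage: `find_network("203.0.113.5", aws_networks(fetch_aws_ranges()))`. The document is large, so it's worth caching it locally instead of fetching it for every lookup.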
@sudonem thanks, I'll keep that in mind, but for now I'm seeing how much I can do while relying on corporations as little as possible! Fail2ban seems to be helping, and I'm adding to it over time, so hopefully I eventually won't have to worry too much about manually blocking IPs
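
Fail2ban's core idea is a sliding window: a jail bans an IP once its filter matches `maxretry` times within `findtime` seconds, and keeps it banned for `bantime` seconds. The toy Python model below only illustrates that logic so the three settings are easier to reason about; the class is hypothetical and is not how fail2ban itself is implemented (fail2ban applies bans through actions such as firewall rules).

```python
from collections import defaultdict, deque


class SlidingWindowBanner:
    """Toy model of the fail2ban idea (not fail2ban itself): ban an IP once it
    produces `maxretry` matching events within `findtime` seconds, and keep it
    banned for `bantime` seconds."""

    def __init__(self, maxretry=5, findtime=600, bantime=3600):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.hits = defaultdict(deque)  # ip -> timestamps of recent matches
        self.banned_until = {}          # ip -> time the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record(self, ip, now):
        """Register one matching event; return True if it triggers a new ban."""
        if self.is_banned(ip, now):
            return False  # already banned, nothing new to do
        window = self.hits[ip]
        window.append(now)
        # Drop matches that have fallen out of the findtime window.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False
```

Usage: call `record(ip, time.monotonic())` once per matching log line and skip serving any IP for which `is_banned` is true. A longer `findtime` catches slow, steady crawlers that a short window would miss.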