Part of the access logs:

(Wide image showing the user agents; public URLs were accessed, but I'm not gonna dox myself that easily)

Looks like all of the URLs in their list were accessed by every one of their clients, regardless of whether another of their clients had already scraped that page, creating a huge amount of unnecessary load.

Nearly 40 requests in a single second? No one will be able to convince me that this is legitimate use. This is the new normal, yet it's not normal. You can't even really fight against this. Maybe you can use Cloudflare or Anubis if the client doesn't care that much about a professional image.
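For what it's worth, the kind of thing I mean by fighting back at the application level is a per-IP rate limit. Here's a rough Python/WSGI sketch; the limits, the header handling and the demo app are my own assumptions for illustration, not anything that actually runs on the sites mentioned here:

import time
from collections import defaultdict


class RateLimit:
    """Per-IP token-bucket rate limiter as WSGI middleware (illustrative numbers)."""

    def __init__(self, app, rate=5.0, burst=20):
        self.app = app
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum tokens per client IP
        # each bucket holds [remaining tokens, timestamp of last refill]
        self.buckets = defaultdict(lambda: [burst, time.monotonic()])

    def __call__(self, environ, start_response):
        # take the first address from X-Forwarded-For if a proxy set it
        ip = environ.get("HTTP_X_FORWARDED_FOR",
                         environ.get("REMOTE_ADDR", "")).split(",")[0].strip()
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[ip] = [tokens, now]
            start_response("429 Too Many Requests",
                           [("Content-Type", "text/plain"), ("Retry-After", "1")])
            return [b"slow down\n"]
        self.buckets[ip] = [tokens - 1.0, now]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, RateLimit(demo_app)).serve_forever()

A token bucket lets normal browsing burst a little, while anything hammering the site 40 times a second runs dry almost immediately.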

#fuck_ai #fuckai #fuckscrapers

I spent part of yesterday afternoon (a Sunday) on an urgent issue at work: I got a call that a client's website was down.

It was weird because the machine's resources weren't exhausted (this isn't a big site; it runs on a VPS with a plan that includes dedicated CPU time, no cloud-native stuff), yet the site wasn't really responding.

Long story short, after debugging the issue for a few hours we realised why the backend was responding but the frontend proxy was not: there were hundreds of connections just left open. A colleague with better pattern-matching skills than my tired head found in the logs that 90% of the requests came from a single /16 IP range. Plugging one of the addresses into ipinfo.io showed this:

Company: Alibaba Cloud LLC

Fuckers couldn't keep their AI scraper from messing up everything.
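If you want to reproduce that kind of /16 breakdown from your own access log, a rough Python sketch along these lines does it. The log path and the combined-log-format assumption (client IP as the first field) are mine, not the exact commands from that night:

import ipaddress
from collections import Counter

counts = Counter()
total = 0
with open("access.log") as log:          # log path is an assumption
    for line in log:
        field = line.split(" ", 1)[0]    # client IP is the first field in combined format
        try:
            ip = ipaddress.ip_address(field)
        except ValueError:
            continue                     # skip malformed lines
        total += 1
        if ip.version == 4:
            # mask the host bits to bucket the client into its /16
            key = str(ipaddress.ip_network(f"{ip}/16", strict=False))
        else:
            key = str(ip)
        counts[key] += 1

for key, n in counts.most_common(5):
    print(f"{key}: {n} requests ({n / total:.0%})")

If one range accounts for 90% of the traffic, it shows up in the first line of that output.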

#fuck_ai #fuckai #fuckscrapers

Kinda interesting to log into Cloudflare and see (and block) all the AI scraper bots that have accessed my account. There are a handful more below the screenshot.

I blocked Archive.org because if I delete something, I want it gone. I also don't want random edits I might be making to the site to be archived.

I really wish I could block Google and Bing, but I guess I might want search traffic once I start trying to sell photos 🤷‍♂️

#FuckScrapers #KillItWithFire