@kkarhan @andreasdotorg @internetarchive
With swarms of serverless harvesters?
In the end you still end up with CIDR blocking.
@simon_lucy @andreasdotorg @internetarchive
Then that's a necessary sacrifice one has to make.
If #aws doesn't combat #abuse, then it's only fair to #DROP [#DontRouteOrPeer] their systems...
And yes, I do yeet hostile networks as an act of self- and mutual ITsec...
https://github.com/greyhat-academy/lists.d/blob/main/blocklists.list.tsv
@kkarhan @andreasdotorg @internetarchive
The point I was making is that IP-specific rules aren't sufficient.
@simon_lucy @andreasdotorg @internetarchive OFC you'd have to block all CIDRs associated with the ASNs of AWS...
Which is relatively easy considering that said assignments are public...
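For what it's worth, AWS publishes its current prefixes as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json, so turning them into firewall rules is a few lines of Python. A rough sketch; the EC2-only filter and the nftables syntax (table inet filter, chain input) are assumptions about your setup, not something from this thread:

```python
# Sketch: turn AWS's published IP ranges into nftables drop rules.
import json
import urllib.request

FEED = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(FEED) as resp:
    data = json.load(resp)

# "EC2" covers the instances scrapers are usually launched from;
# drop the filter to block every AWS service instead.
prefixes = sorted({p["ip_prefix"] for p in data["prefixes"]
                   if p["service"] == "EC2"})

for cidr in prefixes:
    print(f"nft add rule inet filter input ip saddr {cidr} drop")
```

The feed changes over time, so you'd want to re-fetch it on a schedule and flush/rebuild the rules rather than only appending.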
@kkarhan @andreasdotorg @internetarchive
Yes, and that negates archive.org, so it's a very temporary mitigation. I imagine AWS knows and has begun limiting the customer.
@CauseOfBSOD @andreasdotorg why respond with garbage data and insults when you can respond with a seven- or eight-figure invoice? (recurring monthly, of course)
Of course they can afford it; they can afford to waste their money on AWS...
...Because society doesn't prohibit bad actors when they have fat wallets?
@andreasdotorg @internetarchive direct link to blog: https://blog.archive.org/2023/05/29/let-us-serve-you-but-dont-bring-us-down/
“Those wanting to use our materials in bulk should start slowly, and ramp up.
Also, if you are starting a large project please contact us at [email protected], we are here to help.
If you find yourself blocked, please don’t just start again, reach out.”
@internetarchive
@brewsterkahle
any updates and background appreciated here in this cosy and federated place :)
it is not easy having #good #things in the presence of #capitalism.
@andreasdotorg Oh, man. I have a couple of library archive sites, and I ended up just blocking AWS and Azure entirely because of the constant high-pressure scraping.
Some people just can't keep their scraping down below a reasonable rate limit. Like, I'd be fine if you kept requests under 1000/hour or something, but if you're spinning up 50 servers to hoover up thousands of pages each as fast as I can serve them, fuck y'all. Now you get NOTHING.
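A per-IP cap like that is simple enough to sketch in Python. Illustrative only: the 1000/hour figure is from above, the in-memory dict is an assumption (a real site would lean on the web server or something shared like Redis), and as noted earlier in the thread, per-IP counters are exactly what swarms of serverless harvesters sidestep:

```python
# Minimal fixed-window, per-IP hourly cap.
import time
from collections import defaultdict

LIMIT_PER_HOUR = 1000
_windows = defaultdict(lambda: [0, 0.0])  # ip -> [count, window_start]

def allow(ip: str) -> bool:
    """Return True if this request from `ip` stays under the hourly cap."""
    count, start = _windows[ip]
    now = time.time()
    if now - start >= 3600:       # start a fresh one-hour window
        _windows[ip] = [1, now]
        return True
    if count < LIMIT_PER_HOUR:
        _windows[ip][0] += 1
        return True
    return False                  # over the cap: serve a 429 instead
```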

Add to the terms and conditions that traffic coming from bots or “non-human” interfaces will be charged based on the traffic generated.
Then send them the bill.
They need to filter out #AWS and other corporate mass-retrievers.
10-20 accesses per day or so.
1000 seems way too high to me still @andreasdotorg
@andreasdotorg we can give them hell