Can you guess what time I enabled Anubis in front of Forgejo?

#BSDCafe #Anubis #Forgejo #Bots #NoBot #Scrapers

After 615 requests over almost exactly 24 hours, the #aiscraper abusing #residentialproxies to repeatedly request one particular page on #GameSieve has finally disappeared. It succeeded 18 times before I noticed it was stuck in a loop and added another block rule. However, its final request was successful and is worrying, as it came through fetch.tunnel.googlezip.net, which apparently is #Google's Chrome Prefetch Proxy.

I've noticed requests from that range before, but always assumed they were legitimate. Do I now have to think about blocking that bit of infrastructure as well, now that #scrapers have found a way to piggyback on it? Urgh!

I guess I'll start by blocking prefetching via .well-known/traffic-advice and see what that does...
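
For anyone wanting to try the same thing, here is a rough sketch of what such a traffic-advice file might contain. It assumes the "prefetch-proxy" user agent name, the "disallow" key, and the application/trafficadvice+json MIME type from Google's published guidance for its prefetch proxy; the helper name build_traffic_advice is made up for illustration, and how you serve the file depends on your web server.

```python
import json

# Assumed from Google's prefetch proxy guidance, not from the post itself:
TRAFFIC_ADVICE_PATH = "/.well-known/traffic-advice"
TRAFFIC_ADVICE_MIME = "application/trafficadvice+json"


def build_traffic_advice(disallow: bool = True) -> str:
    """Return the JSON body asking prefetch proxies not to fetch this site.

    Hypothetical helper: the "prefetch-proxy" agent name and "disallow"
    key are assumptions based on the published guidance.
    """
    advice = [{"user_agent": "prefetch-proxy", "disallow": disallow}]
    return json.dumps(advice, indent=2)


if __name__ == "__main__":
    # Write the output to TRAFFIC_ADVICE_PATH under your web root and
    # serve it with the TRAFFIC_ADVICE_MIME content type.
    print(build_traffic_advice())
```

Whether a scraper riding on the proxy actually honours this is a separate question; it only tells the proxy itself to back off.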

#aiscrapers #aibots

Iocaine and my custom solution aren't good enough. I'm considering adding a login to my website rewrite as protection against bots.

I would always offer an anonymous session after completing a proof of work (which is also available without JS).

Do you think this is okay? Please don't hesitate to reply!

#website #personalBlog #PersonalSites #indieweb #spam #spamprotection #scrapers #selfhosting #iocaine

Yes, I would even log in using Fedi or IndieAuth.
Yes, I'd use the anonymous session.
No, I'd avoid your website if you do that.
Something else

A couple of new #scrapers to block that I haven't seen on robotstxt.com:
* Amzn-SearchBot is the search engine for Alexa and Rufus. Amazon claims on https://developer.amazon.com/amazonbot that it doesn't do AI training, but it still hammered our sites the past two days.
* SleepBot I haven't found much on, but it was requesting URLs for files that were submitted in a document upload spam attack we had a few months ago. Very sus.
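
A minimal sketch of how the two names above could be matched against incoming User-Agent headers. The tokens are the bot names from this post; the full User-Agent strings these bots actually send are not given here, so the case-insensitive substring match and the is_blocked helper are assumptions for illustration.

```python
# Bot names taken from the post; matching on substrings is an assumption,
# since the exact User-Agent strings are not quoted here.
BLOCKED_AGENT_TOKENS = ("Amzn-SearchBot", "SleepBot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked bot token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    # Hypothetical header values, for demonstration only.
    for header in ("Mozilla/5.0 (compatible; Amzn-SearchBot/0.1)",
                   "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"):
        print(header, "->", "block" if is_blocked(header) else "allow")
```

This only helps against bots that identify themselves honestly; anything spoofing a browser User-Agent will pass straight through.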

#SysAdmin #webhosting #bots

@Nuldorv @santiago Here you can clearly see the scraping, which makes traffic jump from the usual 1 - 2k packets/sec to almost 14k, roughly 10 times the normal traffic. The waves aren't solid green but rather very thin impulses or spikes, because the firewall detects the abuse and blocks the IPs. Even so, new bots and new evidence keep turning up in the logs, and we keep adding them to mitigate the attacks #ataques #bots #scrapers

We are being scraped by Meta #meta #scrapers #ia #bots

From outside, some latency is still noticeable at times, possibly because the scraping attacks have not stopped. On the internal network it flies, and in the metrics the servers are not under high load or demand; they look normal. The problem in that case would be that all the attacks the firewall is successfully blocking are only stopped once they are already inside the network, so that traffic takes up room on the connection and leaves less net bandwidth for legitimate traffic... we'll see if things improve over the next few days #undernet #ataque #bots #scrapers #iabot #peertube
🎩🤖 Oh, look, another #GitHub hero has blessed us with a "groundbreaking" #tool to trap #AI #web #scrapers in a "poison pit." Because clearly, what we all need is a #digital Venus flytrap for code 😏. Meanwhile, GitHub's feature salad just keeps growing, because who doesn't love a good menu with more options than a diner? 🍔💻
https://github.com/austin-weeks/miasma #innovation #featureupdate #codinghumor #HackerNews #ngated

No outages in the latest Apache logs. However, there is plenty of suspicious activity.

The log has 16,033 lines.

Of these, 1,559 lines feature the "RecentChanges" function for my wikis. Which is something regular users _might_ call up from time to time, but I suspect that #scrapers are the more likely culprits.

The vast majority of these requests come from a random assortment of IP addresses, and they usually end with something along the lines of:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"

So yeah, "anonymous bot nets scraping the Interwebs for nefarious purposes" would be my first guess.
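
For the curious, a rough sketch of how a tally like the one above could be produced. It assumes the log is in Apache's combined format and that "RecentChanges" appears in the request line; the regular expression, the summarize function, and the access.log file name are illustrative guesses, not the commands actually used.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines, needle="RecentChanges"):
    """Count lines whose request contains `needle`, by IP and User-Agent."""
    total = 0
    hits = 0
    ips = Counter()
    agents = Counter()
    for line in lines:
        total += 1
        match = LOG_LINE.match(line)
        if not match or needle not in match.group("request"):
            continue
        hits += 1
        ips[match.group("ip")] += 1
        agents[match.group("agent")] += 1
    return {
        "total": total,
        "hits": hits,
        "unique_ips": len(ips),
        "top_agents": agents.most_common(3),
    }


if __name__ == "__main__":
    # "access.log" is a placeholder path.
    with open("access.log", encoding="utf-8", errors="replace") as fh:
        print(summarize(fh))
```

A high count of unique IPs that all share one User-Agent string is the pattern described above: many addresses, one client.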