Anubis is designed to protect websites from AI scraper bots. It primarily focuses on parameters like the user agent sent with the request and looks for oddities in the connection: "known good" and harmless clients are always accepted, while "known bad" clients are always denied. Now the same tool has been used to fend off a DDoS attack: https://fabulous.systems/posts/2025/05/anubis-saved-our-websites-from-a-ddos-attack/

#opensource #Linux #cybersecurity

The Day Anubis Saved Our Websites From a DDoS Attack: "Anubis, designed to protect your website from unwanted requests from AI crawlers, is way more powerful than you might think." (fabulous.systems)
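
To make the described policy concrete, here is a minimal sketch in Python of user-agent-based classification; the rule names and regexes are illustrative, not Anubis's actual policy set:

```python
import re

# Illustrative policy list in the spirit of the description above:
# "known good" clients always pass, "known bad" clients are denied
# outright, and everything else gets challenged.
POLICIES = [
    ("ALLOW", re.compile(r"Googlebot|bingbot")),  # example "known good"
    ("DENY", re.compile(r"GPTBot|CCBot")),        # example "known bad"
]

def classify(user_agent: str) -> str:
    """Return ALLOW, DENY, or CHALLENGE for a request's User-Agent."""
    for action, pattern in POLICIES:
        if pattern.search(user_agent):
            return action
    return "CHALLENGE"  # unknown clients must prove they are real browsers

print(classify("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # DENY
```
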
We need mod_Anubis added directly to Nginx or Apache, with configuration options for allowing or blocking specific URLs, IPs, CIDRs, or even data-center ranges like AWS, plus customization for traps and the like. Freeloader AI companies abuse open-source projects, small businesses, blogs, forums, and artists without giving back to communities or individuals. They are making billions while people are left with server bandwidth bills.
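
A hypothetical module like that might implement its matching roughly as follows; this Python sketch is illustrative only, and the rules (including the AWS-like range) are made up:

```python
import ipaddress

# Hypothetical deny rules of the kind requested above: URL prefixes,
# plus whole CIDR / data-center ranges.
DENIED_NETWORKS = [ipaddress.ip_network("3.0.0.0/9")]  # example AWS-like range
DENIED_PREFIXES = ["/admin", "/api/private"]           # example URL prefixes

def is_blocked(client_ip: str, path: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in DENIED_NETWORKS):
        return True
    return any(path.startswith(p) for p in DENIED_PREFIXES)

print(is_blocked("3.12.34.56", "/blog"))  # True: IP falls in the denied range
```
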
@nixCraft There exists https://github.com/simon987/ngx_http_js_challenge_module, a module for Nginx that works similarly to Anubis.
GitHub - simon987/ngx_http_js_challenge_module: Simple javascript proof-of-work based access for Nginx with virtually no overhead. (Similar to Cloudflare's anti-DDoS feature)
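
The core of such proof-of-work gating is simple: the server hands out a random nonce and a difficulty, and only serves the page once the client presents an answer whose hash meets the target. A minimal server-side sketch in Python, assuming a generic scheme rather than this module's actual wire format:

```python
import hashlib
import os

DIFFICULTY = 20  # required leading zero bits; tune to set the client's cost

def issue_challenge() -> str:
    """Server side: generate a random nonce the client must extend."""
    return os.urandom(8).hex()

def verify(nonce: str, answer: str) -> bool:
    """Server side: accept only if sha256(nonce + answer) has
    DIFFICULTY leading zero bits."""
    digest = hashlib.sha256((nonce + answer).encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0
```
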
@nixCraft
I think the "I block malicious actors with Zip bombs" post that floated around this week also has some merit when combined with Anubis.
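
For reference, the zip-bomb trick works by pre-compressing a huge run of zeros and serving it with Content-Encoding: gzip: a client that decompresses blindly burns memory while the server sends only about a megabyte. A rough Python sketch (sizes and usage are illustrative):

```python
import gzip
import io

def make_gzip_bomb(decompressed_mb: int = 1024) -> bytes:
    """Build a small gzip payload that expands to decompressed_mb MB of zeros."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        chunk = b"\0" * (1024 * 1024)
        for _ in range(decompressed_mb):
            gz.write(chunk)
    return buf.getvalue()  # roughly 1 MB on the wire per GB decompressed

bomb = make_gzip_bomb()
# Serve to flagged clients with headers:
#   Content-Encoding: gzip
#   Content-Length: <len(bomb)>
print(f"{len(bomb)} bytes on the wire")
```
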
@nixCraft I'd love this. My apache/fail2ban config is just chasing its tail since the IP addresses keep moving. I block every IP I find and get new ones the next day.
@nixCraft those companies are already using open source code without giving anything back. Wondering why people don't mention that.
@nixCraft I wonder if @CrowdSec would be helpful in this case.
@crisl_at @nixCraft Definitely! CrowdSec provides a free AI Crawlers blocklist built specifically to protect FOSS projects, blogs, forums, and communities from abusive AI scraping. It’s actively maintained and worth a try: https://www.crowdsec.net/blog/protecting-foss-with-free-ai-crawlers-blocklist 💪
Protecting FOSS Communities from AI Crawlers with CrowdSec: "Announcing free access to the CrowdSec AI Crawlers Blocklist for all open source projects, to help FOSS communities reduce unwanted traffic from AI bots."

@nixCraft Do you know if there is currently support for GitHub Pages?
@lwflouisa @nixCraft Why would you need to deploy Anubis there? I'm curious.
@lwflouisa @nixCraft Aren't GitHub Pages hosted by GitHub? So, to put it bluntly, it's not your problem.
@nixCraft I shouldn't have to clarify that I've been gradually moving away from GitHub, but that won't stop reply guys and free-speech absolutists, apparently.
@nixCraft well, actually, it's mainly just proof of work between you (the client) and the server. Bots do not like proof of work.
@nixCraft @mooncorebunny Does it provide a non-Javascript way for custom clients to do the proof of work?
@lispi314 @nixCraft "A no-JS solution is a work-in-progress", it says, when accessed with NoScript blocking scripts by default...
@mooncorebunny @lispi314 @nixCraft does it block the #lynx standard textmode webbrowser, links+, dillo, arachne, arena, w3m, …?
@mirabilos @lispi314 @nixCraft I tried lynx and elinks and dillo, and Anubis doesn't appear to challenge them, at least in my case...
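
On the no-JS question above: nothing in proof of work requires JavaScript; any client that can hash can brute-force the answer. A sketch of a custom client solving the generic challenge format assumed earlier (not Anubis's actual protocol):

```python
import hashlib
from itertools import count

def solve(nonce: str, difficulty: int = 20) -> str:
    """Client side: brute-force an answer so that sha256(nonce + answer)
    has `difficulty` leading zero bits."""
    for i in count():
        answer = str(i)
        digest = hashlib.sha256((nonce + answer).encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - difficulty) == 0:
            return answer

print(solve("deadbeefdeadbeef"))  # takes ~2**20 hashes on average
```
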
@nixCraft But there are still no instructions on how to self-host it on personal infra, are there?

@nixCraft

Meta's AI has already overcome this.

The way Meta does it is by running a bunch of mini instances running real web browsers (just like yours), and their AI scrapes from within the browser. Each mini instance also has a unique IP and some random browser history to help it pass as human. The AI can also simulate random mouse movements and bypass (solve) CAPTCHAs.

However, the good news, at least for now, is that this is slower than traditional scraping. And thankfully, most of the other AIs out there do not yet go to such extremes. It is costing Meta a fortune to run their little pilot program.
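
To make the described pattern concrete (this is an illustration of browser automation in general, not Meta's actual system), driving a real browser with randomized pointer movement is a few lines with a library such as Playwright:

```python
# Requires: pip install playwright && playwright install chromium
import random

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")  # placeholder URL
    for _ in range(5):  # human-like pointer wandering
        page.mouse.move(random.randint(0, 800), random.randint(0, 600), steps=25)
    html = page.content()  # "scrape" from inside the rendered page
    browser.close()
```
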

@Linux @nixCraft the amount of money and effort they'll spend to avoid having to get permission for, and ultimately pay people for, this content is staggering

@ivy @nixCraft

There is more money to be made than any lawsuit could ever deter them from chasing. And often, as is the case with open source software, they will argue no license is being broken, since from their point of view they're just forking the code.

@Linux @nixCraft it's not even just open source, though; they're going after pretty much everything they can possibly find, and they don't respect robots.txt or licensing
@Linux @nixCraft blogs, git forges, artwork, it's all just content to throw on the big amalgam content machine

@ivy @nixCraft

Their viewpoint is: it exists, so they must have it; you're only delaying the inevitable, and resistance is futile.

I do not think they really care what you, I, or anyone else thinks or feels about it.

@ivy @nixCraft

However, if it is any comfort, I know Meta's AI only scrapes what it cannot find on Facebook, Instagram, Threads, or WhatsApp.

Most people share photos, blog-like posts, copy-and-pasted news stories, their own artwork, and so on, onto Meta's platforms. With billions of users, Meta does not need to go far outside itself. Code seems to be their primary focus (GitHub, GitLab, etc.), since you don't normally find that on Meta.

@Linux @nixCraft That's... so scummy. I know I shouldn't be surprised, but.

@nixCraft I added Anubis to my website hofstede.io yesterday. It was simple and works like a charm. I just deployed it as a container and included it in my Traefik configuration 🙂

According to my logs, the crawler traffic did decline significantly! 😊