WAF: wrong approach firewall - why the common negative security model is wrong, why the positive model is superior and how you can implement it with #vinylcache. talk at #gpn24

https://media.ccc.de/v/gpn24-385-waf-wrong-approach-firewall

#vinylcache #gpn24 #waf #webapplicationfirewall

correction: someone noticed a mistake in the path regex quiz:

«it can't be /[^/]+/ (or /[^/?]+/ for URLs) because that would also match the "hidden" directory /.foo/, which the glob would not match.
so we end up with /[^/.][^/]*/ (or /[^/?.][^/?]*/).»

I think they are absolutely correct. I was focused on making the point about ? and overlooked hidden file behaviour. Thank you anonymous reporter for this correction!
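To make the correction concrete, here is a small Python sketch comparing the two URL variants of the regex quoted above. The sample paths and the helper name `matches` are made up for illustration; only the two patterns come from the correction itself.

```python
import re

# The two URL variants quoted in the correction above.
NAIVE = re.compile(r"/[^/?]+/")           # also matches hidden names like /.foo/
CORRECTED = re.compile(r"/[^/?.][^/?]*/")  # first character must not be a dot


def matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return pattern.fullmatch(path) is not None


# Hypothetical sample paths, chosen to show where the patterns differ.
for path in ("/foo/", "/.foo/", "/foo.bar/", "/a?b/"):
    print(f"{path!r}: naive={matches(NAIVE, path)} corrected={matches(CORRECTED, path)}")
```

Only `/.foo/` is treated differently: the naive pattern accepts it, the corrected one rejects it, which is what a shell glob does with hidden names.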

@slink thanks for your talk. Takeaways for me:

- signed cookies
- stripping unneeded headers from the backend requests
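For readers who have not seen these two ideas before, here is a generic Python sketch of both. It is not the VCL from the talk: the key, the header allowlist, and the function names are all hypothetical, and the `value.mac` cookie layout is just one possible convention.

```python
import hashlib
import hmac

# Hypothetical values, for illustration only.
SECRET_KEY = b"replace-with-a-random-secret"
ALLOWED_REQUEST_HEADERS = {"host", "accept", "accept-encoding", "cookie"}


def sign_cookie(value: str) -> str:
    """Append an HMAC-SHA256 tag so tampering can be detected later."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify_cookie(cookie: str):
    """Return the original value if the tag is valid, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return value
    return None


def strip_headers(headers: dict) -> dict:
    """Keep only allowlisted headers before passing a request to the backend."""
    return {k: v for k, v in headers.items() if k.lower() in ALLOWED_REQUEST_HEADERS}
```

With this, a request carrying a modified cookie value fails verification at the edge and never has to reach the backend, and the backend only ever sees headers that were explicitly allowed.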

@slink
I watched the presentation and would rate it as worth watching.

Thanks again for your presentation.

My takeaway:

The firewall I was developing in my youth was, as any similar approach today, futile.

I should have just closed all ports and opened the ones I really needed.

@slink funny to run into you by chance at a talk like this after such a long time. It was through you that I "discovered" the internet back then. :-)

@slink thank you so much for this talk! super interesting to see a completely different approach to this subject.

Do you think some sort of .well-known endpoint or open standard for applications to expose valid paths/methods/headers to a fronting WAF would work? Then again, I figure if you're going to modify an application to provide this, you might as well implement HTTP Signatures (I'd vaguely heard about these but didn't know you could use them this way).

Maybe web frameworks (for example Django, because I'm familiar with it) could provide an export that could be uploaded to a CDN/WAF to create a base ruleset? Do you have any thoughts on a format specification?
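To illustrate the kind of base ruleset I have in mind, here is a purely hypothetical Python sketch: a list of allowed methods plus path patterns, with everything else rejected. No such export format exists as far as I know; the rule layout, the paths, and the name `is_allowed` are all invented for the example.

```python
import re

# Hypothetical exported ruleset: (allowed methods, full-path regex).
RULES = [
    ({"GET", "HEAD"}, re.compile(r"/")),
    ({"GET", "HEAD"}, re.compile(r"/articles/[0-9]+/")),
    ({"POST"}, re.compile(r"/login/")),
]


def is_allowed(method: str, path: str) -> bool:
    """Positive model: allow only what a rule explicitly permits."""
    return any(
        method in methods and pattern.fullmatch(path) is not None
        for methods, pattern in RULES
    )


print(is_allowed("GET", "/articles/42/"))   # permitted by the second rule
print(is_allowed("POST", "/articles/42/"))  # method not listed for this path
print(is_allowed("GET", "/wp-admin/"))      # no rule matches, so it is rejected
```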

Thanks for your work on Vinyl Cache, cheers!

@slink oh, I'd also like to know your opinion/thoughts on the recent bot scraper WAFs like Anubis https://anubis.techaro.lol/ or Iocaine https://iocaine.madhouse-project.org/ (I know these probably fall under "bad/negative" traffic WAF blocking, but I think some of their methods have some merit). Thanks again!

@theraspb here's part one of the answer: https://fosstodon.org/@vinyl_cache/116738762630361715

You sent me down a rabbit hole (no criticism), but I wanted to make this helpful.

@slink I did not expect such a response! (would have been a bit hard on stage eh?)

Honestly this has got me more excited to run Vinyl Cache. I think people who host their own stuff for community benefit face these scraping issues far too often, which is a burden and adds even more pressure on people to use less FOSS alternatives or give up control.

I've been thinking that we already have the tools to face these issues, but not packaged up in a way that makes it easy for someone to make use of them. Articles like this help people implement this stuff more easily, and I appreciate the "why" explained in the article too.

Thank you!

@theraspb So, regarding Anubis: Ideally, I would like to write an article similar to the one on Iocaine, including the "here's how to do the same in Vinyl Cache" part, but in this case this involves developing some JS to be run in the browser, and this triggers a defense reflex, because I really don't like JS. So I am not sure if I will get around to it, and will try to give a comparatively short response:

The central problem of all "crawler defense" techniques is to identify either ...

@theraspb ... legitimate users (to allow access) or crawlers (to block or otherwise "tame"). In the Iocaine article I have tried to explain why identifying crawlers is impossible in the general case, and that where detection does work, it is more a case of "the crawler is too dumb to properly disguise itself". Sure enough, if a crawler _wants_ to be identified as such, there are good ways (DNS, IP lists, signed requests), but it's generally not hard for a crawler to pretend to be a legitimate anonymous surfer with respect to HTTP headers.
@theraspb So back to the first question: Can we maybe identify legitimate users better than crawlers? Ultimately, we cannot peek out of the user's screen and see if there's a person there, but Google and Apple are trying things in this direction with their attestation stuff, which essentially boils down to trusting secrets burnt into hardware, which in turn means users no longer have control over their devices. Scanning ID cards, faces and similar things are attempts in the same direction...
@theraspb which also relate to the "age verification" mess. IMHO, this all ultimately leads to a central internet with walls everywhere.
If we want to avoid that route and still identify legitimate users, one option is to require a login. Then the question is how hard it is to register an account and whether account/cookie sharing is limited. But this also excludes anonymous users.
Some intermediate approaches try to identify if the connecting user agent "is a browser". A very simple form is to ...

@theraspb simply check if the user agent supports cookies or cache validation. As many crawlers support both, people resorted to checking if JavaScript works, and crawlers adapted.

Anubis implements this idea combined with a proof of work: The client is tasked to find a partial hash collision (a hash value meeting a difficulty target) by running JavaScript code, which induces relevant cost in terms of CPU time. If all goes well, crawlers will not invest that cost and stay out, but, IMHO, this model is clearly not sustainable:

@theraspb It requires users to enable JavaScript and burns CPU time (=energy), which is, I think, the wrong signal: The web should work without JS.
Also, the cost of proof of work goes down considerably if you use specialised hardware (GPUs, ASICs, see also Bitcoin miners), so determined crawlers will pay less than well-meaning users.
My personal opinion is that the proof of work approach is still not good enough, and I have some ideas which I want to talk about later…
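For anyone unfamiliar with the proof of work idea, here is a generic hashcash-style sketch in Python. It is not Anubis's actual protocol or code: the challenge string, the `challenge:nonce` encoding, and the difficulty measured in leading zero bits are assumptions made for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the first nonce meeting the difficulty."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical challenge; a real deployment would issue a fresh one per client.
nonce = solve("example-challenge", 12)
print(nonce, verify("example-challenge", nonce, 12))
```

The asymmetry is the whole point: the solver needs about 2^difficulty hash evaluations on average, while verification costs one. That is also why the cost argument above applies, since hardware that hashes faster pays proportionally less.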

@theraspb To summarise, Anubis is an HTTP proxy to issue and validate proof of work tasks. This function can also be implemented in Vinyl Cache, but someone would need to do it.

Also, there could be better options to achieve a similar goal. Stay tuned.

@slink again thanks for your thoughts and opinions. I'll be staying tuned for sure.
@slink this is why I prefer the tarpit approach (Iocaine) or the spamtrap method (taken from SMTP) for identifying scrapers that don't care about the operator at all. Obviously, centralizing the web is not our goal; reducing abusive traffic is.