Web design in the early 2000s: Every 100ms of latency on page load costs visitors.

Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.

@david_chisnall I like this one specifically because the Cloudflare gate is there to address the problem of "Too many visitors."
@mark @david_chisnall Instead of fixing broken code with proper logging and code performance observability, lets stop all the effort and expect Cloudflare to care about actual humans (and not just about their PaaS billing). 😓
@autiomaa @mark @david_chisnall Honestly I'm kind of surprised there isn't a "pay Cloudflare for X connections without a challenge/captcha" tier, because it would be another revenue stream for them.

@internic There is such a payment model on Cloudflare for the LLM companies (giving them much faster download speeds for third-party content scraping), but not for regular consumers.

@mark @david_chisnall

@autiomaa So the bots have an option to bypass the captchas meant to catch bots but the humans don't. That tracks. 😩 @mark @david_chisnall

@internic That's not a bug, that's a feature!
I guess...

@autiomaa @mark @david_chisnall

@mark @david_chisnall I don't think that's actually the case, at least not entirely. The main issue is that the Internet is currently being inundated with LLM content crawlers to the point that it overwhelms websites or scrapes content some sites don't want sucked into AI training data. It has caused a massive number of sites to serve those bot-detection pages to everyone. So it's not quite an issue of too many visitors but actually "too many non-human visitors"
@danherbert @david_chisnall I wasn't limiting "visitors" to humans.
@danherbert @mark @david_chisnall Sadly, that is our reality. One siteʼs traffic was 75–80 per cent scraper (even back in 2023) so up went the Cloudflare blocks and challenges. (Before anyone @s me about this, Iʼm not a computer whiz so this is the only thing I know how to use.) And itʼs finally worked after figuring out which ASNs and IP addresses are the worst, with traffic on that site back to pre-2023 levels (which I know means an overall drop in ranking).

@mark

This morning, Cloudflare decided that a company I wanted to place an order with shouldn't trust me, so I went to one of their competitors.

@david_chisnall There is a hilarious possible future where the government fails to do anything about monopolies but Cloudflare has a de facto competition-increasing effect, because it makes it so onerous for everyone to use one site that people start self-selecting to use other sites.
@mark @david_chisnall Monopolies like Amazon don't use Cloudflare. It's the small guys with incompetent webdev teams that use CF.

@david_chisnall "Please wait while we check that your Browser is safe" while my laptop goes for a minute or two into full load and screaming hot

Perhaps ending in "We are sorry, but we could not verify you are an actual human; your machine shows suspect behaviour. Send an e-mail to the admin to get access"

@Laberpferd @david_chisnall proof of work is such a bad CAPTCHA. Like, who thought bots couldn't evaluate JS

@vendelan
The idea is not that they can't, it's that they won't.
If you're a human visiting a website, evaluating some JS at worst costs you a few seconds. If you're a scraper bot trying to get millions of sites a second, it slows you down.
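That asymmetry can be made concrete. Here is a minimal, hypothetical sketch of a hash-based proof-of-work challenge in Python (an illustration of the general technique only, not Cloudflare's or Anubis's actual protocol; the function names and difficulty parameter are made up for the example):

```python
import hashlib
import secrets

def solve_challenge(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce so that sha256(challenge + nonce)
    starts with `difficulty` zero hex digits. Expected cost grows as
    16**difficulty hash attempts. Trivial for one page view, ruinous
    when multiplied across millions of scraped pages."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, no matter how long the solve took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# The server issues a random challenge; the visitor's browser solves it.
challenge = secrets.token_hex(16)
nonce = solve_challenge(challenge, difficulty=4)
assert verify(challenge, nonce, 4)
```

The point is the cost imbalance: verifying is one hash for the server, while solving is tens of thousands of hashes for the client, which is negligible for a human but a real tax on a scraper hammering millions of URLs.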

@Laberpferd @david_chisnall

@david_chisnall True! Well, you could at least call someone at O'Reilly and suggest writing a book on that topic 😅
@david_chisnall
It's also the tens of megabytes of frameworks, JavaScript, and ad services that have to be loaded every single time.
@david_chisnall I'd like to automate the process of responding to Cloudflare's checks

@jackeric that's exactly what their code is designed to prevent
It's still possible, but... not without some fighting

@david_chisnall

@david_chisnall why is that there? Bots and AI scraping. None of this would be necessary otherwise.

@david_chisnall

On top of all the broken links we'll send if you're not using the proper browser.

@david_chisnall This was when the tech bros realized that it is all in comparison to everything else.

If you just make EVERYTHING worse then it doesn't matter that you're bad.

The real story of computing (and perhaps all consumer goods)

@david_chisnall it's funny, every time I try to access a website that uses Cloudflare, I have to use something else or disable my VPN && my DNS resolver.
So if they can have my data, they let me use them. So don't tell me it is about protection against bots.
It's about gathering data - or am I just paranoid af?

@hex0x93 I know nothing about Cloudflare's data practices. But I do know a lot of sites have been forced to go with Cloudflare because so many AI bots are incessantly scraping their site that the site goes down and humans can't access it - essentially AI is doing a DDoS, and when that's sustained for weeks/months/more then the Cloudflare-type system seems to be the only way to have the site actually available to humans.

I hate it but those f---ing AI bots, seriously, they are ruining the net.

@david_chisnall

@david_chisnall @zeborah i know, and it probably isn't about data and stuff. But for me it is annoying that it deems me a bot just because of some settings I enabled in my browser and system....^^
@zeborah @hex0x93 @david_chisnall This pretty much describes us. Scrapers as well as brute-force hackers multiple times per hour (even literally per second). One siteʼs traffic was 75–80 per cent scraper.
@jackyan @zeborah @david_chisnall and it is totally understandable to protect yourself against that. It is just super annoying for ppl like me, who value and protect their privacy.
And I am no web scraper, nor am I a hacker....
@hex0x93 @zeborah @david_chisnall I hear you as I get annoyed, too. I believe ours is the one with the tick box, so no stupid 'Choose the bicycles' or rejection because you use a VPN.
@jackyan @zeborah @david_chisnall I love that!❤️❤️

@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try to not block anything except for the worst, known offenders. Reddit, Yelp & others are blocking me entire when I use my ad-blocking VPN on the phone — just stupid...

@jackyan @zeborah @david_chisnall

@alexskunz @jackyan @zeborah @david_chisnall that's cool, and those do work sometimes. What you say about reddit and stuff not working is my everyday, online life. I chose it, still annoying, but I guess it is like in life...the few bad people ruin it for everyone😜😜
Sometimes I think I am just paranoid...can't help it😅

@zeborah @hex0x93 @david_chisnall Partially correct – their reason is fair, but Cloudflare is just one of the providers offering such protection: an oligopolist, not a very good service, and probably a data hoarder.

Anubis is a popular alternative, though it has to be self-hosted.

(I can‘t access CF-gated sites at all because the checkbox captcha just breaks most of the time. Happens if you activate basic privacy features of your browser.)

@zeborah @hex0x93 @david_chisnall Indeed, I once had a link directory with thousands of pages. It is now down due to the large number of bots visiting the site.
@david_chisnall I don't even care about Cloudflare (and Anubis) checks – those at least rarely last more than a few seconds. What I loathe are the throbbing placeholders that seem to be everywhere now, causing simple text pages to load slower than similarly-looking pages (once the content renders) loaded on dial-up.

@jernej__s @david_chisnall Don't get me started on Anubis.

I was browsing with SeaMonkey and wanted to find out if it was possible to customise/edit a search engine in SeaMonkey. So I followed a link to the mozillazine forums.

Using SeaMonkey I could NOT get past the Anubis check, it just hung and never completed.

Maybe these systems could also check the browser strings and be clever enough to realise that a SeaMonkey user might have a genuine reason to visit the mozillazine website?

@the_wub They check user-agent and challenge anything that claims to be Mozilla (because that's what the majority of bots masquerade as).

Also, weird that SeaMonkey can't pass it – I just tried with Servo, and it had no problems.

@jernej__s @the_wub Every graphical web browser claims to be Mozilla.

@jernej__s 1/n SeaMonkey is still based on an ancient version of the Firefox codebase.

I love the email client, and the browser has the tabs in the right place, but it fails on features of modern websites that do not fall back gracefully.

I presume that this is what causes Anubis challenges to fail when using SeaMonkey.

I can get into Mozillazine without any Anubis challenge appearing using Netsurf. Which has a limited implementation of javascript.

@jernej__s 2/n So I installed NoScript in SeaMonkey to see if it is a javascript issue.

With javascript turned off I get this message.

"Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed the social contract around how website hosting works. A no-JS solution is a work-in-progress."

So I'm now being blocked for using an add-on to protect myself from malicious scripts on websites.

OK so I will now whitelist Mozillazine.

@jernej__s 3/n
Aha! A message I did not get the last time I got stuck trying to get into Mozillazine using SeaMonkey.

"Your browser is configured to disable cookies. Anubis requires cookies for the legitimate interest of making sure you are a valid client. Please enable cookies for this domain."

(But SM is set to accept all cookies.)

So in order for websites to protect themselves from AI scraping, users have to reduce the level of security they are prepared to accept as safe when browsing.

@jernej__s n/n
Or to go through processes of whitelisting all of the relevant sites that you wish to visit as safe so that Anubis can validate your browser whilst otherwise disabling cookies and javascript for other sites.

Or just go find other sites to visit that do not assume you are a bot and block you from viewing content.

As regards SM being set to accept cookies and Anubis not recognising that: maybe my pihole is blocking something that Anubis expects to find in a valid client?

@the_wub Sadly, when you get 5000 requests per second from residential IPs (each IP making 10-20 requests, all using user agents from legitimate browsers), there's very little else you can do. This is not an exaggeration; that's what was happening at a client with a web server hosting about 50 sites for their projects – they were getting hit with that several times per week, bringing the whole server down until we implemented Anubis.

@jernej__s I understand the battle.

The protective measures taken though should not make things more dangerous for users.

Unless, of course the internet is nearing the end of the path as a free and open source of information.

In which case, what does it matter.

@jernej__s OK. Here is the rub.

I get challenged twice trying to get into the Mozillazine forums.

1) Now it lets me pass, but I have to make sure the link from my search engine is the https one, NOT the http link.

2) I get to the Mozilla landing page with a list of links to the forums.

https://mozillazine.org/

These links are all http.

I cannot get past the Anubis challenge unless I alter the link to the https version.

Now finally logged into the forums with SeaMonkey.

#anubis #mozillazine


@jernej__s The OP began this thread complaining that we have to wait tens of seconds for Cloudflare-like challenges, and I have spent far more time than that this morning sorting out one problem with Anubis, for one site, in one browser.
@david_chisnall BRB, going to use my landline phone to call my local lunch spot to place an order that I will walk to go get.

@david_chisnall Just had a bunch of these whilst trying to do a reverse lookup on a number used to call me this evening.

I think that peak internet speed was in the early 1990s. Dial-up was slow, but pages were static HTML with no JavaScript/font/whatever-else calls to other sites hosting the resources.

Each search on AltaVista would produce a first page full of genuinely useful websites one of which would be guaranteed to answer your question.

This is NOT just nostalgia.

@david_chisnall Cloudflare is a protection racket. It's disgusting.
@david_chisnall late 2020s and LLMs brought us choice. If Cloudflare isn't to your liking, Anubis is happy to add some delay to page load
@david_chisnall Also along those lines, I miss the days when page contents included the most important information instead of being loaded later via JavaScript.

@david_chisnall
So what will you do?

Nobody gets fired for buying cloudflare.

@david_chisnall There's the self-hosting option of sticking anubis in front of your service so that it can throttle visitors by making their browser do a bunch of work.

There's also the bouncing around between various services and proxies in order to get logged in...something I'm currently struggling to figure out because apparently I'm a dumbass that can't figure out traefik or how to properly set environment variables or something.

@david_chisnall I remember optimizing thumbnail-images to within kilobytes of their lives...

...and now apparently nobody thinks twice about requiring many MB of JS code per page-load.

(TLDR: this current nonsense is nonsense.)

@woozle I'll just be happy if people stop serving images that should be jpegs or webp in png format.

@david_chisnall

@sysop408 @woozle @david_chisnall nothing’s wrong with webp tho, by all means use webp and avif