Almost 29 million requests from AI crawlers defeated by essentially one simple check: if the user agent contains Chrome/ or Firefox/, and doesn't have sec-fetch-mode, it's going into the maze.

Billions of dollars poured into AI, yet their crawlers are broken by two ifs in an nginx config.

If this all wasn't so sad, I'd laugh.

Post by iocaine powder, @[email protected]

#iocaine has been up for 4days 15h 5m 28s, and spent 1day 6h 26m 31s dealing with - *gestures hands wildly* - [everything](https://monitor.madhouse-project.o…

@algernon what's sec-fetch-mode? also, what's the risk of this affecting humans? chrome and firefox are valid browsers after all, although I dk if those are valid user agents

@esoteric_programmer sec-fetch-mode is another HTTP request header, one that Chrome & Firefox send whenever they're requesting something over HTTPS.

If the header is not present while the user-agent suggests it's Chrome or Firefox, the likelihood of it being a bot is extremely high.
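
To make the check concrete, here is a minimal Python sketch of the same logic. It is not the actual nginx config: the function name `looks_like_fake_browser` and the plain-dict representation of headers are assumptions made for illustration only.

```python
def looks_like_fake_browser(headers: dict[str, str]) -> bool:
    """Return True when the User-Agent claims Chrome or Firefox
    but no sec-fetch-mode header was sent.

    Hypothetical helper: the name and the dict-of-headers input are
    illustrative assumptions, not taken from the real setup.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")

    # First "if": does the client claim to be Chrome or Firefox?
    claims_browser = "Chrome/" in user_agent or "Firefox/" in user_agent

    # Second "if": is the sec-fetch-mode header missing?
    return claims_browser and "sec-fetch-mode" not in normalized
```

A request for which this returns True would be the kind that gets sent into the maze; anything else passes through untouched.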

The only exception I know of is the scenario in which someone puts a page into Reader Mode under Firefox and reloads it while in reader mode - that ends up with Firefox not sending the sec-fetch-mode header for some odd reason. Restoring a saved session with tabs in Reader Mode suffers from the same problem, since that restore is essentially a reload.

This... doesn't happen often. I know of one case where it caused problems, and we quickly found a workaround: leave reader mode, reload, get back into reader mode. I've been keeping an eye on my logs since, and in the past month or so, I couldn't find any case where the request came from Firefox, lacked a sec-fetch-mode header, and wasn't a bot (I have other indicators that let me decide this, but those require my particular setup).

In short: the risk of this affecting humans is not zero, but very tiny, and there's a workaround. One can serve them a page in that case describing the workaround.

@algernon what about rss readers and the like? those use that kind of user agent too, right?

@esoteric_programmer No, they do not, unless they're browser extensions, in which case the browser will take care of the header.

Some use Mozilla/5.0 in their user agent, but they usually do not have Firefox/<version> or Chrome/<version> in the user agent, unless they are running within said browser.
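
A small Python sketch of why the token match is narrower than it might sound. The sample user agent strings and the `ExampleReader` name are made up for illustration; real clients will differ.

```python
def claims_chrome_or_firefox(user_agent: str) -> bool:
    """Match only an explicit Firefox/ or Chrome/ token, not Mozilla/5.0."""
    return "Chrome/" in user_agent or "Firefox/" in user_agent


# Illustrative user agent strings (assumptions, not taken from real logs).
samples = {
    "feed reader": "Mozilla/5.0 (compatible; ExampleReader/1.0)",
    "firefox-like": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "chrome-like": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
}

for label, user_agent in samples.items():
    print(f"{label}: {claims_chrome_or_firefox(user_agent)}")
```

Only the last two samples match, since a bare Mozilla/5.0 prefix carries neither token.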

@algernon ahh, mozilla/5.0 compatible, etc etc, that's what I was thinking when you said firefox, gotcha
@esoteric_programmer Ah! Yeah, no, not Mozilla/5.0, that'd be too broad. Explicitly Firefox/ or Chrome/ in the user agent.
@algernon hmm, gnome-podcasts uses tor's user agent, that would trigger this, right?

@esoteric_programmer I just checked - yeah, gnome-podcasts would be a false positive here.

Thanks for highlighting that, I'll do some more digging!