My wife’s website keeps going down this evening. I’m not sure why (and, typically, I’m not in the easiest location to debug). Lots of connections in the logs from a user agent of “meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)” though… 🤔

Meta Web Crawlers - Sharing - Documentation - Meta for Developers

This page lists the User Agent (UA) strings that identify Meta’s most common web crawlers and what each of those crawlers are used for.

Meta for Developers

Jonathan B ✈️🪄👨🏻‍💻 5d ago

All of them are going to /shop/basket/?remove_item=…&_wpnonce=…&add-to-cart=…

Have turned on strict bot mode in WP Defender for now and they’re getting 403 responses.

Site hasn’t gone down since… 🤔

Jonathan B ✈️🪄👨🏻‍💻

The Meta docs page linked says they honour robots.txt, which would appear to be rubbish: the one generated by WordPress contains a couple of lines which, I think, should cover the requests they’re making:

Disallow: /*?add-to-cart=
Disallow: /*?*add-to-cart=
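
To sanity-check whether those two rules really cover the basket URLs, here is a minimal sketch of wildcard matching in the style most large crawlers document (`*` matches any run of characters, a trailing `$` anchors the end). The helper names and the sample path are hypothetical: the real query values are elided in the logs above, so placeholder values stand in for them. Python's built-in `urllib.robotparser` does not handle `*` wildcards, hence the hand-rolled matcher.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    Assumes wildcard semantics: '*' matches any characters and a
    trailing '$' anchors the end of the URL; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# The two rules quoted from the WordPress-generated robots.txt.
RULES = ["/*?add-to-cart=", "/*?*add-to-cart="]

# Hypothetical request: the query values are placeholders, not real log data.
SAMPLE = "/shop/basket/?remove_item=abc&_wpnonce=def&add-to-cart=123"

if __name__ == "__main__":
    for rule in RULES:
        hit = bool(robots_pattern_to_regex(rule).match(SAMPLE))
        print(f"{rule!r} matches sample: {hit}")
    print("disallowed:", is_disallowed(SAMPLE, RULES))
```

Under these assumptions only the second rule matches, because `add-to-cart=` is not the first query parameter, but one matching rule is enough for a compliant crawler to skip the URL.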

I might just grab the (long) list of source IPs, which their docs show how to pull from Whois, and block the lot with Caddy.
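
A rough sketch of turning that Whois output into something blockable, assuming the query returns `route:` and `route6:` lines as routing-registry output usually does. The sample ranges below are reserved documentation prefixes, not Meta's real ones, and the `@blocked_ranges` matcher name is made up for illustration. Collapsing adjacent ranges first can shrink a long list considerably.

```python
import ipaddress


def parse_routes(whois_text: str) -> set:
    """Collect networks from 'route:' / 'route6:' lines of Whois output."""
    nets = set()
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("route", "route6") and value.strip():
            nets.add(ipaddress.ip_network(value.strip(), strict=False))
    return nets


def collapse(nets) -> list:
    """Merge duplicate and adjacent ranges; IPv4 and IPv6 are collapsed separately."""
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(
        ipaddress.collapse_addresses(v6)
    )


def caddy_block_snippet(nets) -> str:
    """Render a Caddyfile fragment that answers 403 for the given ranges."""
    ranges = " ".join(str(n) for n in nets)
    return f"@blocked_ranges remote_ip {ranges}\nrespond @blocked_ranges 403"


# Placeholder Whois output using documentation-only prefixes.
SAMPLE_WHOIS = """\
route:      192.0.2.0/25
route:      192.0.2.128/25
route:      198.51.100.0/24
route:      198.51.100.0/24
route6:     2001:db8::/32
"""

if __name__ == "__main__":
    print(caddy_block_snippet(collapse(parse_routes(SAMPLE_WHOIS))))
```

The generated fragment would go inside the relevant site block of the Caddyfile; the same collapsed list could just as well feed a firewall rule set instead.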

#wordpress #bots #robotstxt

Ben C 5d ago

@jmb better off firewalling if you can, then you don’t have to handle a TLS negotiation.

Jonathan B ✈️🪄👨🏻‍💻 5d ago

@bencc aha, good point! I will investigate that on-server vs through @hetzner’s cloud dashboard stuff.

Jonathan B ✈️🪄👨🏻‍💻 5d ago

Blimey, there are 1059 IPs that come back from their query!

Ben C 5d ago

@jmb I am not surprised, it’s what we’re seeing at work. It’s the ones pretending to be real user agents that are really annoying. It’s common to see hundreds of thousands of IPs from scammy residential proxies, and they only make one request each, so firewalling those is useless.

Leon Cowle 5d ago

@jmb 2 things.
1. It won't surprise me if Meta is ignoring their own published robots.txt guidance.
2. Considering that this is part of my day job, I've seen COUNTLESS bad actors abuse the crawler UAs from Meta/ChatGPT/etc. So by all means try to block Meta's IPs, but it won't surprise me at all if these connections you're seeing aren't from Meta after all.
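
One way to test the second point against the logs is to check whether each request claiming the crawler UA actually comes from an address inside the published ranges. This is only a sketch: the `UA_TOKEN` value, the function names, and the sample range (a documentation prefix, not a real Meta range) are assumptions for illustration.

```python
import ipaddress

# Assumed substring identifying the crawler in the User-Agent header.
UA_TOKEN = "meta-externalagent"


def ip_in_ranges(ip: str, ranges) -> bool:
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in ranges)


def classify(user_agent: str, ip: str, ranges) -> str:
    """Label a log entry by comparing its claimed UA with its source IP."""
    if UA_TOKEN not in user_agent.lower():
        return "other"
    if ip_in_ranges(ip, ranges):
        return "claimed and in published ranges"
    return "claimed but outside published ranges"


if __name__ == "__main__":
    # Placeholder range; real ranges would come from the Whois lookup.
    published = [ipaddress.ip_network("192.0.2.0/24")]
    ua = "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"
    for source_ip in ("192.0.2.10", "203.0.113.7"):
        print(source_ip, "->", classify(ua, source_ip, published))
```

Entries that land in the "outside published ranges" bucket would suggest spoofed user agents, which an IP block list built from the Whois output would not stop.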